活水快报 - 42Digest

LatticeHashForest:用于重复数据和运营的高效数据结构

LatticeHashForest: An Efficient Data Structure for Repetitive Data and Operations

Anamitra Ghorui and Uday P. Khedker

arXiv

2025年10月21日

将整个程序作为单个单元或整个程序分析,涉及通过程序的控制流传播大量信息。对于指针分析尤其如此,除非在分析的精度方面做出重大妥协,否则信息的组合会爆发。我们在自己的工作中观察到的一个关键问题是,许多重复的数据正在被传播,许多低级数据结构操作被重复了很多次。我们介绍了我们认为新颖和通用的数据结构LatticeHashForest(LHF),以删除大多数冗余计算和重复数据的方式存储和操作此类信息,这些场景类似于编译器和程序优化中遇到的情景。 LHF不同于这方面的类似工作,如哈希约束,ZDD和BDD,不仅提供了一种在大型聚合结构上有效操作的方法,而且还以可以立即重复的方式修改这些结构的元素。 LHF还提供了一种方法来执行元素的嵌套构造,以便可以在多个级别上重复,从而减少对额外嵌套计算的需求。我们提供详细的结构描述,以及这个数据结构的抽象模型。 LHF的整个C++实现是作为神器提供的,同时使用示例和基准程序评估LHF。我们还提供 API 文档和用户手册,供用户制作 LHF 的独立应用程序。我们在指针分析领域的主要用例显示,内存使用量减少到几乎可以忽略不计的分数,与其他实现相比,输入尺寸的加速超过4倍,接近1000万。

Analysis of entire programs as a single unit, or whole-program analysis, involves propagation of large amounts of information through the control flow of the program. This is especially true for pointer analysis, where, unless significant compromises are made in the precision of the analysis, there is a combinatorial blowup of information. One of the key problems we observed in our own efforts to this end is that a lot of duplicate data was being propagated, and many low-level data structure ope...

View Source