Abstract
Many data-intensive business and research applications rely on advanced database indexing techniques to ensure efficiency. A popular approach for read-only data sets is bitmap indices. They use fast machine bitwise operations when querying and are highly compressible. A commonly used compression technique used for bitmaps is Word-Aligned Hybrid (WAH). WAH is a run-length encoding scheme that greatly compresses sparse bitmaps and can directly query data without explicit decompression. In this paper, we present an algorithm that uses metadata, gathered at compression-time, to enable logical short-circuiting in WAH's query algorithm, thus reducing the number of required memory accesses. We also present a heuristic that identifies queries where the processing overhead of our algorithm will not likely be off-set by the number of eliminated memory accesses. In these cases, it is advantageous to process the query using the standard WAH algorithm. The results of our empirical study over both real and synthetic data sets show that our approach can realize average query speedups of 1.3× to +84× when compared to WAH. They also showed that the use of our heuristic reduced the number of times our approach underperformed WAH.