A refined version is here: https://github.com/U-Ar/full-br-index
New features:
- Space-efficient index construction using Prefix-Free Parsing.
- Pattern contraction operations (left-contraction, right-contraction)
- Integration tests
This repository provides the bi-directional r-index (br-index).
The r-index is the compressed text index which supports efficient count(P) and locate(P). Its uses O(r) words of space, whose r is the number of equal-letter runs in BWT of the text.
The br-index we have proposed is achieved by using the mechanism of the r-index but adding new structures. It allows for bi-directional extension of patterns, left-extension and right-extension. Also, you can locate all the occurrences of the current pattern at any step of the search.
-
This project is based on sdsl-lite library. Install sdsl-lite beforehand and modify variables SDSL_INCLUDE and SDSL_LIB in CMakeLists.txt.
-
This project has been tested under gcc 4.8.5 and gcc 7.5.0.
Firstly, clone the repository. Since a submodule is used (iutest), recursive cloning is necessary.
git clone --recursive https://github.com/U-Ar/br-index.git
In order to build, execute following commands: (This project is using CMake)
mkdir build
cd build
cmake ..
make
6 executables will be created in the build directory.
- bri-build
- Builds the br-index on the input text file.
- bri-locate
- Locates the occurrences of the given pattern using the index. Provide a pattern file in the Pizza&Chili format. You can give an option "-m (number)" for the number of mismatched characters allowed (0 by default).
- bri-count
- Counts the number of the occurrences of the given pattern using the index. Its usage is same as bri-locate.
- bri-seedex
- Applies the seed-and-extend approach to the given pattern. Exactly matches the core region and extends with some mismatches.
- bri-space
- Shows the statistics of the text and the breakdown of the index space usage.
- run_tests
- runs unit tests.
Also, you can run unit tests by
make test-bri
- br_index.hpp (default)
- The simple implementation of br-index used in the experiments. Only the variables necessary for locate (j,d,len) are maintained, which are sufficient to compute locate.
- br_index_nplcp.hpp
- The implementation without PLCP. (p,j,d,len) are maintained. It computes locate by calculating p'=LF^d(p) and comparing p' with [s, e]. Larger than the normal one while computing locate is faster when occ is large compared to |P|.
- br_index_naive.hpp (old)
- The naive implementation of br-index. All the variables p,j,d,pR,jR,dR,len are maintained during the search. Not space-efficient, implemented mainly for the educational purpose and the possible future use. (It's not updated now, so it doesn't function)
Cite the following paper:
- Arakawa, Y., Navarro, G., & Sadakane, K. (2022). Bi-Directional r-Indexes. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Schloss Dagstuhl-Leibniz-Zentrum für Informatik.
It is more desirable to cite the following papers in addition, which are the original papers of the r-index:
- Gagie, T., Navarro, G., & Prezza, N. (2018). Optimal-time text indexing in BWT-runs bounded space. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms (pp. 1459-1477). Society for Industrial and Applied Mathematics.
- Gagie, T., Navarro, G., & Prezza, N. (2020). Fully functional suffix trees and optimal text searching in BWT-runs bounded space. Journal of the ACM (JACM), 67(1), 1-54.