Skip to content
/ br-index Public

Compressed text index supporting flexible search, such as count, locate with some mismatched characters

Notifications You must be signed in to change notification settings

U-Ar/br-index

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

67 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

br-index

News

A refined version is here: https://github.com/U-Ar/full-br-index

New features:

  • Space-efficient index construction using Prefix-Free Parsing.
  • Pattern contraction operations (left-contraction, right-contraction)
  • Integration tests

About

This repository provides the bi-directional r-index (br-index).

The r-index is the compressed text index which supports efficient count(P) and locate(P). Its uses O(r) words of space, whose r is the number of equal-letter runs in BWT of the text.

The br-index we have proposed is achieved by using the mechanism of the r-index but adding new structures. It allows for bi-directional extension of patterns, left-extension and right-extension. Also, you can locate all the occurrences of the current pattern at any step of the search.

System Requirements

  • This project is based on sdsl-lite library. Install sdsl-lite beforehand and modify variables SDSL_INCLUDE and SDSL_LIB in CMakeLists.txt.

  • This project has been tested under gcc 4.8.5 and gcc 7.5.0.

How to Use

Firstly, clone the repository. Since a submodule is used (iutest), recursive cloning is necessary.

git clone --recursive https://github.com/U-Ar/br-index.git

In order to build, execute following commands: (This project is using CMake)

mkdir build
cd build
cmake ..
make

6 executables will be created in the build directory.

bri-build
Builds the br-index on the input text file.
bri-locate
Locates the occurrences of the given pattern using the index. Provide a pattern file in the Pizza&Chili format. You can give an option "-m (number)" for the number of mismatched characters allowed (0 by default).
bri-count
Counts the number of the occurrences of the given pattern using the index. Its usage is same as bri-locate.
bri-seedex
Applies the seed-and-extend approach to the given pattern. Exactly matches the core region and extends with some mismatches.
bri-space
Shows the statistics of the text and the breakdown of the index space usage.
run_tests
runs unit tests.

Also, you can run unit tests by

make test-bri

Versions

br_index.hpp (default)
The simple implementation of br-index used in the experiments. Only the variables necessary for locate (j,d,len) are maintained, which are sufficient to compute locate.
br_index_nplcp.hpp
The implementation without PLCP. (p,j,d,len) are maintained. It computes locate by calculating p'=LF^d(p) and comparing p' with [s, e]. Larger than the normal one while computing locate is faster when occ is large compared to |P|.
br_index_naive.hpp (old)
The naive implementation of br-index. All the variables p,j,d,pR,jR,dR,len are maintained during the search. Not space-efficient, implemented mainly for the educational purpose and the possible future use. (It's not updated now, so it doesn't function)

Citation

Cite the following paper:

  • Arakawa, Y., Navarro, G., & Sadakane, K. (2022). Bi-Directional r-Indexes. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Schloss Dagstuhl-Leibniz-Zentrum für Informatik.

It is more desirable to cite the following papers in addition, which are the original papers of the r-index:

  • Gagie, T., Navarro, G., & Prezza, N. (2018). Optimal-time text indexing in BWT-runs bounded space. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms (pp. 1459-1477). Society for Industrial and Applied Mathematics.
  • Gagie, T., Navarro, G., & Prezza, N. (2020). Fully functional suffix trees and optimal text searching in BWT-runs bounded space. Journal of the ACM (JACM), 67(1), 1-54.

About

Compressed text index supporting flexible search, such as count, locate with some mismatched characters

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published