A fast text search engine built for SSDs, written in C++.

junhe/wiser

We describe WiSER, a clean-slate search engine designed to exploit high-performance SSDs with the philosophy "read as needed". WiSER utilizes many techniques to deliver high throughput and low latency with a relatively small amount of main memory; the techniques include an optimized data layout, a novel two-way cost-aware Bloom filter, adaptive prefetching, and space-time trade-offs. In a system with memory that is significantly smaller than the working set, these techniques increase storage space usage (up to 50%), but reduce read amplification by up to 3x, increase query throughput by up to 2.7x, and reduce latency by 16x when compared to the state-of-the-art Elasticsearch. We believe that the philosophy of "read as needed" can be applied to more applications as the read performance of storage devices keeps improving.

WiSER was previously called Vacuum, so the name 'vacuum' appears throughout this repo.

The paper about WiSER was published at FAST'20. The title is "Read as Needed: Building WiSER, a Flash-Optimized Search Engine". http://pages.cs.wisc.edu/~jhe/fast20-wiser.pdf

(Feb 19, 2020: I (Jun) am not a grad student anymore. It would be great if someone could help us to improve this repository. Otherwise, I'll try to find some spare time...)

(Jan 12, 2020: Update: We will improve this repo to make it easy to run. Vacuum is well tuned and runs pretty fast.)

Directory structure

The main C++ code of Vacuum is in src/qq_mem/. We also have lots of experimental code in the repository, at least for now.

  • data/ Data for benchmarking and some scripts to manipulate the data.
  • scripts/ A bunch of Python and Shell scripts for our experiments and setup.
  • src/
    • lucene/ a copy of the Lucene code. We played with it.
    • pysrc/ some Python code
      • benchmarks/ scripts for benchmarking redisearch, elasticsearch, ...
      • in_mem/ we developed a minimal python in-memory engine here.
    • qq_mem/ this is the main directory for Vacuum. It is still named "qq_mem" because things evolved and we were too lazy to change directory names.
      • src/ Vacuum source code (C++)
      • tools/ A bunch of helper scripts
      • README.md Instructions on how to run Vacuum
    • tutorials/ this has some Lucene examples that we played with.

Benchmark

We evaluate search engines with synthetic and real queries. The synthetic queries can be generated by src/qq_mem/tools/gen_synthetic_log.py. Real queries can be found at http://www.wikibench.eu/.

For the synthetic queries, we basically sampled terms in Wikipedia by their frequencies.
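To illustrate what sampling terms by frequency means, here is a minimal Python sketch. It is not the code in gen_synthetic_log.py; the function name `sample_terms_by_frequency`, the toy corpus, and the seed parameter are all assumptions made for this example. Each term is drawn with probability proportional to how often it occurs, so common terms show up in the query log more often than rare ones.

```python
import random
from collections import Counter


def sample_terms_by_frequency(term_counts, n_queries, seed=0):
    """Draw n_queries single-term queries, weighted by term frequency.

    term_counts: mapping of term -> number of occurrences (hypothetical input;
                 the real tool may read its term statistics differently).
    """
    rng = random.Random(seed)  # fixed seed so a query log can be regenerated
    terms = list(term_counts)
    weights = [term_counts[t] for t in terms]
    # random.choices samples with replacement, proportional to the weights.
    return rng.choices(terms, weights=weights, k=n_queries)


# Toy stand-in for a Wikipedia dump (assumption, for illustration only).
corpus = ["the cat sat on the mat", "the dog sat"]
term_counts = Counter(word for doc in corpus for word in doc.split())

queries = sample_terms_by_frequency(term_counts, n_queries=5, seed=42)
```

With real data, `term_counts` would be built from the indexed Wikipedia text, and the sampled terms would be written out one query per line.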

Build and run Vacuum

Please see src/qq_mem/.

Please contact Jun He ([email protected]) and Kan Wu ([email protected]) if you have any questions.
