Commit

Add v2 of the API (#4)
gitbuda authored Jan 1, 2024
1 parent 9507221 commit b2e8e4f
Showing 6 changed files with 427 additions and 336 deletions.
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -1,12 +1,12 @@
cmake_minimum_required(VERSION 3.15)
project(mgcxx VERSION 0.0.3)
set (CMAKE_CXX_STANDARD 20)
if (NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE "Debug")
endif()
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
include(ExternalProject)
include(FetchContent)
project(cxxtantivy)
option(ENABLE_TESTS "Enable tests" ON)

# NOTE: Be careful with moving this outside of the if block (it should not be
25 changes: 19 additions & 6 deletions README.md
@@ -1,11 +1,17 @@
# cxxtantivy
# mgcxx (experimental)

## Work in Progress
A collection of C++ wrappers around Rust libraries.
The list includes:
* full-text search enabled by [tantivy](https://github.com/quickwit-oss/tantivy)

## text_search

### TODOs

- [ ] Figure out the right API ⏳
- [ ] All READ methods (`search`, `aggregate`, `find`) depend on the exact schema -> make it robust
- [ ] Implement full API
- [ ] delete
- [ ] update
- [ ] Polish & test all error messages
- [ ] Write unit / integration tests to compare STRING vs JSON field search query syntax.
- [ ] Figure out what's the right search syntax for a property graph
- [ ] Add some notion of pagination
@@ -29,14 +35,21 @@
- [ ] Note [DocAddress](https://docs.rs/tantivy/latest/tantivy/struct.DocAddress.html) is composed of 2 u32 but the `SegmentOrdinal` is tied to the `Searcher` -> is it possible/wise to cache the address (`SegmentId` is UUID)
- [ ] A [searcher](https://docs.rs/tantivy/latest/tantivy/struct.IndexReader.html#method.searcher) per transaction -> cache `DocAddress` inside Memgraph's `ElementAccessors`?
- [ ] Implement the stress test by adding & searching to the same index concurrently + large dataset generator.
- [ ] Consider implementing a `panic!` handler that (optionally) prevents the host process from crashing.

### NOTEs

* if a field doesn't get specified in the schema, it's ignored
* `TEXT` means the field will be tokenized and indexed (required to be able to search)
* Tantivy add_json_object accepts serde_json::map::Map<String, serde_json::value::Value>.
* `TEXT` means the field will be tokenized and indexed (required to be able to
search)
* Tantivy add_json_object accepts serde_json::map::Map<String, serde_json::value::Value>
* C++ text-search API is snake case because it's implemented in Rust
* Writing each document and then committing (writing to disk) will be
expensive. In a standard OLTP workload that's a common case -> introduce some
form of batching.
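
The batching note above can be illustrated with a minimal sketch. It assumes a hypothetical `FakeIndex` stand-in that only counts commits; neither `FakeIndex` nor `BatchingWriter` is part of mgcxx or tantivy, and the batch size is an arbitrary illustration value.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an index backend; NOT the mgcxx or tantivy API.
// It only records how often the expensive commit step runs.
struct FakeIndex {
  std::vector<std::string> pending;
  std::size_t committed_docs = 0;
  std::size_t commit_calls = 0;

  void add(std::string doc) { pending.push_back(std::move(doc)); }

  void commit() {
    committed_docs += pending.size();
    pending.clear();
    ++commit_calls;
  }
};

// Commits once every `batch_size` documents instead of after each one.
class BatchingWriter {
 public:
  BatchingWriter(FakeIndex &index, std::size_t batch_size)
      : index_(index), batch_size_(batch_size == 0 ? 1 : batch_size) {}

  void add(std::string doc) {
    index_.add(std::move(doc));
    if (++uncommitted_ >= batch_size_) flush();
  }

  // Commits whatever is still buffered, e.g. at the end of a transaction.
  void flush() {
    if (uncommitted_ == 0) return;
    index_.commit();
    uncommitted_ = 0;
  }

 private:
  FakeIndex &index_;
  std::size_t batch_size_;
  std::size_t uncommitted_ = 0;
};
```

With a batch size of 4, adding 10 documents triggers 2 commits, and a final `flush()` commits the remaining 2. The trade-off is that buffered documents are neither searchable nor durable until the next commit, so the batch size (or a time-based flush) has to match what the workload tolerates.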

## Resources

* https://fulmicoton.com/posts/behold-tantivy-part2
* https://stackoverflow.com/questions/37924383/combining-several-static-libraries-into-one-using-cmake
--> decided to have 2 separate libraries user code has to link
14 changes: 13 additions & 1 deletion text_search/ci.sh
@@ -29,23 +35,35 @@ echo "Run config:"
echo " full : $MGCXX_TEXT_SEARCH_CI_FULL"
echo " release: $MGCXX_TEXT_SEARCH_CI_RELEASE"

cd "$SCRIPT_DIR/.."
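# Collect changed C/C++ sources; every dot in the pattern is escaped so that
# e.g. "foocxx" does not match.
FILES_TO_FIX=$({ git diff --name-only ; git diff --name-only --staged ; } | sort | uniq | grep -E "\.c$|\.cpp$|\.cxx$|\.h$|\.hpp$|\.hxx$" || true)
if [ -n "$FILES_TO_FIX" ]; then
# Left unquoted on purpose so the list is split into one path per iteration.
for file in ${FILES_TO_FIX}; do
clang-format -i -verbose "${file}"
done
fi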
cd "$SCRIPT_DIR"
# TODO(gitbuda): Add clang-format call here.
cargo fmt

mkdir -p "$SCRIPT_DIR/../build"
cd "$SCRIPT_DIR/../build"
if [ "$MGCXX_TEXT_SEARCH_CI_FULL" = true ]; then
rm -rf ./* && rm -rf .cache
# Added here because Cargo.lock is ignored for libraries, but it's not
# located under build folder. Rebuilding from scratch should also start clean
# from cargo perspective.
rm "$SCRIPT_DIR/Cargo.lock" || true
else
rm -rf index*
fi

if [ "$MGCXX_TEXT_SEARCH_CI_RELEASE" = true ]; then
cmake -DCMAKE_BUILD_TYPE=Release ..
else
cmake ..
fi
make -j8

cd "$SCRIPT_DIR/../build/text_search"
./test_unit
./test_bench