Test performance impact of `addDocuments` vs `addDocument` #252

mikemccand · 2023-12-14T12:26:18Z

[Spinoff from https://github.com/apache/lucene/pull/12829#issuecomment-1855755782]

I'm curious what overhead we pay when calling `addDocument` for N documents, versus indexing all N docs in a single `addDocuments` call. `IndexWriter` has non-trivial entry / exit costs (checking out / locking the DWPT (`DocumentsWriterPerThread`), checking flush triggers, locking to free the DWPT, etc.).

One simple way to test this would be to modify our existing `Indexer.java` so that, when reading docs from a binary line file, it indexes each block with a single `addDocuments` call.
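To make the question concrete, here is a toy model of the amortization being asked about. It is only a sketch: `ToyWriter` is a hypothetical stand-in that uses a plain `ReentrantLock`, not Lucene's `IndexWriter`, and `makeBlocks` and the block sizes are made up for illustration. It counts how often the entry / exit path runs in each mode and says nothing about the real cost inside Lucene, which is what the actual benchmark would have to measure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class EntryExitCostModel {

  /** Hypothetical stand-in for a writer; this is NOT Lucene's IndexWriter. */
  static final class ToyWriter {
    private final ReentrantLock lock = new ReentrantLock();
    private long entries; // how many times the entry/exit path ran
    private long docs;    // how many documents were "indexed"

    /** Single-document path: pays the entry/exit cost once per document. */
    void addDocument(String doc) {
      addDocuments(List.of(doc));
    }

    /** Block path: pays the entry/exit cost once per block. */
    void addDocuments(List<String> block) {
      lock.lock(); // models the per-call entry cost
      try {
        entries++;
        docs += block.size();
      } finally {
        lock.unlock(); // models the per-call exit cost
      }
    }

    long entries() { return entries; }
    long docs() { return docs; }
  }

  /** Builds numBlocks blocks of docsPerBlock placeholder documents each. */
  static List<List<String>> makeBlocks(int numBlocks, int docsPerBlock) {
    List<List<String>> blocks = new ArrayList<>();
    for (int b = 0; b < numBlocks; b++) {
      List<String> block = new ArrayList<>();
      for (int d = 0; d < docsPerBlock; d++) {
        block.add("doc-" + b + "-" + d);
      }
      blocks.add(block);
    }
    return blocks;
  }

  public static void main(String[] args) {
    List<List<String>> blocks = makeBlocks(1000, 10);

    // Mode 1: one call per document.
    ToyWriter perDoc = new ToyWriter();
    for (List<String> block : blocks) {
      for (String doc : block) {
        perDoc.addDocument(doc);
      }
    }

    // Mode 2: one call per block.
    ToyWriter perBlock = new ToyWriter();
    for (List<String> block : blocks) {
      perBlock.addDocuments(block);
    }

    System.out.println("per-doc:   entries=" + perDoc.entries() + " docs=" + perDoc.docs());
    System.out.println("per-block: entries=" + perBlock.entries() + " docs=" + perBlock.docs());
  }
}
```

Both modes index the same number of documents; only the number of entry / exit round trips differs, which is the overhead the benchmark would try to quantify.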


jpountz · 2023-12-14T13:26:41Z

We could use `IndexGeonames`, which has a `batchAddDocuments` boolean aimed at checking exactly this.

mikemccand mentioned this issue Dec 14, 2023

Add support for index sorting with document blocks apache/lucene#12829

Merged
