
Slow ingestion/indexing performance, possibility to index in batch? #128

Open
vever001 opened this issue Nov 8, 2024 · 1 comment


vever001 commented Nov 8, 2024

Hello,

We are currently considering adding oe_search on Joinup, but noticed that indexing is very slow.
oe_search performs two POST requests to the API for every document (one for the token and one to send the payload); see \Drupal\oe_search\Plugin\search_api\backend\SearchApiEuropaSearchBackend::indexItems.
This results in a large number of requests and slow indexing overall.

Here are the xhprof results for indexing 500 items via drush, which took 2m36s to complete on my local environment:

(screenshot: xhprof profile)

The profile shows 1000 API requests made via curl_exec, accounting for 87% of the total execution time.
Each document ingestion triggers two requests, each taking around 100-200 ms (× 1000 requests, which matches the curl_exec footprint):

  • 'POST', 'https://***/token',
  • 'POST', 'https://***/ingestion-api/acc/rest/ingestion/text?...

The project currently has 25K items/nodes to index, which would take more than 2 hours and 50K requests to the Europa Search API.
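A quick back-of-the-envelope check of those numbers (assuming ~150 ms per request, the midpoint of the observed 100-200 ms range):

```python
# Rough estimate of total requests and wall-clock time for per-item ingestion.
# The 150 ms figure is an assumption based on the observed 100-200 ms range.
items = 25_000
requests_per_item = 2          # one token request + one ingestion request
seconds_per_request = 0.15

total_requests = items * requests_per_item
total_hours = total_requests * seconds_per_request / 3600

print(total_requests)          # 50000
print(round(total_hours, 1))   # 2.1
```

So even at the faster end of the observed latency, the per-item approach lands well above two hours for a full reindex.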


To avoid so many requests, would it be possible to:

  • Request the token only once.
    The token likely expires after some time, but we could cache it with expiration logic (e.g., using the expirable key/value store?).
  • Index multiple items in a single HTTP POST request to the Europa Search API.
    This depends on the Europa Search API's capabilities. I reached out to them, and they mentioned support for bulk indexing via /rest/ingestion/bulk. However, that endpoint seems designed specifically for documents (e.g., PDF or Word files), so I'm not sure it supports our use case at this time.
    For context, search_api processes items in batches of 50 by default, so this approach would make oe_search similar to search_api_solr, which sends all 50 items in a single request.
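To illustrate the first suggestion, here is a minimal, generic sketch of the cached-token idea (in Python rather than Drupal/PHP; the class and the `fetch` callback are made up for illustration, standing in for the POST to /token and the expirable key/value store):

```python
import time

class CachedToken:
    """Cache an access token until shortly before it expires,
    mirroring what an expirable key/value store would do in Drupal.
    `fetch` is any callable returning (token, lifetime_in_seconds)."""

    def __init__(self, fetch, safety_margin=30):
        self._fetch = fetch
        self._margin = safety_margin  # refresh a bit before actual expiry
        self._token = None
        self._expires_at = 0.0

    def get(self):
        # Hit the token endpoint only when the cached token is (nearly) expired.
        if self._token is None or time.monotonic() >= self._expires_at:
            self._token, lifetime = self._fetch()
            self._expires_at = time.monotonic() + lifetime - self._margin
        return self._token

# Hypothetical usage: request_token stands in for the POST to the /token endpoint.
calls = 0
def request_token():
    global calls
    calls += 1
    return f"token-{calls}", 3600  # token valid for one hour

cache = CachedToken(request_token)
for _ in range(50):           # e.g. one search_api batch of 50 items
    token = cache.get()       # only the first call hits the endpoint

print(calls)  # 1
```

With this pattern, a batch of 50 items would make 51 requests instead of 100, and a full reindex would make roughly half as many requests overall.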

Thank you

@allternativ9

Dear,
Thank you for sharing your technical suggestion.

To ensure better communication and support for Europa Web Platform developers, we have established a Community of Practice for the European Commission and EUIBAs (GRP-Drupal CoP @ EC and EUIBAs | General | Microsoft Teams), along with several training sessions and fora to support members of the community:

A dedicated "Ask OEL" channel and meetings (GRP-Drupal CoP @ EC and EUIBAs | 04. 💭 Ask OEL Team!! | Microsoft Teams): these sessions are designed for developers to raise questions directly, discuss challenges, and receive guidance from the team. We encourage you to take advantage of these meetings to address any blockers or technical issues. They are organised every other Thursday on Teams, and your team members are welcome to join to be guided.

In the meantime, you can also post your message in the Ask OEL Teams channel.

Thanks a lot for your understanding.
Angela Grigore.
