Hello,

We are currently considering adding oe_search on Joinup, but noticed that indexing is very slow.
It looks like oe_search performs two POST requests to the API for every document (one to get the token and one to send the payload); see \Drupal\oe_search\Plugin\search_api\backend\SearchApiEuropaSearchBackend::indexItems(). This results in a large number of requests and slow indexing overall.
Here are the xhprof results from indexing 500 items via drush, which took 2m36s to complete on my local environment. We can see 1000 requests made to the API via curl_exec, accounting for 87% of the process execution time. Each document ingestion triggers 2 requests of around 100-200ms each (x 1000 requests, which matches the curl_exec footprint).
The project currently has 25K items/nodes to index, which would require more than 2 hours and 50K requests to the Europa Search API.
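For reference, the extrapolation above can be checked with a quick back-of-the-envelope calculation, using the figures from the xhprof run (500 items in 2m36s, 2 requests per item):

```php
<?php
// Back-of-the-envelope check of the extrapolation above.
// Figures taken from the xhprof run: 500 items in 2m36s, 2 requests/item.
$itemsMeasured = 500;
$secondsMeasured = 2 * 60 + 36;   // 156 s for 500 items
$requestsPerItem = 2;             // 1 token request + 1 payload request

$totalItems = 25000;
$totalRequests = $totalItems * $requestsPerItem;
$totalSeconds = $totalItems / $itemsMeasured * $secondsMeasured;

printf("%d requests, ~%.1f hours\n", $totalRequests, $totalSeconds / 3600);
// 50000 requests, ~2.2 hours
```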
To avoid so many requests, would it be possible to:

1. Request the token only once.
   While the token likely expires after some time, we could implement a storage mechanism with expiration logic (e.g. using the expirable keyvalue store?).
2. Index multiple items in a single HTTP POST request to the Europa Search API.
   This would depend on the Europa Search API's capabilities. I reached out to them, and they mentioned support for bulk indexing via /rest/ingestion/bulk. However, that endpoint seems specifically designed for documents (e.g. PDF or Word files), so I'm not sure it supports our use case at this time.
   For context, search_api processes items in batches of 50 by default. This approach would make oe_search similar to search_api_solr, for example, which sends all 50 items in a single request.
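To illustrate the first suggestion, here is a minimal, framework-free sketch of token caching with an expiration margin. The class and fetcher names are hypothetical, not oe_search APIs; in the module this could be backed by Drupal's expirable keyvalue store instead of an in-memory property:

```php
<?php
// Hypothetical sketch: cache the Europa Search token until it expires,
// instead of requesting a new one for every indexed document.
// TokenCache and $fetcher are illustrative names, not oe_search APIs.
final class TokenCache {

  private ?string $token = NULL;
  private int $expiresAt = 0;

  public function __construct(private \Closure $fetcher) {}

  public function get(): string {
    // Refresh slightly early (30 s margin) to avoid using a stale token.
    if ($this->token === NULL || time() >= $this->expiresAt - 30) {
      // $fetcher performs the POST to the token endpoint and returns
      // ['access_token' => ..., 'expires_in' => seconds].
      $response = ($this->fetcher)();
      $this->token = $response['access_token'];
      $this->expiresAt = time() + (int) $response['expires_in'];
    }
    return $this->token;
  }

}

// Usage: the fetcher is only invoked when the cached token is missing
// or about to expire, so indexing N items costs 1 token request, not N.
$calls = 0;
$cache = new TokenCache(function () use (&$calls) {
  $calls++;
  return ['access_token' => 'abc', 'expires_in' => 3600];
});
$cache->get();
$cache->get();
// $calls is 1: the second get() reused the cached token.
```

In oe_search itself, \Drupal::keyValueExpirable() with setWithExpire() would give the same behavior across requests rather than per-process.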
Thank you
There is a dedicated "ASK OEL" channel and meeting series: GRP-Drupal CoP @ EC and EUIBAs | 04. 💭 Ask OEL Team!! | Microsoft Teams. These sessions are designed for developers to directly raise questions, discuss challenges, and receive guidance from the team. We encourage you to take advantage of these meetings to address any blockers or technical issues. They are organised every other Thursday (bi-monthly sessions on Teams), and your team members can join to be guided.
In the meantime, you can also post your message in the Ask OEL Teams channel.
Thanks a lot for your understanding.
Angela Grigore.