[gdi-userportal-dataset-discovery-service] Remove CKAN max rows limitation #104

Open
admy7 opened this issue Aug 27, 2024 · 0 comments

🎯 What? (Story Description)

  • Find a way to retrieve all the records from package_search

  • Apply it in CkanDatasetsIdCollector to retrieve all the dataset ids from CKAN

💡 Why? (Justification)

This service lets users search datasets, possibly across several data sources (e.g. CKAN, Beacon).

Internally, the first step is to find, for each data source separately, the ids of the datasets that match the user query.
We then reconcile the resulting id sets by taking their intersection.
Consequently, the final result can contain fewer records than the query asked for.

For this reason, we would like to retrieve ALL the matching dataset ids from every data source before the merge, to minimise the chance of returning fewer records than requested.
CKAN is problematic because package_search returns at most 1000 records per request (by default).
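
To illustrate why the cap matters, here is a minimal Python sketch of the reconciliation step. It is not the service's actual code: the `reconcile` helper and the `ds-N` ids are hypothetical, and the numbers are made up. It only shows how a source truncated at 1000 ids shrinks the intersection.

```python
def reconcile(id_sets):
    """Intersect the per-source id sets (hypothetical helper, not the service's code)."""
    sets = [set(s) for s in id_sets]
    return set.intersection(*sets) if sets else set()


# Made-up data: the same 1500 datasets match the query in both sources.
beacon_ids = {f"ds-{i}" for i in range(1500)}
ckan_all = [f"ds-{i}" for i in range(1500)]
ckan_truncated = ckan_all[:1000]  # what a 1000-record cap would hand back

full = reconcile([ckan_all, beacon_ids])
capped = reconcile([ckan_truncated, beacon_ids])
print(len(full), len(capped))  # 1500 1000
```

With the truncated CKAN list, 500 datasets that match in both sources are lost from the final result.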

🔨 Tasks (Breakdown)

  • Find a way to retrieve all the records from package_search

  • Apply it in CkanDatasetsIdCollector to retrieve all the dataset ids from CKAN
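
One possible approach, sketched in Python under stated assumptions, is to page through package_search with its `start` and `rows` parameters until the reported `count` is reached. The `collect_all_ids` function and the injected `search` callable are hypothetical names used for illustration; `fake_search` stands in for the real HTTP call and mimics a server that caps each page at 1000 rows.

```python
from typing import Callable, Dict, List

PAGE_SIZE = 1000  # assumed page size; the server may cap it lower


def collect_all_ids(search: Callable[[Dict], Dict], query: str,
                    page_size: int = PAGE_SIZE) -> List[str]:
    """Collect every matching dataset id by paging with `start` and `rows`.

    `search` is a hypothetical callable that takes the request parameters and
    returns the `result` object of a package_search response, i.e. a dict
    with `count` and `results`.
    """
    ids: List[str] = []
    start = 0
    while True:
        result = search({"q": query, "rows": page_size, "start": start})
        page = result["results"]
        ids.extend(dataset["id"] for dataset in page)
        # Advance by what was actually returned, in case the server
        # returns fewer rows than requested.
        start += len(page)
        if not page or start >= result["count"]:
            return ids


def fake_search(params: Dict) -> Dict:
    """Stand-in for CKAN: 2500 made-up datasets, at most 1000 rows per page."""
    all_ids = [f"ds-{i}" for i in range(2500)]
    rows = min(params["rows"], 1000)
    window = all_ids[params["start"]:params["start"] + rows]
    return {"count": len(all_ids), "results": [{"id": i} for i in window]}


ids = collect_all_ids(fake_search, "*:*")
print(len(ids))  # 2500
```

The sketch does not handle datasets being added or removed while paging; a stable sort order on the request would probably be needed for that. Raising the limit on the CKAN side through its configuration may be an alternative, but it depends on having control over the CKAN instance.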

✅ Acceptance Criteria

Can we retrieve all the dataset ids from CKAN?

➕ Additional Information

No response
