feat: api to get data extracts for the given aoi and category #960

Merged
merged 5 commits into from
Nov 7, 2023

Conversation

nrjadkry
Member

@nrjadkry nrjadkry commented Nov 2, 2023

I have added an API that returns the data extracts available from OSM for the provided AOI (Area of Interest), using raw-data-api.

This API accepts an AOI and the category for which data extracts are required as parameters.

image

@nrjadkry nrjadkry temporarily deployed to test November 2, 2023 12:12 — with GitHub Actions Inactive
@github-actions github-actions bot added the backend Related to backend code label Nov 2, 2023
@nrjadkry nrjadkry temporarily deployed to test November 2, 2023 12:17 — with GitHub Actions Inactive
@nrjadkry nrjadkry marked this pull request as ready for review November 7, 2023 02:39
@nrjadkry
Member Author

nrjadkry commented Nov 7, 2023

@spwoodcock This works when the user uploads a single AOI.
What do you think would be better when a user uploads a GeoJSON file with multiple polygons? It won't be feasible to extract the data for all those polygons one by one.
In the previous approach I used a PostGIS query to generate a boundary surrounding those multipolygons, but here we want to do this before creating a project.

@spwoodcock
Member

Could we load the geojson with shapely, then just get the bounds (bbox) of the features?

That would give us a single bounding geometry to pass to osm-rawdata.
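The bounding-box idea can be sketched as follows. This is a minimal, dependency-free illustration rather than the code merged in this PR: the helper names (`iter_positions`, `geojson_bbox`, `bbox_to_polygon`) and the sample data are hypothetical, and it assumes a FeatureCollection whose geometries all carry a `coordinates` member. With shapely, the `bounds` attribute of the combined geometry gives the same four values.

```python
def iter_positions(coords):
    """Yield every (x, y) position from arbitrarily nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        # A single position: [x, y] or [x, y, z]
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_positions(part)


def geojson_bbox(geojson):
    """Return (minx, miny, maxx, maxy) covering all features."""
    xs, ys = [], []
    for feature in geojson["features"]:
        for x, y in iter_positions(feature["geometry"]["coordinates"]):
            xs.append(x)
            ys.append(y)
    return min(xs), min(ys), max(xs), max(ys)


def bbox_to_polygon(bbox):
    """Turn a bbox into a single closed GeoJSON Polygon."""
    minx, miny, maxx, maxy = bbox
    ring = [[minx, miny], [maxx, miny], [maxx, maxy], [minx, maxy], [minx, miny]]
    return {"type": "Polygon", "coordinates": [ring]}


# Hypothetical upload containing two separate polygons
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {}, "geometry": {
            "type": "Polygon",
            "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}},
        {"type": "Feature", "properties": {}, "geometry": {
            "type": "Polygon",
            "coordinates": [[[2, 2], [3, 2], [3, 4], [2, 2]]]}},
    ],
}

boundary = bbox_to_polygon(geojson_bbox(sample))
```

The resulting `boundary` is one geometry, so only a single extract request is needed regardless of how many polygons were uploaded. The trade-off is that a bbox can cover a lot of area outside the actual polygons.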

@nrjadkry
Member Author

nrjadkry commented Nov 7, 2023

I guess it would be okay if I use the convex_hull of those multipolygons to create a single boundary.
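For illustration, a convex hull over all vertices of a MultiPolygon can be computed with Andrew's monotone chain algorithm. This is a standard-library sketch under assumptions, not the merged implementation: `cross`, `convex_hull`, and `multipolygon_hull` are hypothetical names, and in practice shapely's `convex_hull` property on the geometry does the same job.

```python
def cross(o, a, b):
    """Z component of the cross product OA x OB (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a right turn or lie on a straight line
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]


def multipolygon_hull(multipolygon):
    """Build one GeoJSON Polygon enclosing every part of a MultiPolygon."""
    points = [
        (x, y)
        for polygon in multipolygon["coordinates"]
        for ring in polygon
        for x, y, *_ in ring
    ]
    hull = convex_hull(points)
    ring = [list(p) for p in hull] + [list(hull[0])]  # close the ring
    return {"type": "Polygon", "coordinates": [ring]}


# Hypothetical upload: two unit squares that do not touch
two_squares = {
    "type": "MultiPolygon",
    "coordinates": [
        [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
        [[[2, 2], [3, 2], [3, 3], [2, 3], [2, 2]]],
    ],
}

boundary = multipolygon_hull(two_squares)
```

Compared with a bounding box, the hull hugs the uploaded polygons more tightly, so the single query covers less unrelated area.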

@spwoodcock
Member

Great work! Thanks @nrjadkry 🙏

The only thing I would like to do is test the performance - I assume it's reasonably quick to return the extract for good UX in the creation flow.

If so, this is definitely the best solution!

(I think raw-data-api creates a file in S3 for every request, but it's temporary and deleted after 90 days).

@nrjadkry
Member Author

nrjadkry commented Nov 7, 2023

@spwoodcock There is not much of a performance issue for small or medium-sized polygons. It might take some time for large polygons.

@spwoodcock
Member

spwoodcock commented Nov 7, 2023

One small efficiency gain:

  • Currently you use execQuery with the geometry (which returns all unfiltered geoms) then filter it.

  • Instead you could pass a config yaml file when you create PostgresClient. One of the keys in the yaml file is filter, to return the filtered geoms immediately.

The yaml file details are in osm-rawdata and raw-data-api docs 👍

@nrjadkry
Member Author

nrjadkry commented Nov 7, 2023

I have passed a YAML file to PostgresClient, but it is one of the existing YAML files for the categories in osm-fieldwork. Those YAML files do not have filter keys, so I still need to filter the data obtained.

@spwoodcock
Member

It's a different file than the one for osm-fieldwork.

See the raw-data-api docs.
The /snapshot endpoint accepts a filter.

Saying that, the docs actually show it as a JSON file, so I'm not sure if the YAML definition is valid yet.

@spwoodcock
Member

spwoodcock commented Nov 7, 2023

Looking into it, the YAML definition is valid and handled by osm-rawdata.

I would use YAML over JSON.

You can create the YAML file using QueryConfig.

(Creating the file is inefficient, but there is an open issue on the osm-rawdata to-do list to accept BytesIO objects instead.)

@spwoodcock
Member

Also, Rob generally has an example usage of the code in his __main__ functions for command line use.

It might help if I pull this out into a wrapper convenience function (so we can call via both code or CLI easily).

@nrjadkry
Member Author

nrjadkry commented Nov 7, 2023

Rob has done this in the __main__ function.

pg = PostgresClient(args.uri, args.config)
result = pg.execQuery(poly)

I have done the same here. Those filters are created inside the execQuery function by osm-rawdata itself, so we don't need to create the config YAML file here.

@spwoodcock
Member

Yes you are right that execQuery is used to execute the query and return the result, so that is still required (I got that wrong above).

But the main function also has a config file passed in as a required argument.

We also need to pass a config file to filter the result directly, instead of using osm-fieldwork.filter_data.FilterData.

(this will do a filter on the underlying SQL query when the data is fetched, instead of fetching all data, then filtering, potentially reducing the web payload returned significantly)
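The difference between the two approaches can be illustrated with a small stand-in example. It uses an in-memory SQLite table purely as an assumption for demonstration (the real query runs against the Underpass/PostGIS database through osm-rawdata), and the table, columns, and function names are hypothetical.

```python
import sqlite3

# Stand-in database with a handful of hypothetical features
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (osm_id INTEGER, category TEXT)")
conn.executemany(
    "INSERT INTO features VALUES (?, ?)",
    [(1, "buildings"), (2, "highways"), (3, "buildings"), (4, "waterways")],
)


def fetch_then_filter(conn, category):
    """Fetch every row, then filter in Python. Returns (matches, rows transferred)."""
    rows = conn.execute(
        "SELECT osm_id, category FROM features ORDER BY osm_id"
    ).fetchall()
    matches = [row for row in rows if row[1] == category]
    return matches, len(rows)


def filter_in_query(conn, category):
    """Let the database filter. Returns (matches, rows transferred)."""
    rows = conn.execute(
        "SELECT osm_id, category FROM features WHERE category = ? ORDER BY osm_id",
        (category,),
    ).fetchall()
    return rows, len(rows)


late_matches, late_transferred = fetch_then_filter(conn, "buildings")
early_matches, early_transferred = filter_in_query(conn, "buildings")
```

Both calls return the same matches, but the second only transfers the rows that are actually needed, which is where the payload saving comes from.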

@spwoodcock
Member

I will update osm-rawdata to accept a BytesIO config file in the coming days.

In the meantime we need to generate the config file on disk under /tmp as a workaround (being sure to delete it afterwards).
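A minimal sketch of that workaround, assuming the config content is already available as a string: a context manager writes it to a temporary file (in the system temp directory, which is /tmp on most Linux systems) and always removes it afterwards, even if the query raises. `temporary_config` and the placeholder config text are hypothetical and not part of osm-rawdata.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_config(text, suffix=".yaml"):
    """Write config text to a temp file, yield its path, and delete it on exit."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        yield path
    finally:
        # Runs on success and on error, so no stale files are left behind
        os.remove(path)


# Placeholder content only; the real keys are defined by osm-rawdata
with temporary_config("# hypothetical config contents\n") as config_path:
    # The path could be handed to the client here, for example:
    # pg = PostgresClient("underpass", config_path)
    existed_during = os.path.exists(config_path)

exists_after = os.path.exists(config_path)
```

Once osm-rawdata accepts a BytesIO object, this helper could be dropped and the config passed in memory instead.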

@nrjadkry
Member Author

nrjadkry commented Nov 7, 2023

I have passed the config file as well in this.

from osm_fieldwork.data_models import data_models_path
config_path = f"{data_models_path}/{category}.yaml"
pg = PostgresClient("underpass", config_path)
data_extract = pg.execQuery(boundary)

I have passed the config file from osm-fieldwork, since we just need to check whether data exists for the category.
I have also removed the filter data step. Is this okay?

@spwoodcock
Member

Looks good to me 🎉

Just passing the osm-fieldwork YAML config is the best option in this case, as it's just a preview 👍

When we create the actual data extract we would have to use JSON format, as it supports more params: https://hotosm.github.io/osm-rawdata/json/

While the YAML format only contains the filters, the JSON format has many other options: https://hotosm.github.io/raw-data-api/api/endpoints/#rawdatacurrentparams

@spwoodcock spwoodcock merged commit 007f1e1 into development Nov 7, 2023
4 checks passed
@spwoodcock spwoodcock deleted the 957-revision-of-project-creation-workflow branch November 7, 2023 12:00
Development

Successfully merging this pull request may close these issues.

Revision of project creation workflow : Visibility of data extracts in the project creation workflow
2 participants