feat: api to get data extracts for the given aoi and category #960

nrjadkry · 2023-11-02T12:12:47Z

I have added an API that returns the data extracts available from OSM for the provided AOI (Area of Interest), using raw-data-api.

This API accepts an AOI and the category for which data extracts are required as parameters.

nrjadkry · 2023-11-07T02:44:20Z

@spwoodcock This works when a user uploads a single AOI.
What do you think would be better when a user uploads a geojson file with multiple polygons? It won't be feasible to extract the data for all those polygons one by one.
In the previous approach I used a PostGIS query to generate a boundary surrounding those multipolygons, but here we want to do this before the project is created.

spwoodcock · 2023-11-07T03:12:16Z

Could we load the geojson with shapely, then just get the bounds (bbox) of the features?

That would give us a single bounding geometry to pass to osm-rawdata.
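A minimal sketch of the bounding-box idea, written with only the standard library so it runs anywhere. This swaps in a plain coordinate walk in place of shapely; with shapely, the `bounds` attribute of the loaded geometry should give the same four numbers. The names `iter_coords`, `geojson_bbox` and `SAMPLE` are made up for illustration, and GeometryCollection inputs are not handled.

```python
import json


def iter_coords(coords):
    """Yield (x, y) pairs from an arbitrarily nested GeoJSON coordinates array."""
    if coords and isinstance(coords[0], (int, float)):
        # A single position: [x, y] or [x, y, z]
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_coords(part)


def geojson_bbox(geojson_text):
    """Return (minx, miny, maxx, maxy) covering every geometry in the input."""
    data = json.loads(geojson_text)
    if data["type"] == "FeatureCollection":
        geometries = [feature["geometry"] for feature in data["features"]]
    elif data["type"] == "Feature":
        geometries = [data["geometry"]]
    else:
        # Assume a bare geometry object (Polygon, MultiPolygon, ...)
        geometries = [data]

    xs, ys = [], []
    for geometry in geometries:
        for x, y in iter_coords(geometry["coordinates"]):
            xs.append(x)
            ys.append(y)
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical upload: two separate polygons in one FeatureCollection
SAMPLE = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {}, "geometry": {
            "type": "Polygon",
            "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]}},
        {"type": "Feature", "properties": {}, "geometry": {
            "type": "Polygon",
            "coordinates": [[[2, 2], [3, 2], [3, 3], [2, 3], [2, 2]]]}},
    ],
})

bbox = geojson_bbox(SAMPLE)
```

The resulting tuple describes one rectangle that covers every uploaded polygon, so only a single geometry needs to be sent for the extract.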

nrjadkry · 2023-11-07T03:14:03Z

I guess it might be okay if I use the convex_hull of those multipolygons to create a boundary,

… geojson
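A rough sketch of the convex hull approach, assuming shapely is installed and the upload is a GeoJSON FeatureCollection. The function name `aoi_boundary` and the sample data are hypothetical and not taken from the PR; only `shape`, `unary_union` and `convex_hull` are shapely's own.

```python
from shapely.geometry import shape
from shapely.ops import unary_union


def aoi_boundary(feature_collection):
    """Merge all feature geometries and return the convex hull around them."""
    geometries = [shape(feature["geometry"]) for feature in feature_collection["features"]]
    merged = unary_union(geometries)  # one (multi)geometry from many
    return merged.convex_hull  # smallest convex polygon enclosing everything


# Hypothetical upload: two unit squares that do not touch
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {}, "geometry": {
            "type": "Polygon",
            "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]}},
        {"type": "Feature", "properties": {}, "geometry": {
            "type": "Polygon",
            "coordinates": [[[2, 2], [3, 2], [3, 3], [2, 3], [2, 2]]]}},
    ],
}

hull = aoi_boundary(sample)
```

For this sample the hull has an area of 5 square units, while the bounding box of the same two squares covers 9, so the hull requests less surrounding data than a bbox would.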

spwoodcock · 2023-11-07T03:47:49Z

Great work! Thanks @nrjadkry 🙏

The only thing I would like to do is test the performance - I assume it's reasonably quick to return the extract for good UX in the creation flow.

If so, this is definitely the best solution!

(I think raw-data-api creates a file in S3 for every request, but it's temporary and deleted after 90 days).

nrjadkry · 2023-11-07T03:58:14Z

@spwoodcock There isn't much of a performance issue for relatively small or medium-sized polygons. It might take some time for large polygons.

spwoodcock · 2023-11-07T04:39:21Z

One small efficiency gain:

Currently you use execQuery with the geometry (which returns all unfiltered geoms) then filter it.
Instead you could pass a config yaml file when you create PostgresClient. One of the keys in the yaml file is filter, to return the filtered geoms immediately.

The yaml file details are in osm-rawdata and raw-data-api docs 👍

nrjadkry · 2023-11-07T05:43:34Z

I have passed a yaml file to PostgresClient, but it is one of the existing per-category yaml files from osm-fieldwork. Those yaml files do not have filter keys, so I still need to filter the returned data as well.

spwoodcock · 2023-11-07T06:06:08Z

It's a different file than the one for osm-fieldwork.

See the raw-data-api docs.
The /snapshot endpoint accepts a filter.

Saying that, the docs actually show it as a JSON file, so I'm not sure if the YAML definition is valid yet.

spwoodcock · 2023-11-07T06:12:51Z

Looking into it, the YAML definition is valid and handled by osm-rawdata.

I would use YAML over JSON.

You can create the YAML file using QueryConfig.

(Creating the file is inefficient, but there is an open issue on the osm-rawdata todo list to accept BytesIO objects instead).

spwoodcock · 2023-11-07T06:19:02Z

Also, Rob generally has an example usage of the code in his __main__ functions for command line use.

It might help if I pull this out into a wrapper convenience function (so we can call via both code or CLI easily).

nrjadkry · 2023-11-07T08:31:32Z

Rob has done this in the __main__ function.

            pg = PostgresClient(args.uri, args.config)
            result = pg.execQuery(poly)

I have done the same here. Those filters are created inside the execQuery function by osm-rawdata itself, so we don't need to create the config yaml file here either.

spwoodcock · 2023-11-07T09:39:34Z

Yes you are right that execQuery is used to execute the query and return the result, so that is still required (I got that wrong above).

But the main function also has a config file passed in as a required argument.

We also need to pass a config file to filter the result directly, instead of using osm-fieldwork.filter_data.FilterData.

(this will do a filter on the underlying SQL query when the data is fetched, instead of fetching all data, then filtering, potentially reducing the web payload returned significantly)
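To illustrate the difference, here is a small analogy using an in-memory SQLite table in place of the real database. It is not the osm-rawdata query; the table, the column names and both helper functions are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (id INTEGER, category TEXT)")
conn.executemany(
    "INSERT INTO features VALUES (?, ?)",
    [(1, "buildings"), (2, "highways"), (3, "buildings"), (4, "waterways")],
)


def fetch_then_filter(conn, category):
    """Fetch every row, then filter in Python. All rows cross the wire."""
    rows = conn.execute("SELECT id, category FROM features ORDER BY id").fetchall()
    matching = [row for row in rows if row[1] == category]
    return len(rows), matching


def filter_in_query(conn, category):
    """Let the database filter. Only matching rows are returned."""
    rows = conn.execute(
        "SELECT id, category FROM features WHERE category = ? ORDER BY id",
        (category,),
    ).fetchall()
    return len(rows), rows


transferred_all, result_a = fetch_then_filter(conn, "buildings")
transferred_filtered, result_b = filter_in_query(conn, "buildings")
```

Both calls end up with the same two rows, but the first transfers all four rows before discarding half of them. On a real extract the gap could be much larger.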

spwoodcock · 2023-11-07T09:41:16Z

I will update osm-rawdata to accept a BytesIO config file in the coming days.

In the meantime we need to generate the config file on disk under /tmp as a workaround (being sure to delete it afterwards).
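One possible shape for that workaround, using only the standard library. The helper name `with_temp_config` and the placeholder config text are hypothetical, and `tempfile.gettempdir()` is assumed to resolve to /tmp, which is typical on Linux but not guaranteed.

```python
import os
import tempfile


def with_temp_config(config_text, use_config):
    """Write config_text to a temp .yaml file, call use_config(path), then delete the file."""
    fd, path = tempfile.mkstemp(suffix=".yaml", dir=tempfile.gettempdir())
    try:
        with os.fdopen(fd, "w") as config_file:
            config_file.write(config_text)
        return use_config(path)
    finally:
        # Runs even if use_config raises, so the file never lingers
        os.remove(path)


def read_back(path):
    """Stand-in consumer for the demo: just reads the file contents."""
    with open(path) as config_file:
        return config_file.read()


# Placeholder text only, not a real osm-rawdata config
contents = with_temp_config("placeholder: config\n", read_back)
```

In the endpoint, the callback would be whatever needs the path, for example a function that builds the PostgresClient with it and runs execQuery before returning.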

nrjadkry · 2023-11-07T10:52:27Z

I have now passed the config file as well.

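from osm_fieldwork.data_models import data_models_path
from osm_rawdata.postgres import PostgresClient

# Reuse the existing osm-fieldwork YAML config for the requested category
config_path = f"{data_models_path}/{category}.yaml"
pg = PostgresClient("underpass", config_path)
data_extract = pg.execQuery(boundary)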

I have passed the config file from osm-fieldwork, since we just need to check whether data exists for the category.
I have also removed the filter data step. Is this okay?

spwoodcock · 2023-11-07T12:00:04Z

Looks good to me 🎉

Just passing the osm-fieldwork YAML config is the best option in this case, as it's just a preview 👍

When we create the actual data extract we would have to use JSON format, as it supports more params: https://hotosm.github.io/osm-rawdata/json/

While the YAML format only contains the filters, the JSON format has many other options: https://hotosm.github.io/raw-data-api/api/endpoints/#rawdatacurrentparams

feat: api to get data extracts for the given aoi and category

e0b71d8

nrjadkry linked an issue Nov 2, 2023 that may be closed by this pull request

Revision of project creation workflow : Visibility of data extracts in the project creation workflow #957

Closed

nrjadkry temporarily deployed to test November 2, 2023 12:12 — with GitHub Actions Inactive

github-actions bot added the backend Related to backend code label Nov 2, 2023

nrjadkry temporarily deployed to 960/merge November 2, 2023 12:13 — with GitHub Actions Inactive

nrjadkry temporarily deployed to test November 2, 2023 12:17 — with GitHub Actions Inactive

rearragned imports

e965977

nrjadkry temporarily deployed to test November 7, 2023 02:39 — with GitHub Actions Inactive

nrjadkry marked this pull request as ready for review November 7, 2023 02:39

used unary_union and convex_hull from shapely to process multipolygon geojson

96273f2

nrjadkry temporarily deployed to test November 7, 2023 03:26 — with GitHub Actions Inactive

wrapped inside of try except block for view_data_extracts

ccc3553

nrjadkry temporarily deployed to test November 7, 2023 03:56 — with GitHub Actions Inactive

nrjadkry temporarily deployed to test November 7, 2023 03:57 — with GitHub Actions Inactive

filter data removed

ecb3824

nrjadkry temporarily deployed to test November 7, 2023 10:51 — with GitHub Actions Inactive

nrjadkry temporarily deployed to test November 7, 2023 10:52 — with GitHub Actions Inactive

spwoodcock merged commit 007f1e1 into development Nov 7, 2023
4 checks passed

spwoodcock deleted the 957-revision-of-project-creation-workflow branch November 7, 2023 12:00
