mlhub

A Python library for downloading data from https://www.mlhub.earth/#datasets.

Much of this code is copied from https://github.com/radiantearth/mlhub-tutorials, with a few additional functions for data exploration.

Quickstart

Install by downloading this repository and, from the repository root, running:

pip install -e .

To start, add your API token so that the library can interact with the mlhub API. If you also want to download source images from AWS, add your AWS credentials as well. The buckets are Requester Pays buckets, which means you are charged for each download.

import mlhub

mlhub.set_token(
    "token123"
)

mlhub.set_aws_credentials(
    aws_access_key="access123",
    aws_secret_key="secret123",
)
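Hard-coding secrets in a script makes them easy to leak, so one option is to read them from environment variables instead. The sketch below is not part of mlhub: the helper `load_credentials` and the variable name `MLHUB_TOKEN` are hypothetical, chosen for illustration, while `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` follow the usual AWS naming convention. Only `mlhub.set_token` and `mlhub.set_aws_credentials`, shown commented out, come from this README.

```python
import os


def load_credentials(env=None):
    """Collect the mlhub token and optional AWS keys from environment variables.

    The variable names are an assumption of this sketch; mlhub does not
    read them by itself.
    """
    env = os.environ if env is None else env
    token = env.get("MLHUB_TOKEN")
    if not token:
        raise RuntimeError("MLHUB_TOKEN is not set")
    return {
        "token": token,
        # AWS keys stay None when the variables are missing.
        "aws_access_key": env.get("AWS_ACCESS_KEY_ID"),
        "aws_secret_key": env.get("AWS_SECRET_ACCESS_KEY"),
    }


# Usage (requires mlhub to be installed):
# import mlhub
# creds = load_credentials()
# mlhub.set_token(creds["token"])
# if creds["aws_access_key"] and creds["aws_secret_key"]:
#     mlhub.set_aws_credentials(
#         aws_access_key=creds["aws_access_key"],
#         aws_secret_key=creds["aws_secret_key"],
#     )
```

Passing a plain dictionary as `env` makes the helper easy to try out without touching the real environment.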

You can then explore which collections are available:

collections = mlhub.get_collections()

And download a collection:

from pathlib import Path

download_location = Path("../data")

mlhub.download_collection(
    collection_id="ref_african_crops_tanzania_01",
    download_path=download_location,
    ignore_source_images=False
)

If ignore_source_images=True, no source images are downloaded, only the labels and the documentation. In that case, no AWS credentials are needed.
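Since source images are the only part that needs AWS credentials, the flag can be derived from whether credentials are available. The following is a minimal sketch under that assumption: `build_download_kwargs` is a hypothetical helper, not an mlhub function, and creating the target directory up front is a precaution of this sketch (whether `download_collection` creates it on its own is not stated here).

```python
from pathlib import Path


def build_download_kwargs(collection_id, download_path, have_aws_credentials):
    """Assemble keyword arguments for mlhub.download_collection.

    Source images are skipped whenever AWS credentials are unavailable.
    """
    download_path = Path(download_path)
    # Create the target directory (and any parents) if it is missing.
    download_path.mkdir(parents=True, exist_ok=True)
    return {
        "collection_id": collection_id,
        "download_path": download_path,
        "ignore_source_images": not have_aws_credentials,
    }


# Usage (requires mlhub to be installed):
# import mlhub
# kwargs = build_download_kwargs(
#     "ref_african_crops_tanzania_01", "../data", have_aws_credentials=False
# )
# mlhub.download_collection(**kwargs)
```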

In addition, you can explore different collections, for instance by seeing the availability of source images across features and timesteps:

assets = mlhub.get_all_assets("ref_african_crops_uganda_01")
assets_df = mlhub.features_to_df(assets)
dates_df = mlhub.check_dates_across_features(assets_df)
mlhub.plot_range(dates_df)

This yields the following plot:

[Image: plot_range output for the Uganda collection]

More of these plots can be found in the exploration notebook.
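To make "availability of source images across features and timesteps" concrete, here is a small standard-library illustration of the idea. It is not how `check_dates_across_features` is implemented; the function names, feature ids, and dates are all made up for the example.

```python
from collections import defaultdict


def availability_by_feature(records):
    """Group (feature_id, date) pairs into {feature_id: sorted list of dates}."""
    by_feature = defaultdict(set)
    for feature_id, date in records:
        by_feature[feature_id].add(date)
    return {fid: sorted(dates) for fid, dates in by_feature.items()}


def dates_shared_by_all(availability):
    """Return the dates for which every feature has a source image."""
    date_sets = [set(dates) for dates in availability.values()]
    return sorted(set.intersection(*date_sets)) if date_sets else []


# Made-up records for illustration; real asset listings would come from
# mlhub.get_all_assets.
records = [
    ("feature_a", "2019-01-05"),
    ("feature_a", "2019-01-15"),
    ("feature_b", "2019-01-15"),
    ("feature_b", "2019-01-25"),
]
availability = availability_by_feature(records)
shared = dates_shared_by_all(availability)
print(shared)  # ['2019-01-15']
```

Dates are kept as ISO 8601 strings so that plain string sorting matches chronological order.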