Documentation for EO newbies #95

dks4-hw · 2022-07-12T19:15:16Z

This looks like a terrific ML resource with a powerful API, but the documentation is a bit lean, especially for EO newbies. The map in README.md suggests the dataset has terrific image coverage of Europe and North America, but the example code is limited to Togo, with benchmarks for Kenya and Brazil.
Can we use cropharvest to feed data for Europe or North America to ML models? I am guessing we need to supplement the features download with features for the geographies we want to run ML on. How do we use cropharvest to do that? It is not obvious.
Forgive me if the dataset is intended only for Kenya, Brazil, and Togo and I have misunderstood. As EO professionals you will be familiar with the sentinelsat library, whose documentation is brilliant for EO newbies but which does not produce ML-ready products. Could you produce something as explanatory, but with an ML-ready output?

dks4-hw · 2022-07-13T07:04:28Z

Meant to tag @gabrieltseng

gabrieltseng · 2022-07-15T15:48:46Z

Hi @dks4-hw,

Thanks for the feedback! I'll work on adding some better documentation. In the meantime, to help you get started:

All the data is accessible through the cropharvest.datasets.CropHarvest object. The main settings you might want to change are controlled through a cropharvest.datasets.Task, which takes the following parameters:

A bounding box, which defines the spatial boundaries of the labels retrieved
A target label, which defines the class of the positive labels (if this is left as None, then the positive class will be crops and the negative class will be non-crops)
A boolean defining whether or not to balance the crops and non-crops in the negative class
A test_identifier string, which tells the dataset whether or not to retrieve a file from the test_features folder and return it as the test data
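
To make the bounding-box parameter concrete, here is a minimal sketch of what "spatial boundaries of the labels" means in practice. BoxSketch is a hypothetical stand-in written for illustration, not cropharvest's own class, and the point coordinates are approximate. The box values are copied from the 'France_0' box printed later in this thread, which appears to cover Corsica rather than mainland France.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxSketch:
    """Hypothetical stand-in for a bounding box; not cropharvest's BBox class."""

    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        # A point is inside when both coordinates fall within the box limits.
        return (
            self.min_lat <= lat <= self.max_lat
            and self.min_lon <= lon <= self.max_lon
        )


# Values copied from the 'France_0' box printed later in this thread.
france_0 = BoxSketch(
    min_lat=41.384912109374994,
    max_lat=43.021484375,
    min_lon=8.565625000000011,
    max_lon=9.556445312500017,
)

# Illustrative (lat, lon) points with approximate coordinates.
points = {"Ajaccio": (41.93, 8.74), "Paris": (48.86, 2.35)}
inside = {name: france_0.contains(lat, lon) for name, (lat, lon) in points.items()}
print(inside)
```

Only labels whose coordinates fall inside the chosen box would be kept, so picking the right entry from a list of boxes matters when a country has several.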

So if I wanted to use this to train a model to identify crop vs. non crop in France, I might do it like this:

from sklearn.ensemble import RandomForestClassifier

from cropharvest.datasets import Task, CropHarvest
from cropharvest.countries import get_country_bbox

my_dataset = CropHarvest(
    # The first argument is the path to an existing folder. The data
    # is downloaded into it, or read from it if already present.
    "data",
    Task(
        # get_country_bbox returns a list of bounding boxes.
        # The one representing Metropolitan France is the
        # 2nd box (index 1).
        bounding_box=get_country_bbox("France")[1],
        normalize=True
    )
)
X, y = my_dataset.as_array(flatten_x=True)
model = RandomForestClassifier(random_state=0)
model.fit(X, y)

I hope this helps to get started; in the meantime, I'll write up some more thorough documentation.

kolrocket · 2022-10-25T16:22:49Z

Hello, I'm trying to run this exact example. But after

my_dataset = CropHarvest( 
    Task(
        # get_country_bbox returns a list of bounding boxes
        bounding_box=get_country_bbox("France")[0],
        normalize=True
    )   
)

it returns

Traceback (most recent call last):

  File "C:\Users\leand\AppData\Local\Temp\ipykernel_19248\3361455196.py", line 1, in <module>
    my_dataset = CropHarvest(

  File "C:\Users\leand\anaconda3\envs\crop\lib\site-packages\cropharvest\datasets.py", line 203, in __init__
    super().__init__(root, download, filenames=(FEATURES_DIR, TEST_FEATURES_DIR))

  File "C:\Users\leand\anaconda3\envs\crop\lib\site-packages\cropharvest\datasets.py", line 60, in __init__
    self.root = Path(root)

  File "C:\Users\leand\anaconda3\envs\crop\lib\pathlib.py", line 1042, in __new__
    self = cls._from_parts(args, init=False)

  File "C:\Users\leand\anaconda3\envs\crop\lib\pathlib.py", line 683, in _from_parts
    drv, root, parts = self._parse_args(args)

  File "C:\Users\leand\anaconda3\envs\crop\lib\pathlib.py", line 667, in _parse_args
    a = os.fspath(a)

TypeError: expected str, bytes or os.PathLike object, not Task

Any ideas of what is wrong? Thank you very much.

EDIT: running Task(...) on its own instead of CropHarvest(Task(...)) works and returns:

Task(bounding_box=BBox(min_lat=41.384912109374994, max_lat=43.021484375, min_lon=8.565625000000011, max_lon=9.556445312500017, name='France_0'), target_label='crop', balance_negative_crops=False, test_identifier=None, normalize=True)

but then the next part fails with: 'Task' object has no attribute 'as_array'.
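
For readers hitting the same traceback: it shows the first positional argument being passed to Path(root), so a Task given in that position is treated as the data folder. The sketch below reproduces the failure with hypothetical stand-in classes (TaskSketch and DatasetSketch are not part of cropharvest) and shows that passing the folder first avoids it.

```python
from pathlib import Path


class TaskSketch:
    """Hypothetical stand-in for a task object."""


class DatasetSketch:
    """Hypothetical stand-in whose first positional argument is the data folder."""

    def __init__(self, root, task=None):
        # Path() raises TypeError when root is not a str or path-like object.
        self.root = Path(root)
        self.task = task


# Passing the task where the folder is expected fails, as in the traceback.
try:
    DatasetSketch(TaskSketch())
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Passing the folder first and the task second works.
dataset = DatasetSketch("data", TaskSketch())
print(error_message)
print(dataset.root)
```

The second error has a related cause: as_array is called on the dataset object in the example, so it is not available on a Task alone.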

gabrieltseng · 2022-10-25T16:52:04Z

Hi @kolrocket, apologies. There was a bug in the example above, which is now fixed. I've confirmed the code runs:

>>> from sklearn.ensemble import RandomForestClassifier
>>> from cropharvest.datasets import Task, CropHarvest
>>> from cropharvest.countries import get_country_bbox
>>> my_dataset = CropHarvest("data", Task(bounding_box=get_country_bbox("France")[1], normalize=True))
>>> X, y = my_dataset.as_array(flatten_x=True)
>>> X.shape, y.shape
((6603, 216), (6603,))
>>> model = RandomForestClassifier(random_state=0)
>>> model.fit(X, y)
RandomForestClassifier(random_state=0)
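
On the shapes above: the 216 columns are consistent with each instance being a 12 x 18 grid (timesteps by bands) that flatten_x=True unrolls into one row. That layout is an assumption here, inferred only from 12 * 18 = 216. A minimal pure-Python sketch of the flattening, using made-up values:

```python
from itertools import chain

# Assumed per-instance layout; chosen because 12 * 18 == 216.
N_TIMESTEPS, N_BANDS = 12, 18


def flatten_instance(instance):
    """Unroll a [timesteps][bands] nested list into one flat feature row."""
    return list(chain.from_iterable(instance))


# Made-up instance: the value encodes its (timestep, band) position.
instance = [
    [float(t * N_BANDS + b) for b in range(N_BANDS)]
    for t in range(N_TIMESTEPS)
]
flat = flatten_instance(instance)
print(len(flat))
```

Flattening row by row keeps all bands of the first timestep together, followed by all bands of the second, and so on, which is the tabular form a scikit-learn estimator expects.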

kolrocket · 2022-10-25T19:06:25Z

Thank you again!

gabrieltseng self-assigned this Jul 15, 2022

gabrieltseng added the question Further information is requested label Oct 3, 2022

gabrieltseng mentioned this issue Oct 26, 2022

Update README and new release #109

Merged

4 tasks

yichiac mentioned this issue Oct 12, 2023

EO_data Border Issue and Example Code Error #126

Open
