Machine Learning for openEO #441
Open: m-mohr wants to merge 9 commits into draft from ml
Commits:
158c244 Reddd ML processes for 2.1.0 #416 (m-mohr)
7566020 Make predict processes for ML more general #368 (m-mohr)
8e85e86 Predict DL proposal (m-mohr)
5e27496 Variant 2? (m-mohr)
6baf737 Wording improvements from #396 (m-mohr)
1fc4a8e Rename processes according to recent discussions from #396 (m-mohr)
c8a7e38 Remove single value predictions and merge ml and dl (m-mohr)
d275d78 load_ml_model: Change from id to uri & fix reference (m-mohr)
b162040 Merge branch 'draft' into ml (PondiB)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
{ | ||
"id": "load_ml_model", | ||
"summary": "Load a ML model", | ||
"description": "Loads a machine learning model from a STAC Item.\n\nSuch a model could be trained and saved as part of a previous batch job with processes such as ``ml_fit_regr_random_forest()`` and ``save_ml_model()``.", | ||
"categories": [ | ||
"machine learning", | ||
"import" | ||
], | ||
"experimental": true, | ||
"parameters": [ | ||
{ | ||
"name": "uri", | ||
"description": "The STAC Item to load the machine learning model from. The STAC Item must implement the `ml-model` extension.", | ||
"schema": [ | ||
{ | ||
"title": "URL", | ||
"type": "string", | ||
"format": "uri", | ||
"subtype": "uri", | ||
"pattern": "^https?://" | ||
}, | ||
{ | ||
"title": "User-uploaded File", | ||
"type": "string", | ||
"subtype": "file-path", | ||
"pattern": "^[^\r\n\\:'\"]+$" | ||
} | ||
] | ||
} | ||
], | ||
"returns": { | ||
"description": "A machine learning model to be used with machine learning processes such as ``ml_predict()``.", | ||
"schema": { | ||
"type": "object", | ||
"subtype": "ml-model" | ||
} | ||
}, | ||
"links": [ | ||
{ | ||
"href": "https://github.com/stac-extensions/ml-model", | ||
"title": "STAC ml-model extension", | ||
"type": "text/html", | ||
"rel": "about" | ||
} | ||
] | ||
} |
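As a rough illustration of the `uri` parameter schema above, the following Python sketch checks a value against the two alternatives (URL or user-uploaded file path). The helper name `classify_uri` is hypothetical and not part of openEO. The patterns are copied from the schema and decoded with `json.loads` so that the escapes are interpreted as in the JSON file; Python's `re` module is only a stand-in for whatever regex dialect a back-end actually uses.

```python
import json
import re
from typing import Optional

# Patterns copied from the "uri" parameter schema. Decoding them with
# json.loads interprets the escapes the same way a JSON parser would.
URL_PATTERN = json.loads(r'"^https?://"')
FILE_PATTERN = json.loads(r'''"^[^\r\n\\:'\"]+$"''')


def classify_uri(value: str) -> Optional[str]:
    """Return the title of the first matching schema, or None if none matches.

    Hypothetical helper for illustration only.
    """
    if re.search(URL_PATTERN, value):
        return "URL"
    if re.search(FILE_PATTERN, value):
        return "User-uploaded File"
    return None
```

Under these assumptions, an `https://` link is treated as a URL, a relative path such as `models/rf_item.json` as a user-uploaded file, and anything else containing a colon (for example an `ftp://` link) matches neither schema.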
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,110 @@ | ||
{ | ||
"id": "ml_fit_class_random_forest", | ||
"summary": "Train a random forest classification model", | ||
"description": "Executes the fit of a random forest classification based on training data. The process does not include a separate split of the data into test, validation and training data. The Random Forest classification model is based on the approach by Breiman (2001).", | ||
"categories": [ | ||
"machine learning" | ||
], | ||
"experimental": true, | ||
"parameters": [ | ||
{ | ||
"name": "predictors", | ||
"description": "The predictors for the classification model as a vector data cube. Aggregated to the features (vectors) of the target input variable.", | ||
"schema": [ | ||
{ | ||
"type": "object", | ||
"subtype": "datacube", | ||
"dimensions": [ | ||
{ | ||
"type": "geometry" | ||
}, | ||
{ | ||
"type": "bands" | ||
} | ||
] | ||
}, | ||
{ | ||
"type": "object", | ||
"subtype": "datacube", | ||
"dimensions": [ | ||
{ | ||
"type": "geometry" | ||
}, | ||
{ | ||
"type": "other" | ||
} | ||
] | ||
} | ||
] | ||
}, | ||
{ | ||
"name": "target", | ||
"description": "The training sites for the classification model as a vector data cube. This is associated with the target variable for the Random Forest model. The geometry has to be associated with a value to predict (e.g. fractional forest canopy cover).", | ||
"schema": { | ||
"type": "object", | ||
"subtype": "datacube", | ||
"dimensions": [ | ||
{ | ||
"type": "geometry" | ||
} | ||
] | ||
} | ||
}, | ||
{ | ||
"name": "max_variables", | ||
"description": "Specifies how many split variables will be used at a node.\n\nThe following options are available:\n\n- *integer*: The given number of variables are considered for each split.\n- `all`: All variables are considered for each split.\n- `log2`: The logarithm with base 2 of the number of variables are considered for each split.\n- `onethird`: A third of the number of variables are considered for each split.\n- `sqrt`: The square root of the number of variables are considered for each split. This is often the default for classification.", | ||
"schema": [ | ||
{ | ||
"type": "integer", | ||
"minimum": 1 | ||
}, | ||
{ | ||
"type": "string", | ||
"enum": [ | ||
"all", | ||
"log2", | ||
"onethird", | ||
"sqrt" | ||
] | ||
} | ||
] | ||
}, | ||
{ | ||
"name": "num_trees", | ||
"description": "The number of trees built within the Random Forest classification.", | ||
"optional": true, | ||
"default": 100, | ||
"schema": { | ||
"type": "integer", | ||
"minimum": 1 | ||
} | ||
}, | ||
{ | ||
"name": "seed", | ||
"description": "A randomization seed to use for the random sampling in training. If not given or `null`, no seed is used and results may differ on subsequent use.", | ||
"optional": true, | ||
"default": null, | ||
"schema": { | ||
"type": [ | ||
"integer", | ||
"null" | ||
] | ||
} | ||
} | ||
], | ||
"returns": { | ||
"description": "A model object that can be saved with ``save_ml_model()`` and restored with ``load_ml_model()``.", | ||
"schema": { | ||
"type": "object", | ||
"subtype": "ml-model" | ||
} | ||
}, | ||
"links": [ | ||
{ | ||
"href": "https://doi.org/10.1023/A:1010933404324", | ||
"title": "Breiman (2001): Random Forests", | ||
"type": "text/html", | ||
"rel": "about" | ||
} | ||
] | ||
} |
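The `max_variables` options described above can be sketched as a small Python function that turns the option into a number of variables per split. The function name `resolve_max_variables` is hypothetical. The process description does not say how fractional results are rounded, so rounding down and clamping to at least 1 are assumptions made here for illustration.

```python
import math
from typing import Union


def resolve_max_variables(option: Union[int, str], num_variables: int) -> int:
    """Map a `max_variables` value to the number of variables tried per split.

    Assumption: fractional results are rounded down and never drop below 1.
    """
    if num_variables < 1:
        raise ValueError("num_variables must be at least 1")
    if isinstance(option, bool) or not isinstance(option, (int, str)):
        raise TypeError("max_variables must be an integer or a string")
    if isinstance(option, int):
        if option < 1:
            raise ValueError("integer max_variables must be at least 1")
        return option
    if option == "all":
        count = num_variables
    elif option == "log2":
        count = int(math.log2(num_variables))
    elif option == "onethird":
        count = num_variables // 3
    elif option == "sqrt":
        count = math.isqrt(num_variables)
    else:
        raise ValueError(f"unsupported max_variables option: {option!r}")
    return max(1, count)
```

With 16 predictor variables, for example, `sqrt` gives 4 variables per split and `log2` also gives 4, while `onethird` gives 5.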
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,110 @@ | ||
{ | ||
"id": "ml_fit_regr_random_forest", | ||
"summary": "Train a random forest regression model", | ||
"description": "Executes the fit of a random forest regression based on training data. The process does not include a separate split of the data into test, validation and training data. The Random Forest regression model is based on the approach by Breiman (2001).", | ||
"categories": [ | ||
"machine learning" | ||
], | ||
"experimental": true, | ||
"parameters": [ | ||
{ | ||
"name": "predictors", | ||
"description": "The predictors for the regression model as a vector data cube. Aggregated to the features (vectors) of the target input variable.", | ||
"schema": [ | ||
{ | ||
"type": "object", | ||
"subtype": "datacube", | ||
"dimensions": [ | ||
{ | ||
"type": "geometry" | ||
}, | ||
{ | ||
"type": "bands" | ||
} | ||
] | ||
}, | ||
{ | ||
"type": "object", | ||
"subtype": "datacube", | ||
"dimensions": [ | ||
{ | ||
"type": "geometry" | ||
}, | ||
{ | ||
"type": "other" | ||
} | ||
] | ||
} | ||
] | ||
}, | ||
{ | ||
"name": "target", | ||
"description": "The training sites for the regression model as a vector data cube. This is associated with the target variable for the Random Forest model. The geometry has to be associated with a value to predict (e.g. fractional forest canopy cover).", | ||
"schema": { | ||
"type": "object", | ||
"subtype": "datacube", | ||
"dimensions": [ | ||
{ | ||
"type": "geometry" | ||
} | ||
] | ||
} | ||
}, | ||
{ | ||
"name": "max_variables", | ||
"description": "Specifies how many split variables will be used at a node.\n\nThe following options are available:\n\n- *integer*: The given number of variables are considered for each split.\n- `all`: All variables are considered for each split.\n- `log2`: The logarithm with base 2 of the number of variables are considered for each split.\n- `onethird`: A third of the number of variables are considered for each split. This is often the default for regression.\n- `sqrt`: The square root of the number of variables are considered for each split.", | ||
"schema": [ | ||
{ | ||
"type": "integer", | ||
"minimum": 1 | ||
}, | ||
{ | ||
"type": "string", | ||
"enum": [ | ||
"all", | ||
"log2", | ||
"onethird", | ||
"sqrt" | ||
] | ||
} | ||
] | ||
}, | ||
{ | ||
"name": "num_trees", | ||
"description": "The number of trees built within the Random Forest regression.", | ||
"optional": true, | ||
"default": 100, | ||
"schema": { | ||
"type": "integer", | ||
"minimum": 1 | ||
} | ||
}, | ||
{ | ||
"name": "seed", | ||
"description": "A randomization seed to use for the random sampling in training. If not given or `null`, no seed is used and results may differ on subsequent use.", | ||
"optional": true, | ||
"default": null, | ||
"schema": { | ||
"type": [ | ||
"integer", | ||
"null" | ||
] | ||
} | ||
} | ||
], | ||
"returns": { | ||
"description": "A model object that can be saved with ``save_ml_model()`` and restored with ``load_ml_model()``.", | ||
"schema": { | ||
"type": "object", | ||
"subtype": "ml-model" | ||
} | ||
}, | ||
"links": [ | ||
{ | ||
"href": "https://doi.org/10.1023/A:1010933404324", | ||
"title": "Breiman (2001): Random Forests", | ||
"type": "text/html", | ||
"rel": "about" | ||
} | ||
] | ||
} |
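To show how the parameters of `ml_fit_regr_random_forest` fit together, here is a minimal Python sketch that builds a single process graph node as a plain dictionary and checks the constraints stated in the schema (`num_trees` at least 1, `max_variables` either a positive integer or one of the four keywords). The builder `fit_regr_node` and the node names `predictors` and `target` are hypothetical; only the process id, parameter names and defaults come from the definition above.

```python
from typing import Optional, Union

ALLOWED_KEYWORDS = ("all", "log2", "onethird", "sqrt")


def fit_regr_node(predictors: dict, target: dict,
                  max_variables: Union[int, str],
                  num_trees: int = 100,
                  seed: Optional[int] = None) -> dict:
    """Build one process graph node for ml_fit_regr_random_forest."""
    if isinstance(max_variables, str):
        if max_variables not in ALLOWED_KEYWORDS:
            raise ValueError(f"unsupported max_variables: {max_variables!r}")
    elif max_variables < 1:
        raise ValueError("integer max_variables must be at least 1")
    if num_trees < 1:
        raise ValueError("num_trees must be at least 1")
    return {
        "process_id": "ml_fit_regr_random_forest",
        "arguments": {
            "predictors": predictors,
            "target": target,
            "max_variables": max_variables,
            "num_trees": num_trees,
            "seed": seed,  # None serializes to JSON null: no seed is used
        },
    }


# Hypothetical node names; both inputs are expected to be vector data cubes.
node = fit_regr_node({"from_node": "predictors"}, {"from_node": "target"},
                     max_variables="onethird", seed=42)
```

Passing a fixed `seed` is what makes repeated training runs comparable; with the default `null`, results may differ between runs, as the parameter description notes.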
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
{ | ||
"id": "ml_predict", | ||
"summary": "Predict using ML", | ||
"description": "Applies a machine learning model to a data cube of input features and returns the predicted values.", | ||
"categories": [ | ||
"machine learning" | ||
], | ||
"experimental": true, | ||
"parameters": [ | ||
{ | ||
"name": "data", | ||
"description": "The data cube containing the input features.", | ||
"schema": { | ||
"type": "object", | ||
"subtype": "datacube" | ||
} | ||
}, | ||
{ | ||
"name": "model", | ||
"description": "A ML model that was trained with one of the ML training processes such as ``ml_fit_regr_random_forest()``.", | ||
"schema": { | ||
"type": "object", | ||
"subtype": "ml-model" | ||
} | ||
}, | ||
{ | ||
"name": "dimensions", | ||
"description": "Zero or more dimensions that will be reduced by the model. Fails with a `DimensionNotAvailable` exception if one of the specified dimensions does not exist.", | ||
"schema": { | ||
"type": "array", | ||
"items": { | ||
"type": "string" | ||
} | ||
} | ||
} | ||
], | ||
"returns": { | ||
"description": "A data cube with the predicted values. It removes the specified dimensions and adds a new dimension for the predicted values. The new dimension has the name `predictions` and is of type `other`. If a single value is returned, the dimension has a single label with name `0`.", | ||
"schema": { | ||
"type": "object", | ||
"subtype": "datacube", | ||
"dimensions": [ | ||
{ | ||
"type": "other" | ||
} | ||
] | ||
} | ||
} | ||
} |
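The effect of `ml_predict` on the dimensions of a data cube, as described in the `dimensions` parameter and the return value above, can be sketched in Python on dimension names alone. The function `predicted_dimensions` and the local exception class are hypothetical stand-ins; appending `predictions` as the last dimension is an assumption, since the definition does not specify the dimension order.

```python
from typing import List


class DimensionNotAvailable(Exception):
    """Stand-in for the openEO `DimensionNotAvailable` exception."""


def predicted_dimensions(input_dims: List[str], reduced: List[str]) -> List[str]:
    """Return the dimension names of the data cube produced by ml_predict."""
    missing = [name for name in reduced if name not in input_dims]
    if missing:
        raise DimensionNotAvailable(f"dimensions not available: {missing}")
    # Drop the dimensions consumed by the model, then add the new
    # `predictions` dimension (type `other`) for the predicted values.
    kept = [name for name in input_dims if name not in reduced]
    return kept + ["predictions"]
```

For a cube with dimensions `x`, `y`, `t` and `bands`, reducing `bands` would leave `x`, `y`, `t` and `predictions`; an empty `dimensions` list keeps all input dimensions and only adds `predictions`.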
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
{ | ||
"id": "save_ml_model", | ||
"summary": "Save a ML model", | ||
"description": "Saves a machine learning model as part of a batch job.\n\nThe model will be accompanied by a separate STAC Item that implements the [ml-model extension](https://github.com/stac-extensions/ml-model).", | ||
"categories": [ | ||
"machine learning", | ||
"export" | ||
], | ||
"experimental": true, | ||
"parameters": [ | ||
{ | ||
"name": "data", | ||
"description": "The data to store as a machine learning model.", | ||
"schema": { | ||
"type": "object", | ||
"subtype": "ml-model" | ||
} | ||
}, | ||
{ | ||
"name": "options", | ||
"description": "Additional parameters to create the file(s).", | ||
"schema": { | ||
"type": "object", | ||
"additionalProperties": false | ||
}, | ||
"default": {}, | ||
"optional": true | ||
} | ||
], | ||
"returns": { | ||
"description": "Returns `false` if the process failed to store the model, `true` otherwise.", | ||
"schema": { | ||
"type": "boolean" | ||
} | ||
}, | ||
"links": [ | ||
{ | ||
"href": "https://github.com/stac-extensions/ml-model", | ||
"title": "STAC ml-model extension", | ||
"type": "text/html", | ||
"rel": "about" | ||
} | ||
] | ||
} |
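The round trip implied by these definitions (train and save in one batch job, then load and predict in a later one) might look roughly like the following two process graphs, written as Python dictionaries and serialized to JSON. This is a sketch under assumptions: the node names, the `from_parameter` inputs, the `bands` dimension and the `https://example.com/...` URL are all hypothetical, and how the STAC Item URL of the saved model is discovered is exactly the open question raised in the review comments.

```python
import json

# Batch job 1: fit a regression model and store it.
training_graph = {
    "fit": {
        "process_id": "ml_fit_regr_random_forest",
        "arguments": {
            "predictors": {"from_parameter": "predictors"},  # vector data cube
            "target": {"from_parameter": "target"},          # vector data cube
            "max_variables": "onethird",
            "num_trees": 100,
            "seed": 42,
        },
    },
    "save": {
        "process_id": "save_ml_model",
        "arguments": {"data": {"from_node": "fit"}, "options": {}},
        "result": True,
    },
}

# Later job: load the stored model from its STAC Item and apply it.
prediction_graph = {
    "model": {
        "process_id": "load_ml_model",
        # Hypothetical URL of the STAC Item created by the first job.
        "arguments": {"uri": "https://example.com/jobs/1/results/ml_model.json"},
    },
    "predict": {
        "process_id": "ml_predict",
        "arguments": {
            "data": {"from_parameter": "data"},
            "model": {"from_node": "model"},
            "dimensions": ["bands"],
        },
        "result": True,
    },
}

serialized = json.dumps(
    {"training": training_graph, "prediction": prediction_graph}, indent=2
)
```

Each graph marks exactly one node with `"result": true`, and nodes refer to each other through `from_node`, which is how `save_ml_model` receives the fitted model and `ml_predict` receives the loaded one.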
What does "accompanied" practically mean? Should there be an additional job result asset, or should this be a job result link item?
The reason I'm asking is that we want to streamline the detection of the model's URL on the client side.
For example, see Open-EO/openeo-python-client#576, where we currently have a highly implementation-specific hack.
Good question, I guess we should clarify that.
On the other hand, please note that this PR is implicitly outdated as the ML Model extension in STAC is likely going to be replaced by another extension. So this generally needs more work (which I have no plans to do anytime soon).
Can you point to the new one, @m-mohr?
https://github.com/crim-ca/mlm-extension