Improve wording of the .md files #219

166 changes: 90 additions & 76 deletions FAQ.md
@@ -1,79 +1,93 @@
# H2O Driverless AI Bring Your Own Recipes
# H2O Driverless AI: Bring Your Own Recipes
## FAQ
#### Why do I need to bring my own recipes? Isn't Driverless AI smart enough out of the box?
The only way to find out is to try. Most likely you'll be able to improve performance with custom recipes. Domain knowledge and intuition are essential to getting the best possible performance.

#### What are some example recipes?
* Look at the [examples in this repository](https://github.com/h2oai/driverlessai-recipes/blob/master/README.md#sample-recipes). Some illustrative samples:
  * Transformer:
    * Suppose you have a string column that has values like `"A:B:10:5", "A:C:4:10", ...`. It might make sense to split these values by ":" and create four output columns, potentially all numeric, such as `[0,1,10,5], [0,2,4,10], ...` to encode the information more clearly for the algorithm to learn better from.
    * PyTorch deep learning model for [text similarity analysis](https://github.com/h2oai/driverlessai-recipes/blob/master/transformers/nlp/text_embedding_similarity_transformers.py), which computes a similarity score for any two given text input columns.
    * ARIMA model for [time-series forecasting](https://github.com/h2oai/driverlessai-recipes/blob/master/transformers/timeseries/auto_arima_forecast.py).
    * Data augmentation, such as replacing a zip code with demographic information, or replacing a date column with a [National holiday flag](https://github.com/h2oai/driverlessai-recipes/blob/master/transformers/augmentation/singapore_public_holidays.py).
  * Model:
    * All [H2O-3 Algorithms including H2O AutoML](https://github.com/h2oai/driverlessai-recipes/blob/master/models/algorithms/h2o-3-models.py)
    * Yandex [CatBoost](https://github.com/h2oai/driverlessai-recipes/blob/master/models/algorithms/catboost.py) gradient boosting
    * A custom loss function for [LightGBM](https://github.com/h2oai/driverlessai-recipes/blob/master/models/custom_loss/lightgbm_with_custom_loss.py) or [XGBoost](https://github.com/h2oai/driverlessai-recipes/blob/master/models/custom_loss/xgboost_with_custom_loss.py)
  * Scorer:
    * Maybe you want to optimize your predictions for the [top decile](https://github.com/h2oai/driverlessai-recipes/blob/master/scorers/regression/top_decile.py) for a regression problem.
    * Maybe you care about the [false discovery rate](https://github.com/h2oai/driverlessai-recipes/blob/master/scorers/classification/binary/false_discovery_rate.py) for a binary classification problem.
  * Explainer:
    * Create custom recipes for model interpretability, fairness, robustness, explanations
    * Create custom plots, charts, markdown reports, etc.

#### Driverless is good enough for me, I don't want to do recipes.
Perfect. Relax and sit back. We'll keep making Driverless AI better and better with every version, so you don't have to.
Several of the recipes in this repository will likely be included in future releases of Driverless AI out of the box, after more performance improvements and hardening.
#### What's in it for me if I write a recipe?
You will get better at doing data science and you will get better results. Writing code is essential to improving your data science skills. Especially when writing data science code. Recipes are perfect for that.
#### Who can make recipes?
Anyone who can or wants to. Mostly data scientists or developers. Some of the best recipes are trivial and make a big difference, like custom scorers.
#### What do I need to make a recipe?
A text editor. All you need is to create a `.py` text file containing source code.
#### How do I start?
* Examine the [references](https://github.com/h2oai/driverlessai-recipes#reference-guide) below for the API specification and architecture diagrams.
* Look at the [examples in this repository](https://github.com/h2oai/driverlessai-recipes/blob/master/README.md#sample-recipes).
* Clone this repository and make modifications to existing recipes.
* Start an experiment and upload the recipe in the expert settings of an experiment.
#### What version of Python does Driverless AI use?
Driverless AI uses Python version 3.6, so all custom recipes will run with Python 3.6 as well.
#### How do I know whether my recipe works?
Driverless AI will tell you whether it makes the cut:
* First, it is subjected to acceptance tests. If it passes, great. If not, Driverless AI provides you with feedback on how to improve it.
* Then, you can choose to include it in your experiment(s). It will decide which recipes are best suited to solve the problem. At worst, you can cause the experiment to slow down.
#### How can I debug my recipe?
* The easiest way (for now) is to keep uploading it to the expert settings in Driverless AI until the recipe is accepted.
* Another way is to do minimal changes as shown in [this debugging example](./transformers/how_to_debug_transformer.py) and use PyCharm or a similar Python debugger.
#### What happens if my recipe is rejected during upload?
* Read the entire error message, it most likely contains the stack trace and helpful information on how to fix the problem.
* If you can't figure out how to fix the recipe, we suggest you post your questions in the [Driverless AI community Slack channel](https://www.h2o.ai/community/driverless-ai-community/#chat)
* You can also send us your experiment logs zip file, which will contain the recipe source files.
#### What happens if my transformer recipe doesn't lead to the highest variable importance for the experiment?
That's nothing to worry about. It's unlikely that your features have the strongest signal of all features. Even 'magic' Kaggle grandmaster features don't usually make a massive difference, but they still beat most of the competition.
#### What happens if my recipe is not used at all by the experiment?
* Don't give up. You learned something.
* Check the logs for failures if unsure whether the recipe worked at all or not.
* Driverless AI will ignore recipe failures unless this robustness feature is specifically disabled. Under Expert Settings, disable `skip_transformer_failures` and `skip_model_failures` if you want to fail the experiment on any unexpected errors due to custom recipes.
* Inside the experiment logs zip file, there's a folder called `details` and if it contains `.stack` files with stacktraces referring to your custom code, then you know it bombed.
#### Can I write recipes in Go, C++, Java or R?
If you can hook it up to Python, then yes. We have many recipes that use Java and C++ backends. Most of Driverless AI uses C++ backends.
#### Is there a difference between a custom recipe and the recipes shipped with Driverless AI?
No. Same code base. No performance penalty. No calling overhead. Same inputs and outputs.
#### Why are some models implemented as transformers?
Separation of work. With the transformer API, we can replace *only* the particular input column(s) with out-of-fold estimates of the target column. All other columns (features) can be processed by other transformers. The combined union of all features is then passed to the model(s), which can yield higher accuracy than a model that only sees the particular input column(s). For more information about the flow of data, see the technical [references](https://github.com/h2oai/driverlessai-recipes#reference-guide) section.
#### How can I control which custom recipes are active, and how can I disable all custom recipes?
Recipes are meant to be built by people you trust and each recipe should be code-reviewed before going to production. If you don't want custom code to be executed by Driverless AI, set `enable_custom_recipes=false` in the config.toml, or add the environment variable `DRIVERLESS_AI_ENABLE_CUSTOM_RECIPES=0` at startup of Driverless AI. This will disable all custom transformers, models and scorers. If you want to keep all previously uploaded recipes enabled and disable the upload of any new recipes, set `enable_custom_recipes_upload=false` or `DRIVERLESS_AI_ENABLE_CUSTOM_RECIPES_UPLOAD=0` at startup of Driverless AI.
#### What if I keep changing the same recipe over and over?
If you upload a new version of a recipe, it will become the new default version for that recipe. Previously run experiments using older versions of that recipe will continue to work, and use the older version. New experiments will use the new version.
#### Who can see my recipe?
Everyone with access to the Driverless AI instance can run all recipes, even if they were uploaded by someone else. Recipes remain on the instance that runs Driverless AI. Experiment logs may contain relevant information about your recipes (such as their source code), so double-check before you share them.
#### How do I delete all recipes on my instance?
If you really need to delete all recipes, you can delete the `contrib` folder inside the `data_directory` (usually called `tmp`) and restart Driverless AI. Caution: Previously created experiments using custom recipes will not be able to make predictions any longer, so this is not recommended unless you also delete all related experiments as well.
#### Are MOJOs supported for experiments that use custom recipes?
In most cases (especially for complex recipes), MOJOs won’t be available out of the box. But, it is possible to get the MOJO. Contact [email protected] for more information about creating MOJOs for custom recipes. (**Note**: The Python Scoring Pipeline features full support for custom recipes.)
#### How do I share my recipe with the world?
We encourage you to share your recipe in this repository. If your recipe works, please make a pull request and improve the experience for everyone!


### Why do I need to bring my own recipes?
Custom recipes can improve performance. Domain knowledge and intuition are essential for achieving optimal results.

### What are some example recipes?
See the [examples in this repository](https://github.com/h2oai/driverlessai-recipes/blob/master/README.md#sample-recipes). Examples include:

- **Transformer**:
  - Split string columns into multiple numeric columns (e.g., `"A:B:10:5"` becomes `[0,1,10,5]`).
  - **PyTorch** deep learning model for [text similarity analysis](https://github.com/h2oai/driverlessai-recipes/blob/master/transformers/nlp/text_embedding_similarity_transformers.py).
  - **ARIMA** model for [time-series forecasting](https://github.com/h2oai/driverlessai-recipes/blob/master/transformers/timeseries/auto_arima_forecast.py).
  - **Data augmentation**, like replacing zip codes with demographic info or using a [national holiday flag](https://github.com/h2oai/driverlessai-recipes/blob/master/transformers/augmentation/singapore_public_holidays.py).

- **Model**:
  - H2O-3 [algorithms](https://github.com/h2oai/driverlessai-recipes/blob/master/models/algorithms/h2o-3-models.py), including H2O AutoML.
  - **Yandex [CatBoost](https://github.com/h2oai/driverlessai-recipes/blob/master/models/algorithms/catboost.py)** gradient boosting.
  - Custom loss functions for [LightGBM](https://github.com/h2oai/driverlessai-recipes/blob/master/models/custom_loss/lightgbm_with_custom_loss.py) or [XGBoost](https://github.com/h2oai/driverlessai-recipes/blob/master/models/custom_loss/xgboost_with_custom_loss.py).

- **Scorer**:
  - Optimize for the [top decile](https://github.com/h2oai/driverlessai-recipes/blob/master/scorers/regression/top_decile.py) in regression tasks.
  - Score by the [false discovery rate](https://github.com/h2oai/driverlessai-recipes/blob/master/scorers/classification/binary/false_discovery_rate.py) for binary classification.

- **Explainer**:
  - Create custom recipes for model interpretability, fairness, robustness, and explanations.
  - Generate custom plots, charts, markdown reports, and more.
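
To make two of the ideas above concrete, here is a minimal plain-Python sketch of the core logic only. It is not the Driverless AI recipe API and not the code of the linked recipes; the function names `encode_delimited` and `false_discovery_rate` are hypothetical, and mapping non-numeric tokens to integer codes in first-seen order is an assumption chosen to reproduce the `[0,1,10,5]` example.

```python
from typing import Dict, List


def encode_delimited(values: List[str], sep: str = ":") -> List[List[int]]:
    """Split each string on `sep`; keep numeric tokens as ints and map every
    other token to an integer code assigned in first-seen order."""
    codes: Dict[str, int] = {}
    rows = []
    for value in values:
        row = []
        for token in value.split(sep):
            try:
                row.append(int(token))
            except ValueError:
                # Non-numeric token: reuse its code, or assign the next one.
                row.append(codes.setdefault(token, len(codes)))
        rows.append(row)
    return rows


def false_discovery_rate(actual: List[int], predicted: List[int]) -> float:
    """FDR = FP / (FP + TP) for 0/1 labels; 0.0 if nothing is predicted positive."""
    fp = sum(1 for a, p in zip(actual, predicted) if p == 1 and a == 0)
    tp = sum(1 for a, p in zip(actual, predicted) if p == 1 and a == 1)
    return fp / (fp + tp) if (fp + tp) else 0.0


print(encode_delimited(["A:B:10:5", "A:C:4:10"]))  # [[0, 1, 10, 5], [0, 2, 4, 10]]
```

In an actual recipe, logic like this would sit inside the transformer or scorer class described in the API specification; the sketch only shows the computation itself.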

### Is H2O Driverless AI sufficient without custom recipes?
H2O Driverless AI continues to improve with each version, so you may not need custom recipes. However, adding your own recipes can optimize performance for specific use cases.

### What's in it for me if I write a recipe?
Writing recipes improves your data science skills and helps achieve better results. It is one of the best ways to enhance your expertise.

### Who can write recipes?
Anyone with the necessary expertise can contribute. Data scientists and developers typically write recipes, though even simple recipes can have a significant impact.

### What do I need to write a recipe?
A text editor and knowledge of Python. Recipes are written as `.py` files with the source code.

### How do I start?
- Review the [API specifications](https://github.com/h2oai/driverlessai-recipes#reference-guide) and architecture diagrams.
- Review the [examples in this repository](https://github.com/h2oai/driverlessai-recipes/blob/master/README.md#sample-recipes).
- Upload your recipe in the Expert Settings section during the experiment setup.

### What version of Python does H2O Driverless AI use?
H2O Driverless AI uses Python 3.11. Ensure your recipes are compatible with this version.

### How do I know if my recipe works?
H2O Driverless AI will notify you whether your recipe passes the acceptance tests. If it fails, feedback will guide you on how to fix it.

### How can I debug my recipe?
Upload your recipe to the Expert Settings and use the experiment log for debugging. Alternatively, make minimal changes as shown in [this debugging example](./transformers/how_to_debug_transformer.py) and debug with a Python debugger, like PyCharm.

### What happens if my recipe is rejected during upload?
Review the error message, which usually includes a stack trace and hints for fixing the issue. If you need help, ask questions in the [H2O Driverless AI community Slack channel](https://www.h2o.ai/community/driverless-ai-community/#chat). You can also send your experiment logs zip file, which will contain the recipe source files.

### What if my transformer recipe doesn't lead to the highest variable importance?
Features created by your transformer might not have the strongest signal, but they can still improve the overall model performance.

### What happens if my recipe is not used in the experiment?
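H2O Driverless AI uses the best-performing recipes, so a recipe may be left out even when it works. Check the experiment logs for errors related to your recipe. By default, recipe failures are ignored; to make the experiment fail on unexpected errors from custom recipes, disable `skip_transformer_failures` and `skip_model_failures` in Expert Settings.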
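
The original FAQ wording notes that the experiment logs zip file contains a `details` folder, and that `.stack` files there with stack traces pointing at your code mean the recipe failed. A minimal sketch for listing those files follows; the helper name `find_stack_files` is hypothetical, and the assumption that `details` may sit at any depth inside the archive is mine, not something the FAQ states.

```python
import zipfile
from typing import List


def find_stack_files(logs_zip_path: str) -> List[str]:
    """Return the names of `.stack` files located under a `details` folder
    inside the experiment logs zip (archive layout is an assumption)."""
    with zipfile.ZipFile(logs_zip_path) as archive:
        return [
            name
            for name in archive.namelist()
            # Match files ending in .stack whose parent path includes `details`.
            if name.endswith(".stack") and "details" in name.split("/")[:-1]
        ]
```

Opening each returned file and searching for your recipe's file name is then a quick way to see whether the failure came from your custom code.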

### Can I write recipes in Go, C++, Java, or R?
You can use any language as long as you can interface it with Python. Many recipes rely on Java and C++ backends.

### Is there a difference between custom recipes and those shipped with H2O Driverless AI?
Custom recipes are treated the same as built-in recipes. There is no performance penalty or calling overhead.

### Why are some models implemented as transformers?
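The transformer API allows a separation of work. A model implemented as a transformer replaces only its particular input column(s) with out-of-fold estimates of the target column, while other transformers process the remaining columns. The union of all resulting features is then passed to the model(s), which can yield higher accuracy than a model that sees only those input column(s).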

### How can I control which custom recipes are active? How can I disable all custom recipes?
Recipes can be disabled by setting `enable_custom_recipes=false` in the `config.toml` file or using the `DRIVERLESS_AI_ENABLE_CUSTOM_RECIPES=0` environment variable. To disable uploading new recipes, set `enable_custom_recipes_upload=false` or `DRIVERLESS_AI_ENABLE_CUSTOM_RECIPES_UPLOAD=0`.

### What if I keep changing the same recipe?
When you upload a new version of a recipe, it becomes the default. Older experiments will continue using the previous version.

### Who can see my recipe?
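Anyone with access to the H2O Driverless AI instance can run any uploaded recipe, even one uploaded by someone else. Recipes remain on the instance that runs H2O Driverless AI. Experiment logs may contain recipe source code, so review them before sharing.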

### How do I delete all recipes on my instance?
To delete all recipes, remove the `contrib` folder from the data directory (usually `tmp`) and restart H2O Driverless AI. Caution: experiments that were built with custom recipes will no longer be able to make predictions, so this is not recommended unless you also delete all related experiments.

### Are MOJOs supported for experiments that use custom recipes?
In most cases, MOJOs are not available for custom recipes. Contact [email protected] for more details.

### How do I share my recipe with the community?
Contribute to this repository by making a pull request. If your recipe works, it can help others optimize their experiments.

## References
### Custom Transformers
* sklearn API
@@ -98,4 +112,4 @@
![BYOR Architecture Diagram](reference/DriverlessAI_BYOR.png)
### Webinar
Webinar: [Extending the H2O Driverless AI Platform with Your Recipes](https://www.brighttalk.com/webcast/16463/360533/extending-the-h2o-driverless-ai-platform-with-your-recipes)
Website: [H2O Driverless AI Recipes](https://www.h2o.ai/products-h2o-driverless-ai-recipes/)