diff --git a/FAQ.md b/FAQ.md
index da1c1b28..cb73245f 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -1,79 +1,93 @@
-# H2O Driverless AI Bring Your Own Recipes
+# H2O Driverless AI: Bring Your Own Recipes
 ## FAQ
-#### Why do I need to bring my own recipes? Isn't Driverless AI smart enough of the box?
-The only way to find out is to try. Most likely you'll be able to improve performance with custom recipes. Domain knowledge and intuition are essential to getting the best possible performance.
-
-#### What are some example recipes?
-* Look at the [examples in this repository](https://github.com/h2oai/driverlessai-recipes/blob/master/README.md#sample-recipes). Some illustrative samples:
-  * Transformer:
-    * Suppose you have a string column that has values like `"A:B:10:5", "A:C:4:10", ...`. It might make sense to split these values by ":" and create four output columns, potentially all numeric, such as `[0,1,10,5], [0,2,4,10], ...` to encode the information more clearly for the algorithm to learn better from.
-    * PyTorch deep learning model for [text similarity analysis](https://github.com/h2oai/driverlessai-recipes/blob/master/transformers/nlp/text_embedding_similarity_transformers.py), computes a similary score for any given two text input columns.
-    * ARIMA model for [time-series forecasting](https://github.com/h2oai/driverlessai-recipes/blob/master/transformers/timeseries/auto_arima_forecast.py).
-    * Data augmentation, such as replacing a zip code with demographic information, or replacing a date column with a [National holiday flag](https://github.com/h2oai/driverlessai-recipes/blob/master/transformers/augmentation/singapore_public_holidays.py).
-  * Model:
-    * All [H2O-3 Algorithms including H2O AutoML](https://github.com/h2oai/driverlessai-recipes/blob/master/models/algorithms/h2o-3-models.py)
-    * Yandex [CatBoost](https://github.com/h2oai/driverlessai-recipes/blob/master/models/algorithms/catboost.py) gradient boosting
-    * A custom loss function for [LightGBM](https://github.com/h2oai/driverlessai-recipes/blob/master/models/custom_loss/lightgbm_with_custom_loss.py) or [XGBoost](https://github.com/h2oai/driverlessai-recipes/blob/master/models/custom_loss/xgboost_with_custom_loss.py)
-  * Scorer:
-    * Maybe you want to optimize your predictions for the [top decile](https://github.com/h2oai/driverlessai-recipes/blob/master/scorers/regression/top_decile.py) for a regression problem..
-    * Maybe you care about the [false discovery rate](https://github.com/h2oai/driverlessai-recipes/blob/master/scorers/classification/binary/false_discovery_rate.py) for a binary classification problem.
-  * Explainer:
-    * Create custom recipes for model interpretability, fairness, robustness, explanations
-    * Create custom plots, charts, markdown reports, etc.
-
-#### Driverless is good enough for me, I don't want to do recipes.
-Perfect. Relax and sit back. We'll keep making Driverless AI better and better with every version, so you don't have to.
-Several of the recipes in this repository will likely be included in future releases of Driverless AI out of the box, after more performance improvements and hardening.
-#### What's in it for me if I write a recipe?
-You will get better at doing data science and you will get better results. Writing code is essential to improving your data science skills. Especially when writing data science code. Recipes are perfect for that.
-#### Who can make recipes?
-Anyone who can or wants to. Mostly data scientists or developers. Some of the best recipes are trivial and make a big difference, like custom scorers.
-#### What do I need to make a recipe?
-A text editor. All you need is to create a `.py` text file containing source code.
-#### How do I start?
-* Examine the [references](https://github.com/h2oai/driverlessai-recipes#reference-guide) below for the API specification and architecture diagrams.
-* Look at the [examples in this repository](https://github.com/h2oai/driverlessai-recipes/blob/master/README.md#sample-recipes).
-* Clone this repository and make modifications to existing recipes.
-* Start an experiment and upload the recipe in the expert settings of an experiment.
-#### What version of Python does Driverless AI use?
-Driverless AI uses Python version 3.6, so all custom recipes will run with Python 3.6 as well.
-#### How do I know whether my recipe works?
-Driverless AI will tell you whether it makes the cut:
-* First, it is subjected to acceptance tests. If it passes, great. If not, Driverless AI provides you with feedback on how to improve it.
-* Then, you can choose to include it in your experiment(s). It will decide which recipes are best suited to solve the problem. At worst, you can cause the experiment to slow down.
-#### How can I debug my recipe?
-* The easiest way (for now) is to keep uploading it to the expert settings in Driverless AI until the recipe is accepted.
-* Another way is to do minimal changes as shown in [this debugging example](./transformers/how_to_debug_transformer.py) and use PyCharm or a similar Python debugger.
-#### What happens if my recipe is rejected during upload?
-* Read the entire error message, it most likely contains the stack trace and helpful information on how to fix the problem.
-* If you can't figure out how to fix the recipe, we suggest you post your questions in the [Driverless AI community Slack channel](https://www.h2o.ai/community/driverless-ai-community/#chat)
-* You can also send us your experiment logs zip file, which will contain the recipe source files.
-#### What happens if my transformer recipe doesn't lead to the highest variable importance for the experiment?
-That's nothing to worry about. It's unlikely that your features have the strongest signal of all features. Even 'magic' Kaggle grandmaster features don't usually make a massive difference, but they still beat most of the competition.
-#### What happens if my recipe is not used at all by the experiment?
-* Don't give up. You learned something.
-* Check the logs for failures if unsure whether the recipe worked at all or not.
-* Driverless AI will ignore recipe failures unless this robustness feature is specifically disabled. Under Expert Settings, disable `skip_transformer_failures` and `skip_model_failures` if you want to fail the experiment on any unexpected errors due to custom recipes.
-* Inside the experiment logs zip file, there's a folder called `details` and if it contains `.stack` files with stacktraces referring to your custom code, then you know it bombed.
-#### Can I write recipes in Go, C++, Java or R?
-If you can hook it up to Python, then yes. We have many recipes that use Java and C++ backends. Most of Driverless AI uses C++ backends.
-#### Is there a difference between a custom recipe and the recipes shipped with Driverless AI?
-No. Same code base. No performance penalty. No calling overhead. Same inputs and outputs.
-#### Why are some models implemented as transformers?
-Separating of work. With the transformer API, we can replace *only* the particular input column(s) with out-of-fold estimates of the target column. All other columns (features) can be processed by other transformers. The combined union of all features is then passed to the model(s) which can yield higher accuracy than a model that only sees the particular input column(s). For more information about the flow of data, see the technical [references](https://github.com/h2oai/driverlessai-recipes#reference-guide) section.
-#### How can I control which custom recipes are active, and how can I disable all custom recipes?
-Recipes are meant to be built by people you trust and each recipe should be code-reviewed before going to production. If you don't want custom code to be executed by Driverless AI, set `enable_custom_recipes=false` in the config.toml, or add the environment variable `DRIVERLESS_AI_ENABLE_CUSTOM_RECIPES=0` at startup of Driverless AI. This will disable all custom transformers, models and scorers. If you want to keep all previously uploaded recipes enabled and disable the upload of any new recipes, set `enable_custom_recipes_upload=false` or `DRIVERLESS_AI_ENABLE_CUSTOM_RECIPES_UPLOAD=0` at startup of Driverless AI.
-#### What if I keep changing the same recipe over and over?
-If you upload a new version of a recipe, it will become the new default version for that recipe. Previously run experiments using older versions of that recipe will continue to work, and use the older version. New experiments will use the new version.
-#### Who can see my recipe?
-Everyone with access to the Driverless AI instance can run all recipes, even if they were uploaded by someone else. Recipes remains on the instance that runs Driverless AI. Experiment logs may contain relevant information about your recipes (such as their source code), so double-check before you share them.
-#### How do I delete all recipes on my instance?
-If you really need to delete all recipes, you can delete the `contrib` folder inside the `data_directory` (usually called `tmp`) and restart Driverless AI. Caution: Previously created experiments using custom recipes will not be able to make predictions any longer, so this is not recommended unless you also delete all related experiments as well.
-#### Are MOJOs supported for experiments that use custom recipes?
-In most cases (especially for complex recipes), MOJOs won’t be available out of the box. But, it is possible to get the MOJO. Contact support@h2o.ai for more information about creating MOJOs for custom recipes. (**Note**: The Python Scoring Pipeline features full support for custom recipes.)
-#### How do I share my recipe with the world?
-We encourage you to share your recipe in this repository. If your recipe works, please make a pull request and improve the experience for everyone!
-
+
+### Why do I need to bring my own recipes?
+Custom recipes can improve performance. Domain knowledge and intuition are essential for achieving optimal results.
+
+### What are some example recipes?
+See the [examples in this repository](https://github.com/h2oai/driverlessai-recipes/blob/master/README.md#sample-recipes). Examples include:
+
+- **Transformer**:
+  - Split string columns into multiple numeric columns (e.g., `"A:B:10:5"` becomes `[0,1,10,5]`).
+  - **PyTorch** deep learning model for [text similarity analysis](https://github.com/h2oai/driverlessai-recipes/blob/master/transformers/nlp/text_embedding_similarity_transformers.py).
+  - **ARIMA** model for [time-series forecasting](https://github.com/h2oai/driverlessai-recipes/blob/master/transformers/timeseries/auto_arima_forecast.py).
+  - **Data augmentation**, such as replacing zip codes with demographic information or replacing a date column with a [national holiday flag](https://github.com/h2oai/driverlessai-recipes/blob/master/transformers/augmentation/singapore_public_holidays.py).
+
+- **Model**:
+  - H2O-3 [algorithms](https://github.com/h2oai/driverlessai-recipes/blob/master/models/algorithms/h2o-3-models.py), including H2O AutoML.
+  - **Yandex [CatBoost](https://github.com/h2oai/driverlessai-recipes/blob/master/models/algorithms/catboost.py)** gradient boosting.
+  - Custom loss functions for [LightGBM](https://github.com/h2oai/driverlessai-recipes/blob/master/models/custom_loss/lightgbm_with_custom_loss.py) or [XGBoost](https://github.com/h2oai/driverlessai-recipes/blob/master/models/custom_loss/xgboost_with_custom_loss.py).
+
+- **Scorer**:
+  - Optimize for the [top decile](https://github.com/h2oai/driverlessai-recipes/blob/master/scorers/regression/top_decile.py) in regression tasks.
+  - Optimize for the [false discovery rate](https://github.com/h2oai/driverlessai-recipes/blob/master/scorers/classification/binary/false_discovery_rate.py) in binary classification.
+
+- **Explainer**:
+  - Create custom recipes for model interpretability, fairness, robustness, and explanations.
+  - Generate custom plots, charts, markdown reports, and more.
+
+### Is H2O Driverless AI sufficient without custom recipes?
+H2O Driverless AI continues to improve with each version, so you may not need custom recipes. However, adding your own recipes can optimize performance for specific use cases.
+
+### What's in it for me if I write a recipe?
+Writing recipes improves your data science skills and helps achieve better results. It is one of the best ways to enhance your expertise.
+
+### Who can write recipes?
+Anyone with the necessary expertise can contribute. Data scientists and developers typically write recipes, though even simple recipes, such as custom scorers, can have a significant impact.
+
+### What do I need to write a recipe?
+A text editor and knowledge of Python. A recipe is a plain `.py` file containing its source code.
+
+### How do I start?
+- Review the [API specifications](https://github.com/h2oai/driverlessai-recipes#reference-guide) and architecture diagrams.
+- Review the [examples in this repository](https://github.com/h2oai/driverlessai-recipes/blob/master/README.md#sample-recipes).
+- Upload your recipe in the Expert Settings section during the experiment setup.
+
+### What version of Python does H2O Driverless AI use?
+H2O Driverless AI uses Python 3.11. Ensure your recipes are compatible with this version.
+
+### How do I know if my recipe works?
+H2O Driverless AI runs acceptance tests when you upload a recipe and tells you whether it passed. If it fails, the feedback explains how to fix it.
+
+### How can I debug my recipe?
+The simplest approach is to keep uploading your recipe in Expert Settings until it is accepted, using the error messages and experiment logs to guide each fix. Alternatively, make minimal changes as shown in [this debugging example](./transformers/how_to_debug_transformer.py) and step through the code with a Python debugger such as PyCharm's.
+
+### What happens if my recipe is rejected during upload?
+Review the error message, which usually includes a stack trace and hints for fixing the issue. If you need help, ask questions in the [H2O Driverless AI community Slack channel](https://www.h2o.ai/community/driverless-ai-community/#chat). You can also send us your experiment logs zip file, which contains the recipe source files.
+
+### What if my transformer recipe doesn't lead to the highest variable importance?
+Features created by your transformer might not have the strongest signal, but they can still improve the overall model performance.
+
+### What happens if my recipe is not used in the experiment?
+H2O Driverless AI selects the recipes that perform best, so a working recipe may still go unused. Check the experiment logs for errors related to your recipe: `.stack` files in the `details` folder of the logs zip file that reference your code indicate a failure. By default, recipe failures are skipped; to make the experiment fail on any custom recipe error instead, disable `skip_transformer_failures` and `skip_model_failures` in Expert Settings.
+
+### Can I write recipes in Go, C++, Java, or R?
+You can use any language as long as you can interface it with Python. Many recipes rely on Java and C++ backends.
+
+### Is there a difference between custom recipes and those shipped with H2O Driverless AI?
+Custom recipes are treated the same as built-in recipes. There is no performance penalty or calling overhead.
+
+### Why are some models implemented as transformers?
+The transformer API separates the work. A model implemented as a transformer replaces *only* its input column(s) with out-of-fold estimates of the target column, while other transformers process the remaining columns. The union of all features is then passed to the model(s), which can yield higher accuracy than a model that sees only those input columns. For more on the flow of data, see the technical [references](https://github.com/h2oai/driverlessai-recipes#reference-guide) section.
+
+### How can I control which custom recipes are active? How can I disable all custom recipes?
+Recipes should come from people you trust and be code-reviewed before going to production. To disable all custom transformers, models, and scorers, set `enable_custom_recipes=false` in the `config.toml` file or set the `DRIVERLESS_AI_ENABLE_CUSTOM_RECIPES=0` environment variable at startup. To keep previously uploaded recipes enabled but block uploads of new ones, set `enable_custom_recipes_upload=false` or `DRIVERLESS_AI_ENABLE_CUSTOM_RECIPES_UPLOAD=0`.
+
+### What if I keep changing the same recipe?
+When you upload a new version of a recipe, it becomes the default. Older experiments continue using the previous version, and new experiments use the new one.
+
+### Who can see my recipe?
+Anyone with access to the H2O Driverless AI instance can run any uploaded recipe. Recipes remain on that instance, but experiment logs may contain your recipe source code, so review them before sharing.
+
+### How do I delete all recipes on my instance?
+To delete all recipes, remove the `contrib` folder from the data directory (usually `tmp`) and restart H2O Driverless AI. Caution: experiments that used custom recipes will no longer be able to make predictions, so do this only if you also delete those experiments.
+
+### Are MOJOs supported for experiments that use custom recipes?
+In most cases, MOJOs are not available out of the box for experiments that use custom recipes, but it may still be possible to obtain one; contact support@h2o.ai for details. The Python Scoring Pipeline fully supports custom recipes.
+
+### How do I share my recipe with the community?
+Contribute to this repository by making a pull request. If your recipe works, it can help others optimize their experiments.
+
 ## References
 ### Custom Transformers
 * sklearn API
@@ -98,4 +112,4 @@
 ![BYOR Architecture Diagram](reference/DriverlessAI_BYOR.png)
 ### Webinar
 Webinar: [Extending the H2O Driverless AI Platform with Your Recipes](https://www.brighttalk.com/webcast/16463/360533/extending-the-h2o-driverless-ai-platform-with-your-recipes)
-Website: [H2O Driverless AI Recipes](https://www.h2o.ai/products-h2o-driverless-ai-recipes/)
+Website: [H2O Driverless AI Recipes](https://www.h2o.ai/products-h2o-driverless-ai-recipes/)
\ No newline at end of file
diff --git a/air-gapped_installations/README.md b/air-gapped_installations/README.md
index b1b76e22..1f6b7991 100644
--- a/air-gapped_installations/README.md
+++ b/air-gapped_installations/README.md
@@ -1,79 +1,77 @@
 # Air-gapped Installation of Custom Recipes
-Air gapping is a network security measure employed on one or more computers
-to ensure that a secure computer network is physically isolated from unsecured networks,
-such as the public Internet or an unsecured local area network.
+Air gapping is a network security measure that physically isolates one or more computers from unsecured networks, such as the public Internet or an unsecured local area network.
 
-This documentation will guide you through the installation of Custom Recipes in such air-gapped environment.
+This guide walks you through installing custom recipes in an air-gapped environment.
 
 ## Prerequisite
-- There are two DAI installations involved here. One is in an **Air-Gapped Machine** and second is in an **Internet-Facing Machine**. They will be referred this way here to avoid confusion.
-- First you need to install DAI in an air gapped environment *ie. in a computer isolated from internet*.
-This DAI can be installed in any Package Type (TAR SH, Docker, DEB etc.) available in https://www.h2o.ai/download/
-- While in the Internet Facing Machine, clone this repository for now and checkout the branch of your DAI `VERSION`
-```
+- Two H2O Driverless AI (DAI) installations are required: one on an **Air-Gapped Machine** and the other on an **Internet-Facing Machine**. These terms are used throughout this guide.
+- First, install H2O Driverless AI on the air-gapped machine (a machine isolated from the Internet). Driverless AI can be installed using any available package type (e.g., TAR SH, Docker, DEB). You can download the installation packages from [H2O.ai Downloads](https://h2o.ai/resources/download/).
+- On the Internet-facing machine, clone the repository and check out the appropriate branch for your Driverless AI `VERSION`:
+  ```
 git clone https://github.com/h2oai/driverlessai-recipes.git
 cd driverlessai-recipes
 git checkout rel-VERSION # eg. git checkout rel-1.9.0 for DAI 1.9.0
-```
+  ```
 
-## Installation Guide
-Follow along these steps to use custom recipes for DAI in an air-gapped environment:
+## Installation Guide
+Follow the steps below to use custom recipes for H2O Driverless AI in an air-gapped environment:
 
-*Note: following steps need to be performed in Internet Facing Machine*
+**Note**: *Steps 1-7 are performed on the Internet-Facing Machine; step 8 is performed on the Air-Gapped Machine.*
 
-- Download the required version of Driverless AI TAR SH installer in a Internet facing machine from https://www.h2o.ai/download/
+1. Download the required version of the H2O Driverless AI TAR SH installer on the Internet-Facing Machine from [H2O.ai Downloads](https://www.h2o.ai/download/).
 
-- Run the following commands to install the Driverless AI TAR SH. Replace `VERSION` with your specific version.
+2. Run the following commands to install the H2O Driverless AI TAR SH. Replace `VERSION` with the specific version you need.
+   ```
+   chmod 755 dai-VERSION.sh
+   ./dai-VERSION.sh
+   ```
+
+3. Next, `cd` to the unpacked directory:
+   ```
+   cd dai-VERSION
+   ```
 
-```
-   chmod 755 dai-VERSION.sh
-   ./dai-VERSION.sh
-```
-- Now cd to the unpacked directory.
-```
-   cd dai-VERSION
-```
-- Copy the load_custom_recipe.py script from `driverlessai-recipes/air-gapped_installations` to `dai-VERSION`
+4. Copy the `load_custom_recipe.py` script from `driverlessai-recipes/air-gapped_installations` to `dai-VERSION`.
 
-- Run the following python script, either in one of the following ways:
+5. Run the `load_custom_recipe.py` script in one of the following ways:
 
-   a) To load custom recipes from a local file.
+   - **To load custom recipes from a local file:**
      ```
      ./dai-env.sh python load_custom_recipe.py -username -p `` >> load_custom_recipe.log
      ```
-    where `` is the username, e.g. jon and `` is the path to a recipe you want to upload to DAI.
+     - ``: The username (e.g., `jon`).
+     - ``: The path to the recipe you want to upload to DAI.
 
-   For example to load [daal_trees recipe](https://github.com/h2oai/driverlessai-recipes/blob/rel-1.8.8/models/algorithms/daal_trees.py) from the cloned driverlessai-recipes repo we do:
-   ```
-   ./dai-env.sh python load_custom_recipe.py -username jon -p /home/ubuntu/driverlessai-recipes/models/algorithms/daal_trees.py >> load_custom_recipe.log
-   ```
+     For example, to load the [daal_trees recipe](https://github.com/h2oai/driverlessai-recipes/blob/rel-1.8.8/models/algorithms/daal_trees.py) from the cloned `driverlessai-recipes` repo:
+     ```
+     ./dai-env.sh python load_custom_recipe.py -username jon -p /home/ubuntu/driverlessai-recipes/models/algorithms/daal_trees.py >> load_custom_recipe.log
+     ```
 
-   b) To load custom recipes from a URL.
+   - **To load custom recipes from a URL:**
      ```
      ./dai-env.sh python load_custom_recipe.py -username -u >> load_custom_recipe.logg
      ```
-    where `` is an http link for a url.
+     - ``: The URL to the custom recipe.
-   For example to load [catboost recipe](https://github.com/h2oai/driverlessai-recipes/blob/rel-1.8.8/models/algorithms/catboost.py) from url we do:
-   ```
-   ./dai-env.sh python load_custom_recipe.py -username jon -u https://github.com/h2oai/driverlessai-recipes/blob/rel-1.8.8/models/algorithms/catboost.py >> load_custom_recipe.log
-   ```
-   **Note:** you can check the `load_custom_recipe.log` file to see if the operation was successful.
+     For example, to load the [catboost recipe](https://github.com/h2oai/driverlessai-recipes/blob/rel-1.8.8/models/algorithms/catboost.py) from a URL:
+     ```
+     ./dai-env.sh python load_custom_recipe.py -username jon -u https://github.com/h2oai/driverlessai-recipes/blob/rel-1.8.8/models/algorithms/catboost.py >> load_custom_recipe.log
+     ```
+     **Note:** *Check the `load_custom_recipe.log` file to verify whether the operation was successful.*
 
-- Once the above script was executed successfully, custom recipes and python dependencies will be installed in the
-  `dai-VERSION///contrib` directory,
-  where `` is `tmp` by default.
+6. After the script runs successfully, custom recipes and Python dependencies are installed in the `dai-VERSION///contrib` directory, where `` is `tmp` by default.
 
-- Zip the `dai-VERSION/tmp/contrib` directory and move it to the air-gapped machine and unzip there into the DAI `tmp` directory.
-```
-   cd dai-VERSION/``/
-   zip -r user_contrib.zip ``/contrib
-   scp user_contrib.zip ``@``:``
-```
-- Now in the **Air-gapped Machine**, unzip the file and set permissions if necessary, e.g.
-```
-   cd ``
-   unzip user_contrib.zip
-   chmod -R u+rwx dai:dai ``/contrib
-```
+7. Zip the `dai-VERSION/tmp/contrib` directory and copy the archive to the air-gapped machine:
+   ```
+   cd dai-VERSION/``/
+   zip -r user_contrib.zip ``/contrib
+   scp user_contrib.zip ``@``:``
+   ```
+
+8. Now, on the **Air-Gapped Machine**, unzip the archive into the Driverless AI `tmp` directory and set ownership and permissions if necessary:
+   ```
+   cd 
+   unzip user_contrib.zip
+   chown -R dai:dai /contrib && chmod -R u+rwx /contrib
+   ```
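If `zip` or `unzip` is not available on one of the machines, the packaging in steps 7 and 8 can also be done with Python's standard library, for example through `./dai-env.sh python`. The following is a minimal sketch under that assumption, not part of `load_custom_recipe.py`: the function names `pack_contrib` and `unpack_contrib` are hypothetical, the archive is still transferred with `scp`, and ownership (`chown dai:dai`) is still set from the shell.

```python
import shutil
import stat
from pathlib import Path


def pack_contrib(tmp_dir: str, archive_base: str) -> str:
    """Step 7 (Internet-Facing Machine): zip <tmp_dir>/contrib into <archive_base>.zip.

    root_dir=tmp_dir with base_dir="contrib" stores entries as "contrib/...",
    like running `zip -r user_contrib.zip contrib` from inside <tmp_dir>.
    """
    return shutil.make_archive(archive_base, "zip", root_dir=tmp_dir, base_dir="contrib")


def unpack_contrib(archive: str, tmp_dir: str) -> Path:
    """Step 8 (Air-Gapped Machine): extract the archive into <tmp_dir> and add u+rwx recursively."""
    shutil.unpack_archive(archive, extract_dir=tmp_dir, format="zip")
    contrib = Path(tmp_dir) / "contrib"
    # Equivalent of `chmod -R u+rwx <tmp_dir>/contrib`: add the owner read/write/execute
    # bits while keeping the existing group/other bits. Ownership is not changed here.
    for path in [contrib, *contrib.rglob("*")]:
        path.chmod(stat.S_IMODE(path.stat().st_mode) | stat.S_IRWXU)
    return contrib
```

For example, `pack_contrib("dai-VERSION/tmp", "user_contrib")` on the Internet-Facing Machine produces `user_contrib.zip`; after copying it over, `unpack_contrib("user_contrib.zip", ...)` with the Driverless AI `tmp` directory restores `contrib/` there. Keeping the `contrib/` prefix inside the archive means the extraction target is the `tmp` directory itself, matching the shell steps.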