diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/openapi-generator.md b/docs/website/docs/dlt-ecosystem/verified-sources/openapi-generator.md
index b04ad5b8f0..c02a83bae2 100644
--- a/docs/website/docs/dlt-ecosystem/verified-sources/openapi-generator.md
+++ b/docs/website/docs/dlt-ecosystem/verified-sources/openapi-generator.md
@@ -16,15 +16,15 @@ We also have a cool [Google Colab example](https://colab.research.google.com/dri
 :::
 
 ## Features
-`dlt-init-openapi` generates code from an OpenAPI spec that you can use to extract data from a `rest_api` into any [`destination`](https://dlthub.com/docs/dlt-ecosystem/destinations/) (e.g., Postgres, BigQuery, Redshift...) that `dlt` supports. dlt-init-openapi additionally executes a set of heuristics to discover information not explicitly defined in OpenAPI specs.
+`dlt-init-openapi` generates code from an OpenAPI spec that you can use to extract data from a `rest_api` into any [`destination`](../destinations/) (e.g., Postgres, BigQuery, Redshift...) that `dlt` supports. dlt-init-openapi additionally executes a set of heuristics to discover information not explicitly defined in OpenAPI specs.
 
 Features include:
 
-* **[Pagination](https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api#pagination) discovery** for each endpoint.
+* **[Pagination](../rest_api#pagination) discovery** for each endpoint.
 * **Primary key discovery** for each entity.
-* **Endpoint relationship mapping** into `dlt` [`transformers`](https://dlthub.com/docs/general-usage/resource#process-resources-with-dlttransformer) (e.g., /users/ -> /user/{id}).
-* **Payload JSON path [data selector](https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api#data-selection) discovery** for results nested in the returned JSON.
-* **[Authentication](https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api#authentication)** discovery for an API.
+* **Endpoint relationship mapping** into `dlt` [`transformers`](../../general-usage/resource#process-resources-with-dlttransformer) (e.g., /users/ -> /user/{id}).
+* **Payload JSON path [data selector](../rest_api#data-selection) discovery** for results nested in the returned JSON.
+* **[Authentication](../rest_api#authentication)** discovery for an API.
 
 ## A quick example
 
@@ -32,42 +32,47 @@ You will need Python 3.9 or higher installed, as well as pip. You can run `pip i
 
 We will create a simple example pipeline from a [PokeAPI spec](https://pokeapi.co/) in our repo. You can point to any other OpenAPI Spec instead if you prefer.
 
-```sh
-# 1.a. Run the generator with a URL:
-$ dlt-init-openapi pokemon --url https://raw.githubusercontent.com/dlt-hub/dlt-init-openapi/devel/tests/cases/e2e_specs/pokeapi.yml --global-limit 2
-# 1.b. If you have a local file, you can use the --path flag:
-$ dlt-init-openapi pokemon --path ./my_specs/pokeapi.yml
+1. Run the generator with a URL:
+   ```sh
+   dlt-init-openapi pokemon --url https://raw.githubusercontent.com/dlt-hub/dlt-init-openapi/devel/tests/cases/e2e_specs/pokeapi.yml --global-limit 2
+   ```
 
-# 2. You can now pick both of the endpoints from the popup.
+2. Alternatively, if you have a local file, you can use the --path flag:
+   ```sh
+   dlt-init-openapi pokemon --path ./my_specs/pokeapi.yml
+   ```
 
-# 3. After selecting your Pokemon endpoints and hitting Enter,
-# your pipeline will be rendered.
+3. You can now pick both of the endpoints from the popup.
 
-# 4. If you have any kind of authentication on your pipeline (this example does not),
-# open the `.dlt/secrets.toml` and provide the credentials. You can find further
-# settings in the `.dlt/config.toml`.
+4. After selecting your Pokemon endpoints and hitting Enter, your pipeline will be rendered.
 
-# 5. Go to the created pipeline folder and run your pipeline.
-$ cd pokemon-pipeline
-$ PROGRESS=enlighten python pipeline.py # we use enlighten for a nice progress bar :)
+5. If you have any kind of authentication on your pipeline (this example does not), open the `.dlt/secrets.toml` and provide the credentials. You can find further settings in the `.dlt/config.toml`.
 
-# 6. Print the pipeline info to the console to see what got loaded.
-$ dlt pipeline pokemon_pipeline info
+6. Go to the created pipeline folder and run your pipeline.
+   ```sh
+   cd pokemon-pipeline
+   PROGRESS=enlighten python pipeline.py # we use enlighten for a nice progress bar :)
+   ```
 
-# 7. You can now also install Streamlit to see a preview of the data; you should
-# have loaded 40 Pokemons and their details.
-$ pip install pandas streamlit
-$ dlt pipeline pokemon_pipeline show
+7. Print the pipeline info to the console to see what got loaded.
+   ```sh
+   dlt pipeline pokemon_pipeline info
+   ```
 
-# 8. You can go to our docs at https://dlthub.com/docs to learn how to modify
-# the generated pipeline to load to many destinations, place schema contracts
-# on your pipeline, and many other things.
+8. You can now also install Streamlit to see a preview of the data; you should have loaded 40 Pokemons and their details.
+   ```sh
+   pip install pandas streamlit
+   dlt pipeline pokemon_pipeline show
+   ```
 
-# NOTE: We used the `--global-limit 2` CLI flag to limit the requests to the PokeAPI
-# for this example. This way, the Pokemon collection endpoint only gets queried
-# twice, resulting in 2 x 20 Pokemon details being rendered.
-```
+9. You can go to our docs at https://dlthub.com/docs to learn how to modify the generated pipeline to load to many destinations, place schema contracts on your pipeline, and many other things.
+
+:::note
+We used the `--global-limit 2` CLI flag to limit the requests to the PokeAPI
+for this example. This way, the Pokemon collection endpoint only gets queried
+twice, resulting in 2 x 20 Pokemon details being rendered.
+:::
 
 ## What will be created?
 
@@ -94,7 +99,7 @@ If you re-generate your pipeline, you will be prompted to continue if this folde
 
 ## A closer look at your `rest_api` dictionary in `pokemon/__init__.py`
 
-This file contains the configuration dictionary for the [dlt rest_api](https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api) source which is the main result of running this generator. For our Pokemon example, we have used an OpenAPI 3 spec that works out of the box. The result of this dictionary depends on the quality of the spec you are using, whether the API you are querying actually adheres to this spec, and whether our heuristics manage to find the right values. You can edit this file to adapt the behavior of the dlt rest_api accordingly. Please read our [dlt rest_api](https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api) docs to learn how to do this and play with our detailed [Google Colab example](https://colab.research.google.com/drive/1MRZvguOTZj1MlkEGzjiso8lQ_wr1MJRI?usp=sharing#scrollTo=LHGxzf1Ev_yr).
+This file contains the [configuration dictionary](./rest_api#source-configuration) for the rest_api source which is the main result of running this generator. For our Pokemon example, we have used an OpenAPI 3 spec that works out of the box. The result of this dictionary depends on the quality of the spec you are using, whether the API you are querying actually adheres to this spec, and whether our heuristics manage to find the right values. You can edit this file to adapt the behavior of the dlt rest_api accordingly. Please read our [dlt rest_api](../rest_api) docs to learn how to configure the rest_api source with our detailed [Google Colab example](https://colab.research.google.com/drive/1MRZvguOTZj1MlkEGzjiso8lQ_wr1MJRI?usp=sharing#scrollTo=LHGxzf1Ev_yr).
 
 The generated dictionary will look something like this:
 
@@ -149,9 +154,12 @@ The generated dictionary will look something like this:
 ## CLI command
 
 ```sh
-$ dlt-init-openapi [OPTIONS]
-# example:
-$ dlt-init-openapi pokemon --path ./path/to/my_spec.yml --no-interactive --output-path ./my_pipeline
+dlt-init-openapi [OPTIONS]
+```
+
+### Example:
+```sh
+dlt-init-openapi pokemon --path ./path/to/my_spec.yml --no-interactive --output-path ./my_pipeline
 ```
 
 **Options**:
@@ -166,9 +174,7 @@ _The only required options are either to supply a path or a URL to a spec_
 - `--log-level`: Set the logging level for stdout output, defaults to 20 (INFO).
 - `--global-limit`: Set a global limit on the generated source.
 - `--update-rest-api-source`: Update the locally cached rest_api verified source.
-- `--allow-openapi-2`: Allows the use of OpenAPI v2. specs. Migration of the spec to 3.0 is recommended
-
- for better results though.
+- `--allow-openapi-2`: Allows the use of OpenAPI v2. specs. Migration of the spec to 3.0 is recommended for better results though.
 - `--version`: Show the installed version of the generator and exit.
 - `--help`: Show this message and exit.
 
@@ -189,7 +195,7 @@ $ dlt-init-openapi pokemon --url ... --config config.yml
 ```
 
 ## Telemetry
-We track your usage of this tool similar to how we track other commands in the dlt core library. Read more about this and how to disable it here: https://dlthub.com/docs/reference/telemetry.
+We track your usage of this tool similar to how we track other commands in the dlt core library. Read more about this and how to disable it [here](../../reference/telemetry).
 
 ## Prior work
 This project started as a fork of [openapi-python-client](https://github.com/openapi-generators/openapi-python-client). Pretty much all parts are heavily changed or completely replaced, but some lines of code still exist, and we like to acknowledge the many good ideas we got from the original project :)