fixed readme (anton 2nd review)
sh-rp committed May 27, 2024
1 parent c8a8376 commit 9d675b2
Showing 1 changed file with 46 additions and 40 deletions.
@@ -16,58 +16,63 @@ We also have a cool [Google Colab example](https://colab.research.google.com/dri
:::

## Features
`dlt-init-openapi` generates code from an OpenAPI spec that you can use to extract data from a `rest_api` into any [`destination`](https://dlthub.com/docs/dlt-ecosystem/destinations/) (e.g., Postgres, BigQuery, Redshift...) that `dlt` supports. dlt-init-openapi additionally executes a set of heuristics to discover information not explicitly defined in OpenAPI specs.
`dlt-init-openapi` generates code from an OpenAPI spec that you can use to extract data from a `rest_api` into any [`destination`](../destinations/) (e.g., Postgres, BigQuery, Redshift...) that `dlt` supports. dlt-init-openapi additionally executes a set of heuristics to discover information not explicitly defined in OpenAPI specs.

Features include:

* **[Pagination](https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api#pagination) discovery** for each endpoint.
* **[Pagination](../rest_api#pagination) discovery** for each endpoint.
* **Primary key discovery** for each entity.
* **Endpoint relationship mapping** into `dlt` [`transformers`](https://dlthub.com/docs/general-usage/resource#process-resources-with-dlttransformer) (e.g., /users/ -> /user/{id}).
* **Payload JSON path [data selector](https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api#data-selection) discovery** for results nested in the returned JSON.
* **[Authentication](https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api#authentication)** discovery for an API.
* **Endpoint relationship mapping** into `dlt` [`transformers`](../../general-usage/resource#process-resources-with-dlttransformer) (e.g., /users/ -> /user/{id}).
* **Payload JSON path [data selector](../rest_api#data-selection) discovery** for results nested in the returned JSON.
* **[Authentication](../rest_api#authentication)** discovery for an API.
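
To make the data selector idea concrete, here is a minimal sketch of how a JSON path to nested results could be guessed from a response payload. The function name `guess_data_selector`, the sample payload, and the heuristic itself (take the first list of objects found) are assumptions for illustration only; the generator's actual heuristics are not described here and may differ.

```python
from typing import Any, Optional


def guess_data_selector(payload: Any, path: str = "$") -> Optional[str]:
    """Return a JSON path to the first list of objects found in payload.

    Hypothetical heuristic for illustration; not the generator's real logic.
    """
    if isinstance(payload, list):
        # A non-empty list of dicts is treated as the collection of results.
        if payload and all(isinstance(item, dict) for item in payload):
            return path
        return None
    if isinstance(payload, dict):
        # Walk the keys in order and return the first match, depth first.
        for key, value in payload.items():
            found = guess_data_selector(value, f"{path}.{key}")
            if found is not None:
                return found
    return None


# Made-up response shaped like a typical paginated collection endpoint.
response = {
    "count": 2,
    "next": None,
    "results": [{"id": 1, "name": "bulbasaur"}, {"id": 2, "name": "ivysaur"}],
}
selector = guess_data_selector(response)
print(selector)  # $.results
```

If a guessed selector is wrong for your API, the generated configuration can be edited by hand, as described further down.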

## A quick example

You will need Python 3.9 or higher installed, as well as pip. You can run `pip install dlt-init-openapi` to install the current version.

We will create a simple example pipeline from a [PokeAPI spec](https://pokeapi.co/) in our repo. You can point to any other OpenAPI Spec instead if you prefer.

```sh
# 1.a. Run the generator with a URL:
$ dlt-init-openapi pokemon --url https://raw.githubusercontent.com/dlt-hub/dlt-init-openapi/devel/tests/cases/e2e_specs/pokeapi.yml --global-limit 2

# 1.b. If you have a local file, you can use the --path flag:
$ dlt-init-openapi pokemon --path ./my_specs/pokeapi.yml
1. Run the generator with a URL:
```sh
dlt-init-openapi pokemon --url https://raw.githubusercontent.com/dlt-hub/dlt-init-openapi/devel/tests/cases/e2e_specs/pokeapi.yml --global-limit 2
```

# 2. You can now pick both of the endpoints from the popup.
2. Alternatively, if you have a local file, you can use the `--path` flag:
```sh
dlt-init-openapi pokemon --path ./my_specs/pokeapi.yml
```

# 3. After selecting your Pokemon endpoints and hitting Enter,
# your pipeline will be rendered.
3. You can now pick both of the endpoints from the popup.

# 4. If you have any kind of authentication on your pipeline (this example does not),
# open the `.dlt/secrets.toml` and provide the credentials. You can find further
# settings in the `.dlt/config.toml`.
4. After selecting your Pokemon endpoints and hitting Enter, your pipeline will be rendered.

# 5. Go to the created pipeline folder and run your pipeline.
$ cd pokemon-pipeline
$ PROGRESS=enlighten python pipeline.py # we use enlighten for a nice progress bar :)
5. If you have any kind of authentication on your pipeline (this example does not), open the `.dlt/secrets.toml` and provide the credentials. You can find further settings in the `.dlt/config.toml`.

# 6. Print the pipeline info to the console to see what got loaded.
$ dlt pipeline pokemon_pipeline info
6. Go to the created pipeline folder and run your pipeline.
```sh
cd pokemon-pipeline
PROGRESS=enlighten python pipeline.py # we use enlighten for a nice progress bar :)
```

# 7. You can now also install Streamlit to see a preview of the data; you should
# have loaded 40 Pokemons and their details.
$ pip install pandas streamlit
$ dlt pipeline pokemon_pipeline show
7. Print the pipeline info to the console to see what got loaded.
```sh
dlt pipeline pokemon_pipeline info
```

# 8. You can go to our docs at https://dlthub.com/docs to learn how to modify
# the generated pipeline to load to many destinations, place schema contracts
# on your pipeline, and many other things.
8. You can now also install Streamlit to see a preview of the data; you should have loaded 40 Pokemon and their details.
```sh
pip install pandas streamlit
dlt pipeline pokemon_pipeline show
```

# NOTE: We used the `--global-limit 2` CLI flag to limit the requests to the PokeAPI
# for this example. This way, the Pokemon collection endpoint only gets queried
# twice, resulting in 2 x 20 Pokemon details being rendered.
```
9. You can go to our docs at https://dlthub.com/docs to learn how to modify the generated pipeline to load to many destinations, place schema contracts on your pipeline, and many other things.

:::note
We used the `--global-limit 2` CLI flag to limit the requests to the PokeAPI
for this example. This way, the Pokemon collection endpoint only gets queried
twice, resulting in 2 x 20 Pokemon details being rendered.
:::
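
The arithmetic in the note can be sketched with a small simulation. This is not how `dlt` implements the limit; `fetch_page` and `paginate` are hypothetical stand-ins that make no network calls, and the page size of 20 is taken from the note above.

```python
from typing import Iterator, List

PAGE_SIZE = 20  # from the note: each collection request returns 20 Pokemon


def fetch_page(page_number: int) -> List[int]:
    """Stand-in for one request to the collection endpoint (no network call)."""
    start = page_number * PAGE_SIZE
    return list(range(start + 1, start + PAGE_SIZE + 1))


def paginate(global_limit: int) -> Iterator[List[int]]:
    """Yield pages until the request limit is reached."""
    for page_number in range(global_limit):
        yield fetch_page(page_number)


# Two requests of 20 items each give 40 Pokemon ids.
pokemon_ids = [pid for page in paginate(global_limit=2) for pid in page]
print(len(pokemon_ids))  # 40
```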

## What will be created?

@@ -94,7 +99,7 @@ If you re-generate your pipeline, you will be prompted to continue if this folde

## A closer look at your `rest_api` dictionary in `pokemon/__init__.py`

This file contains the configuration dictionary for the [dlt rest_api](https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api) source which is the main result of running this generator. For our Pokemon example, we have used an OpenAPI 3 spec that works out of the box. The result of this dictionary depends on the quality of the spec you are using, whether the API you are querying actually adheres to this spec, and whether our heuristics manage to find the right values. You can edit this file to adapt the behavior of the dlt rest_api accordingly. Please read our [dlt rest_api](https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api) docs to learn how to do this and play with our detailed [Google Colab example](https://colab.research.google.com/drive/1MRZvguOTZj1MlkEGzjiso8lQ_wr1MJRI?usp=sharing#scrollTo=LHGxzf1Ev_yr).
This file contains the [configuration dictionary](./rest_api#source-configuration) for the rest_api source, which is the main result of running this generator. For our Pokemon example, we have used an OpenAPI 3 spec that works out of the box. The contents of this dictionary depend on the quality of the spec you are using, whether the API you are querying actually adheres to this spec, and whether our heuristics manage to find the right values. You can edit this file to adapt the behavior of the dlt rest_api accordingly. Please read our [dlt rest_api](../rest_api) docs to learn how to configure the rest_api source with our detailed [Google Colab example](https://colab.research.google.com/drive/1MRZvguOTZj1MlkEGzjiso8lQ_wr1MJRI?usp=sharing#scrollTo=LHGxzf1Ev_yr).

The generated dictionary will look something like this:

@@ -149,9 +154,12 @@ The generated dictionary will look something like this:
## CLI command

```sh
$ dlt-init-openapi <source_name> [OPTIONS]
# example:
$ dlt-init-openapi pokemon --path ./path/to/my_spec.yml --no-interactive --output-path ./my_pipeline
dlt-init-openapi <source_name> [OPTIONS]
```

### Example:
```sh
dlt-init-openapi pokemon --path ./path/to/my_spec.yml --no-interactive --output-path ./my_pipeline
```

**Options**:
@@ -166,9 +174,7 @@ _The only required options are either to supply a path or a URL to a spec_
- `--log-level`: Set the logging level for stdout output, defaults to 20 (INFO).
- `--global-limit`: Set a global limit on the generated source.
- `--update-rest-api-source`: Update the locally cached rest_api verified source.
- `--allow-openapi-2`: Allows the use of OpenAPI v2. specs. Migration of the spec to 3.0 is recommended

for better results though.
- `--allow-openapi-2`: Allows the use of OpenAPI v2 specs. Migrating the spec to 3.0 is recommended for better results, though.
- `--version`: Show the installed version of the generator and exit.
- `--help`: Show this message and exit.
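
If you want to call the generator from a script, one option is to assemble the command from the flags listed above. The sketch below only builds and prints the argument list; `build_command` is a hypothetical helper, not part of the tool, and requiring exactly one of `path` or `url` is an assumption of this sketch.

```python
import shlex
from typing import List, Optional


def build_command(
    source_name: str,
    path: Optional[str] = None,
    url: Optional[str] = None,
    output_path: Optional[str] = None,
    interactive: bool = True,
) -> List[str]:
    """Assemble a dlt-init-openapi invocation from flags named in this README."""
    # Assumption of this sketch: exactly one spec location is given.
    if (path is None) == (url is None):
        raise ValueError("supply exactly one of path or url")
    cmd = ["dlt-init-openapi", source_name]
    cmd += ["--path", path] if path is not None else ["--url", url]
    if not interactive:
        cmd.append("--no-interactive")
    if output_path is not None:
        cmd += ["--output-path", output_path]
    return cmd


cmd = build_command(
    "pokemon",
    path="./path/to/my_spec.yml",
    interactive=False,
    output_path="./my_pipeline",
)
print(shlex.join(cmd))
# dlt-init-openapi pokemon --path ./path/to/my_spec.yml --no-interactive --output-path ./my_pipeline
```

To actually run the generator, the list could be passed to `subprocess.run(cmd, check=True)`, provided `dlt-init-openapi` is installed and on your `PATH`.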

@@ -189,7 +195,7 @@ $ dlt-init-openapi pokemon --url ... --config config.yml
```

## Telemetry
We track your usage of this tool similar to how we track other commands in the dlt core library. Read more about this and how to disable it here: https://dlthub.com/docs/reference/telemetry.
We track your usage of this tool similar to how we track other commands in the dlt core library. Read more about this and how to disable it [here](../../reference/telemetry).

## Prior work
This project started as a fork of [openapi-python-client](https://github.com/openapi-generators/openapi-python-client). Pretty much all parts are heavily changed or completely replaced, but some lines of code still exist, and we would like to acknowledge the many good ideas we got from the original project :)