Commit: first 20 pages

sh-rp committed Sep 19, 2024
1 parent 0515f6c commit 30bd323
Showing 20 changed files with 895 additions and 1,104 deletions.
3 changes: 2 additions & 1 deletion docs/website/docs/_book-onboarding-call.md
@@ -1 +1,2 @@
<a href="https://calendar.app.google/EMZRS6YhM11zTGQw7">book a call</a> with a dltHub Solutions Engineer
<a href="https://calendar.app.google/EMZRS6YhM11zTGQw7">Book a call</a> with a dltHub Solutions Engineer

146 changes: 50 additions & 96 deletions docs/website/docs/build-a-pipeline-tutorial.md

Large diffs are not rendered by default.

@@ -7,14 +7,14 @@ keywords: [data enrichment, currency conversion, latest market rates]
# Data enrichment part two: Currency conversion data enrichment

Currency conversion data enrichment means adding additional information to currency-related data.
Often, you have a data set of monetary value in one currency. For various reasons such as reporting,
Often, you have a dataset of monetary value in one currency. For various reasons such as reporting,
analysis, or global operations, it may be necessary to convert these amounts into different currencies.

## Currency conversion process

Here is step-by-step process for currency conversion data enrichment:
Here is a step-by-step process for currency conversion data enrichment:

1. Define base and target currencies. e.g., USD (base) to EUR (target).
1. Define base and target currencies, e.g., USD (base) to EUR (target).
1. Obtain current exchange rates from a reliable source like a financial data API.
1. Convert the monetary values at obtained exchange rates.
1. Include metadata like conversion rate, date, and time.
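As an illustration of the steps above, here is a minimal sketch of the convert-and-annotate step (steps 3 and 4); the field names and the example rate are placeholders, not values from the pipeline:

```py
from datetime import datetime, timezone

def convert_and_annotate(amount_usd: float, rate_usd_eur: float) -> dict:
    """Convert a USD amount to EUR and attach conversion metadata."""
    return {
        "amount_usd": amount_usd,
        "amount_eur": round(amount_usd * rate_usd_eur, 2),
        "conversion_rate": rate_usd_eur,
        "converted_at": datetime.now(timezone.utc).isoformat(),
    }

# convert_and_annotate(100.0, 0.92)
# -> {'amount_usd': 100.0, 'amount_eur': 92.0, 'conversion_rate': 0.92, ...}
```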
@@ -35,7 +35,7 @@ create the currency conversion data enrichment pipeline.

### A. Colab notebook

The Colab notebook combines three data enrichment processes for a sample dataset, it's second part
The Colab notebook combines three data enrichment processes for a sample dataset; its second part
contains "Data enrichment part two: Currency conversion data enrichment".

Here's the link to the notebook:
@@ -59,20 +59,20 @@ currency_conversion_enrichment/
[resources.](../../general-usage/resource.md)

1. The last part of our data enrichment ([part one](../../general-usage/data-enrichments/user_agent_device_data_enrichment.md))
involved enriching the data with user-agent device data. This included adding two new columns to the dataset as folows:
involved enriching the data with user-agent device data. This included adding two new columns to the dataset as follows:

- `device_price_usd`: average price of the device in USD.

- `price_updated_at`: time at which the price was updated.

1. The columns initially present prior to the data enrichment were:

- `user_id`: Web trackers typically assign unique ID to users for tracking their journeys and
- `user_id`: Web trackers typically assign a unique ID to users for tracking their journeys and
interactions over time.

- `device_name`: User device information helps in understanding the user base's device.

- `page_refer`: The referer URL is tracked to analyze traffic sources and user navigation
- `page_referer`: The referer URL is tracked to analyze traffic sources and user navigation
behavior.

1. Here's the resource that yields the sample data as discussed above:
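The resource code itself is collapsed in this diff; a hedged sketch of what such a resource can look like (the sample values are invented):

```py
import dlt

@dlt.resource(name="tracked_data")
def tracked_data():
    # Rows carry the columns described above; the values here are invented samples
    yield {
        "user_id": 1,
        "device_name": "Sony Xperia XZ",
        "page_referer": "https://example.com/product?ref=newsletter",
    }
```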
@@ -106,16 +106,16 @@ This function retrieves conversion rates for currency pairs that either haven't
or were last updated more than 24 hours ago from the ExchangeRate-API, using information stored in
the `dlt` [state](../../general-usage/state.md).

The first step is to register on [ExhangeRate-API](https://app.exchangerate-api.com/) and obtain the
The first step is to register on [ExchangeRate-API](https://app.exchangerate-api.com/) and obtain the
API token.

1. In the `.dlt`folder, there's a file called `secrets.toml`. It's where you store sensitive
1. In the `.dlt` folder, there's a file called `secrets.toml`. It's where you store sensitive
information securely, like access tokens. Keep this file safe. Here's its format for service
account authentication:

```py
[sources]
api_key= "Please set me up!" #ExchangeRate-API key
api_key= "Please set me up!" # ExchangeRate-API key
```

1. Create the `converted_amount` function as follows:
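The function body is collapsed in this diff; below is a minimal sketch of the idea. The ExchangeRate-API endpoint format is an assumption, and the documented function additionally caches rates in `dlt` state for 24 hours, which is omitted here:

```py
import dlt
from dlt.sources.helpers import requests

def converted_amount(record):
    # Read the API key stored in .dlt/secrets.toml under [sources]
    api_key = dlt.secrets["sources.api_key"]
    # Fetch the USD -> EUR rate (endpoint format assumed from ExchangeRate-API docs)
    url = f"https://v6.exchangerate-api.com/v6/{api_key}/pair/USD/EUR"
    rate = requests.get(url).json()["conversion_rate"]
    # Convert the price and attach conversion metadata
    record["device_price_eur"] = round(record["device_price_usd"] * rate, 2)
    record["conversion_rate"] = rate
    yield record
```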
@@ -200,7 +200,7 @@ API token.
processing.

`Transformers` are a form of `dlt resource` that takes input from other resources
via `data_from` argument to enrich or transform the data.
via the `data_from` argument to enrich or transform the data.
[Click here.](../../general-usage/resource.md#process-resources-with-dlttransformer)

Conversely, `add_map` used to customize a resource applies transformations at an item level
@@ -244,7 +244,7 @@ API token.
### Run the pipeline

1. Install necessary dependencies for the preferred
[destination](../../dlt-ecosystem/destinations/), For example, duckdb:
[destination](../../dlt-ecosystem/destinations/), for example, duckdb:

```sh
pip install "dlt[duckdb]"
```
@@ -264,3 +264,4 @@ API token.

For example, the "pipeline_name" for the above pipeline example is `data_enrichment_two`; you can
use any custom name instead.
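For reference, wiring and running the pipeline above might look like this, assuming `converted_amount` is declared as a transformer over `tracked_data`; the dataset name is illustrative:

```py
import dlt

pipeline = dlt.pipeline(
    pipeline_name="data_enrichment_two",
    destination="duckdb",
    dataset_name="currency_enrichment",  # illustrative name
)
# Assumes converted_amount is decorated with @dlt.transformer(data_from=tracked_data)
load_info = pipeline.run(converted_amount)
print(load_info)
```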

@@ -6,28 +6,28 @@ keywords: [data enrichment, url parser, referer data enrichment]

# Data enrichment part three: URL parser data enrichment

URL parser data enrichment is extracting various URL components to gain additional insights and
URL parser data enrichment involves extracting various URL components to gain additional insights and
context about the URL. This extracted information can be used for data analysis, marketing, SEO, and
more.

## URL parsing process

Here is step-by-step process for URL parser data enrichment :
Here is a step-by-step process for URL parser data enrichment:

1. Get the URL data that is needed to be parsed from a source or create one.
1. Send the URL data to an API like [URL Parser API](https://urlparse.com/).
1. Get the parsed URL data.
1. Include metadata like conversion rate, date, and time.
1. Save the updated dataset in a data warehouse or lake using a data pipeline.
1. Get the URL data that needs to be parsed from a source or create one.
2. Send the URL data to an API like [URL Parser API](https://urlparse.com/).
3. Receive the parsed URL data.
4. Include metadata like conversion rate, date, and time.
5. Save the updated dataset in a data warehouse or lake using a data pipeline.

We use **[URL Parse API](https://urlparse.com/)** to extract the information about the URL. However,
We use **[URL Parse API](https://urlparse.com/)** to extract information about the URL. However,
you can use any API you prefer.

:::tip
`URL Parse API` is free, with 1000 requests/hour limit, which can be increased on request.
`URL Parse API` is free, with a 1000 requests/hour limit, which can be increased upon request.
:::

By default the URL Parse API will return a JSON response like:
By default, the URL Parse API will return a JSON response like:

```json
{
@@ -51,7 +51,7 @@ By default the URL Parse API will return a JSON response like:
}
```
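Since the docs note you can use any API you prefer, several of the same components can also be extracted locally with Python's standard library; a quick sketch:

```py
from urllib.parse import urlparse, parse_qs

parts = urlparse("https://example.com/product/100001?ref=newsletter")
print(parts.scheme)           # https
print(parts.netloc)           # example.com
print(parts.path)             # /product/100001
print(parse_qs(parts.query))  # {'ref': ['newsletter']}
```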

## Creating data enrichment pipeline
## Creating a data enrichment pipeline

You can either follow the example in the linked Colab notebook or follow this documentation to
create the URL-parser data enrichment pipeline.
@@ -64,7 +64,7 @@ This Colab notebook outlines a three-part data enrichment process for a sample d
- Currency conversion data enrichment
- URL-parser data enrichment

This document focuses on the URL-Parser Data Enrichment (Part Three). For a comprehensive
This document focuses on the URL-parser data enrichment (Part Three). For a comprehensive
understanding, you may explore all three enrichments sequentially in the notebook:
[Colab Notebook](https://colab.research.google.com/drive/1ZKEkf1LRSld7CWQFS36fUXjhJKPAon7P?usp=sharing).

@@ -91,10 +91,10 @@ different tracking services.

Let's examine a synthetic dataset created for this article. It includes:

- `user_id`: Web trackers typically assign unique ID to users for tracking their journeys and
- `user_id`: Web trackers typically assign a unique ID to users for tracking their journeys and
interactions over time.

- `device_name`: User device information helps in understanding the user base's device.
- `device_name`: User device information helps in understanding the user base's device preferences.

- `page_refer`: The referer URL is tracked to analyze traffic sources and user navigation behavior.

@@ -139,12 +139,11 @@ Here's the resource that yields the sample data as discussed above:

### 2. Create `url_parser` function

We use a free service called [URL Parse API](https://urlparse.com/), to parse the urls. You don’t
need to register to use this service neither get an API key.
We use a free service called [URL Parse API](https://urlparse.com/), to parse the URLs. You don’t
need to register to use this service nor get an API key.

1. Create a `url_parser` function as follows:
```py
# @dlt.transformer(data_from=tracked_data)
def url_parser(record):
"""
Send a URL to a parsing service and return the parsed data.
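    This function POSTs the URL to a parsing endpoint and yields the parsed
    components. (Everything below is a hedged completion; the original body
    is collapsed in this diff, and the endpoint URL is an assumption.)
    """
    # Assumes `from dlt.sources.helpers import requests` at the top of the file
    api_url = "https://api.urlparse.com/v1/query"  # assumed endpoint
    response = requests.post(api_url, json={"url": record["page_referer"]})
    response.raise_for_status()
    # Merge the parsed URL components into the original record
    yield {**record, **response.json()}
```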
@@ -185,10 +184,10 @@ processing.
processing.

`Transformers` are a form of `dlt resource` that takes input from other resources
via `data_from` argument to enrich or transform the data.
via the `data_from` argument to enrich or transform the data.
[Click here.](../../general-usage/resource.md#process-resources-with-dlttransformer)

Conversely, `add_map` used to customize a resource applies transformations at an item level
Conversely, `add_map` is used to customize a resource and applies transformations at an item level
within a resource. It's useful for tasks like anonymizing individual data records. More on this
can be found under [Customize resources](../../general-usage/resource.md#customize-resources) in
the documentation.
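A compact sketch of the two approaches described above; the resource and function names are placeholders, and the hashing line mirrors the anonymization use case mentioned:

```py
import dlt
import hashlib

@dlt.resource
def tracked_data():
    yield {"user_id": 1, "page_referer": "https://example.com/?ref=newsletter"}

# A transformer takes its input from another resource via `data_from`
@dlt.transformer(data_from=tracked_data)
def parse_referer(record):
    record["referer_host"] = record["page_referer"].split("/")[2]
    yield record

# `add_map` applies a function to each item of a resource, e.g. anonymization
anonymized = tracked_data().add_map(
    lambda r: {**r, "user_id": hashlib.sha256(str(r["user_id"]).encode()).hexdigest()}
)
```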
@@ -222,13 +221,13 @@ need to register to use this service neither get an API key.
)
```

This will execute the `url_parser` function with the tracked data and return parsed URL.
This will execute the `url_parser` function with the tracked data and return the parsed URL.
:::

### Run the pipeline

1. Install necessary dependencies for the preferred
[destination](../../dlt-ecosystem/destinations/), For example, duckdb:
[destination](../../dlt-ecosystem/destinations/), for example, duckdb:

```sh
pip install "dlt[duckdb]"
```
@@ -248,3 +247,4 @@ need to register to use this service neither get an API key.

For example, the "pipeline_name" for the above pipeline example is `data_enrichment_three`; you
can use any custom name instead.
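After the run, one quick way to sanity-check the loaded tables, assuming the duckdb destination (which by default writes a `<pipeline_name>.duckdb` file in the working directory):

```py
import duckdb

conn = duckdb.connect("data_enrichment_three.duckdb")  # assumed default file name
print(conn.sql("SHOW ALL TABLES"))
```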
