Docs: add a note to the Databricks docs on Azure (#1962)
Showing 1 changed file with 20 additions and 10 deletions.

@@ -10,7 +10,9 @@ keywords: [Databricks, destination, data warehouse]
*Big thanks to Evan Phillips and [swishbi.com](https://swishbi.com/) for contributing code, time, and a test environment.*

## Install dlt with Databricks

**To install the dlt library with Databricks dependencies:**

```sh
pip install "dlt[databricks]"
```

@@ -91,14 +93,17 @@ If you already have your Databricks workspace set up, you can skip to the [Loade
## Loader setup guide

**1. Initialize a project with a pipeline that loads to Databricks by running**

```sh
dlt init chess databricks
```

**2. Install the necessary dependencies for Databricks by running**

```sh
pip install -r requirements.txt
```

This will install dlt with the `databricks` extra, which contains the Databricks Python dbapi client.

**4. Enter your credentials into `.dlt/secrets.toml`.**

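For reference, a minimal `secrets.toml` sketch, assuming the standard Databricks SQL warehouse connection fields (all values below are placeholders):

```toml
[destination.databricks.credentials]
server_hostname = "my-workspace.cloud.databricks.com"  # placeholder workspace hostname
http_path = "/sql/1.0/warehouses/abcdef1234567890"     # placeholder SQL warehouse HTTP path
access_token = "MY_ACCESS_TOKEN"                       # personal access token
catalog = "my_catalog"                                 # Unity Catalog to load into
```
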
@@ -130,22 +135,22 @@ For more information on staging, see the [staging support](#staging-support) sec

## Supported file formats
* [insert-values](../file-formats/insert-format.md) is used by default.
* [JSONL](../file-formats/jsonl.md) is supported when staging is enabled (see limitations below).
* [Parquet](../file-formats/parquet.md) is supported when staging is enabled.

The JSONL format has some limitations when used with Databricks:

1. Compression must be disabled to load JSONL files into Databricks. Set `data_writer.disable_compression` to `true` in the dlt config when using this format (see the config sketch after this list).
2. The following data types are not supported when using the JSONL format with `databricks`: `decimal`, `json`, `date`, `binary`. Use `parquet` if your data contains these types.
3. The `bigint` data type with precision is not supported with the JSONL format.

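A minimal `config.toml` sketch for point 1, assuming the usual dlt layout where data writer options sit under the `normalize` section:

```toml
[normalize.data_writer]
disable_compression = true  # required to load uncompressed JSONL into Databricks
```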

## Staging support

Databricks supports Amazon S3, Azure Blob Storage, and Google Cloud Storage as staging locations. `dlt` will upload files in Parquet format to the staging location and will instruct Databricks to load data from there.

### Databricks and Amazon S3

Please refer to the [S3 documentation](./filesystem.md#aws-s3) for details on connecting your S3 bucket with the `bucket_url` and `credentials`.

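Typically this looks roughly like the following in `.dlt/secrets.toml` (bucket name and keys are placeholders; the filesystem docs linked above are authoritative for the exact field names):

```toml
[destination.filesystem]
bucket_url = "s3://your-bucket-name"  # placeholder bucket used for staging

[destination.filesystem.credentials]
aws_access_key_id = "please set me up!"
aws_secret_access_key = "please set me up!"
```
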
Example to set up Databricks with S3 as a staging destination:

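The full code block sits in lines not shown in this diff; a minimal sketch along the lines of the Azure example further down (pipeline name and bucket URL are placeholders):

```py
import dlt

# Load chess player data to Databricks, staging the load files on S3 first.
pipeline = dlt.pipeline(
    pipeline_name='chess_pipeline',
    destination='databricks',
    staging=dlt.destinations.filesystem('s3://your-bucket-name'),  # add this to activate the staging location
    dataset_name='player_data'
)
```
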
@@ -165,12 +170,18 @@ pipeline = dlt.pipeline(

### Databricks and Azure Blob Storage

Refer to the [Azure Blob Storage filesystem documentation](./filesystem.md#azure-blob-storage) for details on connecting your Azure Blob Storage container with the `bucket_url` and `credentials`.

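A sketch of the corresponding `.dlt/secrets.toml` entries, assuming the account-key credential fields described in the filesystem docs (all values are placeholders):

```toml
[destination.filesystem]
bucket_url = "abfss://container_name@storage_account_name.dfs.core.windows.net/path"

[destination.filesystem.credentials]
azure_storage_account_name = "storage_account_name"
azure_storage_account_key = "account_key"
```
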
To enable support for Azure Blob Storage with dlt, make sure to install the necessary dependencies by running:

```sh
pip install "dlt[az]"
```

:::note
Databricks requires that you use ABFS URLs in the following format: `abfss://container_name@storage_account_name.dfs.core.windows.net/path`.
dlt is able to adapt the other representation (i.e., `az://container-name/path`), but we recommend that you use the correct form.
:::

Example to set up Databricks with Azure as a staging destination:

@@ -184,7 +195,6 @@ pipeline = dlt.pipeline(
```py
pipeline = dlt.pipeline(
    pipeline_name='chess_pipeline',  # placeholder pipeline name
    destination='databricks',
    staging=dlt.destinations.filesystem('abfss://container_name@storage_account_name.dfs.core.windows.net/path'),  # add this to activate the staging location
    dataset_name='player_data'
)
```

### Databricks and Google Cloud Storage