From c9c9b34dd2da15a50d82a242a65a14951265a76d Mon Sep 17 00:00:00 2001
From: dat-a-man <98139823+dat-a-man@users.noreply.github.com>
Date: Fri, 22 Mar 2024 07:56:28 +0000
Subject: [PATCH] Updated schema docs

---
 docs/website/docs/general-usage/schema.md | 33 +++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/docs/website/docs/general-usage/schema.md b/docs/website/docs/general-usage/schema.md
index 164814010d..7a9783cbf4 100644
--- a/docs/website/docs/general-usage/schema.md
+++ b/docs/website/docs/general-usage/schema.md
@@ -268,9 +268,42 @@ settings:
     re:^updated_at$: timestamp
     re:^_dlt_list_idx$: bigint
 ```
+
+### Applying data types directly with `@dlt.resource` and `apply_hints`
+
+`dlt` offers the flexibility to apply data types and hints directly in your code, bypassing the need to import and adjust schema files. This approach is ideal for rapid prototyping and for data sources with dynamic schema requirements.
+
+#### Direct specification in `@dlt.resource`
+
+Define data types and their properties, such as nullability, directly in the `@dlt.resource` decorator. This removes the dependency on external schema files. For example:
+
+```py
+import dlt
+
+@dlt.resource(name='my_table', columns={"my_column": {"data_type": "bool", "nullable": True}})
+def my_resource():
+    for i in range(10):
+        yield {'my_column': i % 2 == 0}
+```
+
+This snippet declares a nullable boolean column named `my_column` directly in the decorator.
+
+#### Using `apply_hints`
+
+When resources are generated dynamically, or when hints must be set programmatically, use `apply_hints`. It is especially useful for applying hints across many collections or tables at once.
+
+For example, to apply the `complex` data type to a column in every collection of a MongoDB source:
+
+```py
+import dlt
+
+# `mongodb` is the MongoDB verified source; set it up as described in its docs.
+all_collections = ["collection1", "collection2", "collection3"]  # replace with your actual collection names
+source_data = mongodb().with_resources(*all_collections)
+
+# Apply the `complex` data type to the chosen column in every collection.
+for col in all_collections:
+    source_data.resources[col].apply_hints(columns={"column_name": {"data_type": "complex"}})
+
+pipeline = dlt.pipeline(
+    pipeline_name="mongodb_pipeline",
+    destination="duckdb",
+    dataset_name="mongodb_data",
+)
+load_info = pipeline.run(source_data)
+```
+
+This example iterates through the MongoDB collections, applies the `complex` data type to the specified column in each, and then loads the data with `pipeline.run`.
 
 ## Export and import schema files
 
 Please follow the guide on [how to adjust a schema](../walkthroughs/adjust-a-schema.md) to export and import `yaml` schema files in your pipeline.
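+
+As a minimal sketch of the setup that guide walks through, you can pass export and import schema paths directly when creating the pipeline. The folder names below are illustrative, and the sketch reuses the `my_resource` example from above:
+
+```py
+import dlt
+
+# Illustrative paths: `dlt` bootstraps and reads schemas in these folders.
+pipeline = dlt.pipeline(
+    pipeline_name="my_pipeline",
+    destination="duckdb",
+    dataset_name="my_data",
+    import_schema_path="schemas/import",
+    export_schema_path="schemas/export",
+)
+load_info = pipeline.run(my_resource())
+```
+
+With these paths set, `dlt` writes the full schema to the export folder after each run and applies any edits you save in the import folder on the next run.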