master merge for 1.5.0 release #2158

Merged
merged 23 commits into from
Dec 17, 2024

rudolfix
Collaborator

Description

master merge for 1.5.0 release

trymzet and others added 23 commits December 4, 2024 17:35
* Add open/closed range arguments for incremental

* Docs for incremental range args

* Docstring

* Typo

* Ensure deduplication is disabled when range_start=='open'

* Cache transformer settings
* add ibis dataset in own class for now

* make error clearer

* fix some linting and fix broken test

* make most destinations work with selecting the right db and catalog, transpiling sql via postgres in some cases and selecting the right dialect in others

* add missing motherduck and sqlalchemy mappings

* casefold identifiers for ibis wrapper class

* re-organize existing dataset code to prepare ibis relation integration

* integrate ibis relation into existing code

* re-order tests

* fall back to default dataset if table not in schema

* make dataset type selectable

* add dataset type selection test and fix bug in tests

* update docs for ibis expressions use

* ensure a bunch of ibis operations continue working

* add some more tests and typings

* fix typing (with brute force get_attr typing..)

* move ibis to dependency group

* move ibis stuff to helpers

* post devel merge, put in change from dataset, update lockfile

* add ibis to sqlalchemy tests

* improve docs a bit

* fix ibis dep group

* fix dataset snippets

* fix ibis version

* add support for column schema in certain query cases

---------

Co-authored-by: Marcin Rudolf <[email protected]>
* add pyiceberg dependency and upgrade mypy

- mypy upgrade needed to solve this issue: apache/iceberg-python#768
- uses <1.13.0 requirement on mypy because 1.13.0 gives error
- new lint errors arising due to version upgrade are simply ignored

* extend pyiceberg dependencies

* remove redundant delta annotation

* add basic local filesystem iceberg support

* add active table format setting

* disable merge tests for iceberg table format

* restore non-redundant extra info

* refactor to in-memory iceberg catalog

* add s3 support for iceberg table format

* add schema evolution support for iceberg table format

* extract _register_table function

* add partition support for iceberg table format

* update docstring

* enable child table test for iceberg table format

* enable empty source test for iceberg table format

* make iceberg catalog namespace configurable and default to dataset name

* add optional typing

* fix typo

* improve typing

* extract logic into dedicated function

* add iceberg read support to filesystem sql client

* remove unused import

* add todo

* extract logic into separate functions

* add azure support for iceberg table format

* generalize delta table format tests

* enable get tables function test for iceberg table format

* remove ignores

* undo table directory management change

* enable test_read_interfaces tests for iceberg

* fix active table format filter

* use mixin for object store rs credentials

* generalize catalog typing

* extract pyiceberg scheme mapping into separate function

* generalize credentials mixin test setup

* remove unused import

* add centralized fallback to append when merge is not supported

* Revert "add centralized fallback to append when merge is not supported"

This reverts commit 54cd0bc.

* fall back to append if merge is not supported on filesystem

* fix test for s3-compatible storage

* remove obsolete code path

* exclude gcs read interface tests for iceberg

* add gcs support for iceberg table format

* switch to UnsupportedAuthenticationMethodException

* add iceberg table format docs

* use shorter pipeline name to prevent too long sql identifiers

* add iceberg catalog note to docs

* black format

* use shorter pipeline name to prevent too long sql identifiers

* correct max id length for sqlalchemy mysql dialect

* Revert "use shorter pipeline name to prevent too long sql identifiers"

This reverts commit 6cce03b.

* Revert "use shorter pipeline name to prevent too long sql identifiers"

This reverts commit ef29aa7.

* replace show with execute to prevent useless print output

* add abfss scheme to test

* remove az support for iceberg table format

* remove iceberg bucket test exclusion

* add note to docs on azure scheme support for iceberg table format

* exclude iceberg from duckdb s3-compatibility test

* disable pyiceberg info logs for tests

* extend table format docs and move into own page

* upgrade adlfs to enable account_host attribute

* Merge branch 'devel' of https://github.com/dlt-hub/dlt into feat/1996-iceberg-filesystem

* fix lint errors

* re-add pyiceberg dependency

* enabled iceberg in dbt-duckdb

* upgrade pyiceberg version

* remove pyiceberg mypy errors across python version

* does not install airflow group for dev

* fixes gcp oauth iceberg credentials handling

* fixes ca cert bundle duckdb azure on ci

* allow for airflow dep to be present during type check

---------

Co-authored-by: Marcin Rudolf <[email protected]>
* explicitly adding docs for destination item size control

* alena's feedback

* revised for explicit note

* Update docs/website/docs/reference/performance.md

---------

Co-authored-by: hulmanaseer00 <[email protected]>
Co-authored-by: Alena Astrakhantseva <[email protected]>
* add databricks oauth authentication

* improve auth databricks test

* force token-based auth for azure external location tests
* make duckdb handle iceberg table with nested types

* replace duckdb views for iceberg tables

* remove unnecessary context closing and opening

* replace duckdb views for abfss protocol

* restore original destination for write path

* use dev_mode to work around leftover data from previous tests

leftover data caused by #2148
* drops tables from schema and relational

* documents custom sections for sql_database and source rename

* clones schema without data tables when resources without source are extracted, adds tests

* skips airflow tests if not installed

* adds doc on setting up FUSE on bucket

* adds doc on setting up FUSE on bucket

* adds row key propagation for table when its nested table require it

* fixes tests
* remove standalone dataset from exports

* make pipeline dataset factory public

* rework transformation section

* fix some linting errors

* add row counts feature for readabledataset

* add dataset access example to getting started scripts

* add notes about row_counts special query to datasets docs

* fix internal docusaurus links

* Update docs/website/docs/intro.md

* Update docs/website/docs/tutorial/load-data-from-an-api.md

* Update docs/website/docs/tutorial/load-data-from-an-api.md

* Update docs/website/docs/tutorial/load-data-from-an-api.md

* Update docs/website/docs/general-usage/dataset-access/dataset.md

* Update docs/website/docs/general-usage/dataset-access/dataset.md

* Update docs/website/docs/dlt-ecosystem/transformations/index.md

* Update docs/website/docs/dlt-ecosystem/transformations/index.md

* Update docs/website/docs/dlt-ecosystem/transformations/index.md

* Update docs/website/docs/dlt-ecosystem/transformations/index.md

* Update docs/website/docs/dlt-ecosystem/destinations/duckdb.md

* Update docs/website/docs/dlt-ecosystem/transformations/index.md

* Update docs/website/docs/dlt-ecosystem/transformations/index.md

* Update docs/website/docs/dlt-ecosystem/transformations/python.md

* Update docs/website/docs/dlt-ecosystem/transformations/python.md

* Update docs/website/docs/dlt-ecosystem/transformations/python.md

* Update docs/website/docs/dlt-ecosystem/transformations/python.md

* Update docs/website/docs/dlt-ecosystem/transformations/python.md

* Update docs/website/docs/dlt-ecosystem/transformations/python.md

* Update docs/website/docs/dlt-ecosystem/transformations/python.md

* Update docs/website/docs/dlt-ecosystem/transformations/python.md

* Update docs/website/docs/dlt-ecosystem/transformations/sql.md

* Update docs/website/docs/dlt-ecosystem/transformations/sql.md

* Update docs/website/docs/dlt-ecosystem/transformations/sql.md

* Update docs/website/docs/dlt-ecosystem/transformations/sql.md

* Update docs/website/docs/dlt-ecosystem/transformations/sql.md

* Update docs/website/docs/general-usage/dataset-access/dataset.md

---------

Co-authored-by: Alena Astrakhantseva <[email protected]>
* try to fix ibis az problems on linux

* remove duckdb certs fix

* test explicitly setting transport options

* sets the ssl curl on a correct connection clone

---------

Co-authored-by: Marcin Rudolf <[email protected]>
* allows data type diff and ensures valid migration separately

* removes dlt init flag to skip core sources, adds flag to eject core source
* convert add_limit to step based limiting

* prevent late-arriving items from being forwarded from limit
add some convenience methods for pipe step management

* added a few more tests for limit

* add more limit functions from branch

* remove rate-limiting

* fix limiting bug and update docs

* revert back to inserting validator step at the same position if replaced

* make time limit tests more lenient for mac os tests

* tmp

* add test for testing incremental with limit

* improve limit tests with parallelized case

* add backfill example with sql_database

* fix linting

* remove extra file

* only wrap iterators on demand

* move items transform steps into extra file
* first draft

* updates

* add image

* add bottom list

netlify bot commented Dec 17, 2024

Deploy Preview for dlt-hub-docs canceled.

Name Link
🔨 Latest commit 38d0dab
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/676193d36bed4600080c712b

@rudolfix rudolfix merged commit e8c5e9b into master Dec 17, 2024
57 of 59 checks passed

10 participants