master merge for 1.0.0 release #1816
Merged
* feat: add timezone flag to configure timestamp data * fix: delete timezone init * test: add duckdb timestamps with timezone * test: fix resource hints for timestamp * test: correct duckdb timestamps * test: timezone tests for parquet files * exp: add notebook with timestamp exploration * test: refactor timestamp tests * test: simplified tests and extended experiments * exp: timestamp exp for duckdb and parquet * fix: add pyarrow reflection for timezone flag * fix lint errors * fix: CI/CD move tests pyarrow module * fix: pyarrow timezone defaults true * refactor: typemapper signatures * fix: duckdb timestamp config * docs: updated duckdb.md timestamps * fix: revert duckdb timestamp defaults * fix: restore duckdb timestamp default * fix: duckdb timestamp mapper * fix: delete notebook * docs: added timestamp and timezone section * refactor: duckdb precision exception message * feat: postgres timestamp timezone config * fix: postgres timestamp precision * fix: postgres timezone false case * feat: add snowflake timezone and precision flag * test: postgres invalid timestamp precision * test: unified timestamp invalid precision * test: unified column flag timezone * chore: add warn log for unsupported timezone or precision flag * docs: timezone and precision flags for timestamps * fix: none case error * docs: add duckdb default precision * fix: typing errors * rebase: formatted files from upstream devel * fix: warning message and reference TODO * test: delete duplicated input_data array * docs: moved timestamp config to data types section * fix: lint and format * fix: lint local errors
… with cursor_path missing or None value (#1576) * allows specification of what happens on cursor_path missing or cursor_path having the value None: raise differentiated exceptions, exclude row, or include row. * Documents handling None values at the incremental cursor * fixes incremental extract crashing if one record has cursor_path = None * test that add_map can be used to transform items before the incremental function is called * Unifies treating of None values for python Objects (including pydantic), pandas, and arrow --------- Co-authored-by: Marcin Rudolf <[email protected]>
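The three behaviours named in the commit above (raise differentiated exceptions, exclude the row, or include the row) can be illustrated with a minimal, library-free sketch. This is not dlt's implementation: `apply_cursor_policy`, `CursorMissingError`, `CursorNoneError`, and the `policy` values are hypothetical names chosen for illustration, and rows are assumed to be plain dicts with a top-level cursor key.

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical exception types: one per failure mode, so callers can tell
# "key absent" apart from "key present but None".
class CursorMissingError(Exception):
    pass


class CursorNoneError(Exception):
    pass


_MISSING = object()  # sentinel distinguishing an absent key from a None value


def apply_cursor_policy(
    rows: Iterable[Dict[str, Any]], cursor_path: str, policy: str = "raise"
) -> Iterator[Dict[str, Any]]:
    """Yield rows, handling a missing or None cursor according to `policy`."""
    if policy not in ("raise", "include", "exclude"):
        raise ValueError(f"unknown policy: {policy!r}")
    for row in rows:
        value = row.get(cursor_path, _MISSING)
        if value is _MISSING or value is None:
            if policy == "raise":
                if value is _MISSING:
                    raise CursorMissingError(f"{cursor_path!r} not found in row")
                raise CursorNoneError(f"{cursor_path!r} is None in row")
            if policy == "include":
                yield row  # keep the row even though it has no usable cursor
            continue  # "exclude": drop the row silently
        yield row


rows = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": None},
    {"id": 3},
]
kept = [r["id"] for r in apply_cursor_policy(rows, "updated_at", "exclude")]
```

With `policy="exclude"` only rows carrying a usable cursor value survive; with `policy="include"` every row is passed through, and with `policy="raise"` the first offending row aborts iteration.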
* - Change default vector column name to "vector" to conform with lancedb standard - Add search tests with tantivy as search engine Signed-off-by: Marcel Coetzee <[email protected]> * Format and fix linting Signed-off-by: Marcel Coetzee <[email protected]> * Add custom embedding function registration test Signed-off-by: Marcel Coetzee <[email protected]> * Spawn process in test to make sure registry can be deserialized from arrow files Signed-off-by: Marcel Coetzee <[email protected]> * Simplify null string handling Signed-off-by: Marcel Coetzee <[email protected]> * Change NULL string replacement with random string, doc clarification Signed-off-by: Marcel Coetzee <[email protected]> * Update default vector column name in docs Signed-off-by: Marcel Coetzee <[email protected]> --------- Signed-off-by: Marcel Coetzee <[email protected]>
(cherry picked from commit 071135b)
* Update deploy-with-dagster.md * Update deploy-with-dagster.md * Update deploy-with-dagster.md * Update deploy-with-dagster.md * Update deploy-with-dagster.md * Update deploy-with-dagster.md * Update deploy-with-dagster.md * Update deploy-with-dagster.md * Update deploy-with-dagster.md * small improvements * fix layout * added faqs * fix code blocks * fixing linting error * minor fixes * Update deploy-with-dagster.md * Update docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md Co-authored-by: Alena Astrakhantseva <[email protected]> * Update docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md Co-authored-by: Alena Astrakhantseva <[email protected]> * Update docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md Co-authored-by: Alena Astrakhantseva <[email protected]> * Update docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md Co-authored-by: Alena Astrakhantseva <[email protected]> * Update deploy-with-dagster.md * Update deploy-with-dagster.md --------- Co-authored-by: Alena <[email protected]>
* updated the documentation * Updated
the `initial_value` is a parameter of `dlt.sources.incremental`.
* small improvements * Updated lancedb title
…t nullable (#1791) * regression test & fix for arrow table with non-nullable cursor column * regression test for arrow Table and arrow RecordBatch * formats code
* copies rest_api source code and test suite, adjusts imports * integrates rest_client/conftest.pi into rest_api/conftest.py. Fixes incompatibilities except for POST request (/search/posts) * integrates POST search test * do no longer skip test with typed dict config * reuses tests/sources/helpers/rest_client/conftest.py in tests/sources/rest_api * checks off TODO * formats rest_api code according to dlt-core rules * fixes typing errors and graphlib import error * moves latest changes from rest_api into core (687e7ddab3a95fa621584741af543e561147ebe3). Formats and lints entire rest API starts to reorganize test suite * modularizes rest_api test suite * formats code and imports * updates signature of Paginator.update_state() * moves source test suite after duckdb is installed * end-to-end test rest_api_source on all destinations. Removes redundant helpers from test/utils.py * adds example rest_api_pipeline.py, corrects sample rest_api_pipeline docs on secrets * loads latest 30 days of issues instead of fixed date * refactors types * tests example rest_api pipelines, adds filesystem configs to load tests * fix inheritance of incremental args, make typed_dict detection work with typing extensions dicts * type incremental cursor_path as str * refactors intersection of TResourceHints and ResourceBase into TResourceHintsBase * uses str instead of generic TCursorValue * configures github access token for CI * copies sql source and tests * adjusts import paths * workaround for UUID type missing in sqlalchemy < 2.0 * extracts load tests to tests/load. Adds necessary test utility functions * formats code * corrects example postgres credentials for the test suite * formats imports, removes duplicate definition * conditionally skips test for range type detection * fixes side effects of tests modifying os.environ. 
* fixes lint errors * moves tests to right places, runs on all destinations where applicable moves filesystem source with tests and examples rearranges old sources.filesystem adds copy sig for transformers fixes Windows tests moves source test suite after duckdb is installed Revert "attempt to make duckdb a minimal dependency by removing it from extras" This reverts commit 6b7e670. attempt to make duckdb a minimal dependency by removing it from extras formats code updates signature of Paginator.update_state() formats imports modularizes rest_api test suite adds new files from 687e7ddab3a95fa621584741af543e561147ebe3, starts to reorganize test suite moves latest changes from rest_api into core (687e7ddab3a95fa621584741af543e561147ebe3). Formats and lints entire rest API fixes last type errors fixes more type errors and formats code fixes graphlib import error fixes more type errors fixes type errors except for test_configurations.py fixes typing errors where optional field was required formats rest_api code according to dlt-core rules checks off TODO reuses tests/sources/helpers/rest_client/conftest.py in tests/sources/rest_api do no longer skip test with typed dict config integrates POST search test integrates rest_client/conftest.pi into rest_api/conftest.py. 
Fixes incompatibilities except for POST request (/search/posts) copies rest_api source code and test suite, adjusts imports * post rebase fixes and formatting * first simple version of init command that can use core sources * update tests for core sources * improve tests a bit more * move init / generic source to core * detect explicit repo url in init command * update output and clean up structure in init command a bit * fix tests * add option for omitting core sources and reverting to the old behavior * add core sources to the dlt init -l list * add init template files to build * remove one unneded file * revert common tests file * move sources tests to dedicated file * remove destination tests for now, revert later * upgrade sqlalchemy for local source tests * create sql_database extra * fix bug in transform * set up timezone fixtures properly, still does not work right * fallback to timezone on duckdb with timestamp * separate common from load tests properly * update duckdb timezone test * add sql_alchemy dependency to last part of common tests * updates imports * add sql_database_pipeline file, update dlt init commands, add basic tests for sql_database_pipeline * only import sqlalchemy in tests if present * fix linter errors * bump connectorx for python 3.12 support * move sql_alchemy shims to shims file and use the original file for the same dependency system as with other libs * Fix linter errors (reverts back to wilis version from a few commits ago) * exclude connectorx from python 3.8 * make rest api example pipeline also work without a token * remove secrets from local sources tests * change test setup to work with both sqlalchemy versions * adds secrets to a part of common tests * make sql database pipeline tests succeed on both sqlalchemy versions * add excel dependenices to common tests * fix bug in schema inference of sql_alchemy backed sources * fix tests running for sql alchemy 1.4 * add concept of single file templates in the core * update tests 
and fix some * add some example pipelines * fixes some issues * sort source names * fix unsupported columns * fix all sql database tests for sqlalchemy 2.0 * fix some tests for sqlalchemy 1.4 * deselect connectorx incremental tests on sqlalchemy 1.4 * fixes some more tests * some cleanup * fix bug in init script * Revert "remove destination tests for now, revert later" This reverts commit 47e1933. * exclude sources load tests from destination workflows * fix openpyxl install * disable requests tests for now * fix common tests * add dataframe example pipeline clean up other examples a bit * add intro examples * update cleaning scripts for athena and redshift * make timezone tests slightly more strict * reorders sql_database import to get user friendly dependency error --------- Co-authored-by: dave <[email protected]> Co-authored-by: Marcin Rudolf <[email protected]>
* concats tables and record batches before being written to control row group size * flushes the item buffer for empty tables * Update dlt/common/data_writers/buffered.py Co-authored-by: Willi Müller <[email protected]> * Update docs/website/docs/dlt-ecosystem/file-formats/parquet.md Co-authored-by: Willi Müller <[email protected]> * Update docs/website/docs/dlt-ecosystem/file-formats/parquet.md Co-authored-by: Willi Müller <[email protected]> * Update docs/website/docs/dlt-ecosystem/file-formats/parquet.md Co-authored-by: Willi Müller <[email protected]> * Update docs/website/docs/dlt-ecosystem/file-formats/parquet.md Co-authored-by: Willi Müller <[email protected]> * refactors writers and buffered code, improves docs --------- Co-authored-by: Willi Müller <[email protected]>
* adds methods to detect nested and root tables via parent hint * skips linking in relational when no parent hint, removes linking skip for primary keys * moves schema config and normalizer importers to schema module, breaks cyclic deps with dest capabilities * adds table_format override to pipeline run * resolves merge strategy using adapter, uses default for a destination if strategy not explicit * removes force_iceberg flag from athena, requires explicit table_format * adds PreparedTableSchema to indicate TTableSchemas that are prepared for loading, makes verify_schema explicit method to be called by load, simplifies methods to prepare tables * applies table and file format to run methods in all pipeline tests * shortens temp table names in sql jobs * adds filesystem to drop command tests * fixes tests * adds method to update table from diff into extract * athena iceberg does not create dlt pipeline state as iceberg by default * other test fixes * deprecates force_icebergs, adds hive table format to opt out * merges column props and hints, categorizes column props * moves type mappers into destination capabilities * fixes tests * fixes cap data types verification errors not being raised * adds missing deps * fixes more tests * allows precision and scale to be 0 * fixes more tests * corrects connectorx for 3.12
…n jobs (#1781) * defaults `raise_on_failed_jobs = True`. Adapts test_dummy_client.py * updates docs on terminal exceptions on failed jobs * undoes change of test assertion, changes test setup instead * removes calls to raise_on_failed_jobs() in docs * Enables setting of raise_on_failed_jobs in airflow_helper, removes fail_task_if_any_job_failed * removes setting of os.environ["LOAD__RAISE_ON_FAILED_JOBS"] = "true" and calls to raise_on_failed_jobs() * Removes redundant calls to raise_on_failed_jobs() in entire test suite. Refactors tests where necessary. * fixes default arg overwriting config value in load of Pipeline * fixes some test cases that started to abort * requests errors set to transient for databricks * fixes even more tests --------- Co-authored-by: Marcin Rudolf <[email protected]>
* adds fallback to complex variant column if it exists * adds migrations for complex data type and preferred dt * renames complex in docs * renames complex * fixes bug with dynamic columns in make_hints * adds v10 schema engine fixture * finalizes complex -> json rename, adds more tests * adds row_key and parent_key, drops foreign_key, adds migrations and updates test schemas * test fixes * deprecates skip_complex_types Pydantic config, updates trace contract
Move sources and destinations to the top level
* structural and content changes to the sql_database doc * fixing language in code snippets * fixing broken link * updating content + structure based on feedback * fixing formatting * fixing code formatting * fixing indentation * modifying based on comments and splitting into multiple pages * updating broken links * removing problematic relative paths * small formatting and language change + adding a line about column reflection * fix outdated info * fix description --------- Co-authored-by: akelad <[email protected]>
* adding the sql_database tutorial * fixing language snippets * fixing broken link * grammar and formatting fixes * Update docs/website/docs/tutorial/sql_database.md Co-authored-by: mariarice15 <[email protected]> * Update docs/website/docs/tutorial/sql_database.md * Update docs/website/docs/tutorial/sql_database.md * Update docs/website/docs/tutorial/sql_database.md * Update docs/website/docs/tutorial/sql_database.md * Apply suggestions from code review Co-authored-by: mariarice15 <[email protected]> * Update docs/website/docs/tutorial/sql_database.md * Update docs/website/docs/tutorial/sql_database.md * Update docs/website/docs/tutorial/sql_database.md * Update docs/website/docs/tutorial/sql_database.md --------- Co-authored-by: Anton Burnashev <[email protected]> Co-authored-by: mariarice15 <[email protected]>
* skips tables without jobs when creating table chain jobs, deletes delta table and arrow dataset instances * adds tests for tables without jobs * fixes merge key and primary key OR clause for clickhouse
…as list (#1535) * creates a single source in extract for all resource instances passed as a list * decomposes dicts of resources so names are split across many sources
…g hints (#1806) * Add autodetect schema with hints test for BigQuery table builder Signed-off-by: Marcel Coetzee <[email protected]> * Use SDK to set hints for autodetect_schema path Signed-off-by: Marcel Coetzee <[email protected]> * Pass timestamp test Signed-off-by: Marcel Coetzee <[email protected]> * Remove redundant test Signed-off-by: Marcel Coetzee <[email protected]> * Extract BigQuery load job configuration into own method Signed-off-by: Marcel Coetzee <[email protected]> * moves pipeline tests to pipelines --------- Signed-off-by: Marcel Coetzee <[email protected]> Co-authored-by: Marcin Rudolf <[email protected]>
* Implement sqlalchemy loader Begin implementing sqlalchemy loader SQLA load job, factory, schema storage, POC sqlalchemy tests attempt Implement SqlJobClient interface Parquet load, some tests running on mysql update lockfile Limit bulk insert chunk size, sqlite create/drop schema, fixes Generate schema update Get more tests running with mysql More tests passing Fix state, schema restore * Support destination name in tests * Some job client/sql client tests running on sqlite * Fix more tests * All sqlite tests passing * Add sqlalchemy tests in ci * Type errors * Test sqlalchemy in own workflow * Fix tests, type errors * Fix config * CI fix * Add alembic to handle ALTER TABLE * Fix workflow * Install mysqlclient in venv * Mysql service version * Single fail * mysql healthcheck * No localhost * Remove weaviate * Change ubuntu version * Debug sqlite version * Revert * Use py datetime in tests * Test on sqlalchemy 1.4 and 2 remove secrets toml remove secrets toml Revert "remove secrets toml" This reverts commit 7dd189c. Fix default pipeline name test * Lint, no cli tests * Update lockfile * Fix test, complex -> json * Refactor type mapper * Update tests destination config * Fix tests * Ignore sources tests * Fix overriding destination in test pipeline * Fix time precision in arrow test * Lint * Fix destination setup in test * Fix * Use nullpool, lazy create engine, close current connection
Co-authored-by: Akela Drissner-Schmid <[email protected]>
* chore: add paramiko dev dependency * test: add container for sftp localhost * chore: add tmp bash scripts * exp: sftp client with fsspec * chore: sftp timestamp metadata discovered * fix: docs lint * feat: add fsspec protocol sftp * fix: lint errors from devel * test: sftp server localhost * fix: filesystem SFTP docker-compose tests * fix: json import * chore: clean tests and dockerfile * refactor: ci test exec for sftp server * feat: sftp file url parser * test: sftp reading using file samples * chore: extended SFTP credentials class * docs: filesystem SFTP credentials and authentication * chore: add bobby password protected key-based authentication * docs: sftp correction for ssh-agent * chore: add docker volume * chore: revert ci changes * test: refactor sftp with auth methods * test: sftp skip test when agent not configured * fix: poetry lock * fix: github workflow * fix: run only sftp tests * fix: merge conflict regression * fix: ssh-agent for tests * fix: pytest executions excluding sftp * fix: CI test execution * test: sftp login with signed certificate * fix: poetry lock regenerated * refactor: filesystem sftp tests * fix: filesystem tests for sftp * refactor: reduce redundancy * fix: lint and remove duplicated test * chore: change ubuntu version * fix: enforce test marker * fix: ignore sftp tests * fix: exclude sftp from filesystem tests * adds sftp extra dep --------- Co-authored-by: Marcin Rudolf <[email protected]>
* Move sources and destinations to the top level * Update the css * Update sidebars.js * Adjust icons --------- Co-authored-by: Violetta Mishechkina <[email protected]> Co-authored-by: akelad <[email protected]> Co-authored-by: Anton Burnashev <[email protected]>
* Masks secrets in traces. * tests that secrets are masked in stringified trace * generates secrets in deployments from dlt.secrets provider instead of pipeline trace * corrects masking and looks up secret value in dlt.secrets * removes secret masking and replaces credentials with None. * fixes deploy help when deploy type missing * fixes always_choose restore defaults in echo * tests deploy command with and without secrets * fixes dumping secret vals for toml in deploy --------- Co-authored-by: Marcin Rudolf <[email protected]>
* removes blog files * updates schema docs for nested references * updates docs to use nested instead of parent child * adds more migration tests * bumps to 1.0.0 * adds scd2 tests
✅ Deploy Preview for dlt-hub-docs ready!
rudolfix force-pushed the devel branch 2 times, most recently from 346d7c9 to 2ee3eab on September 16, 2024 13:12
Description
master merge for 1.0.0 release