allows naming conventions to be changed #998

Merged (114 commits, Jun 26, 2024)

Commits
93995b9
allows to decorate async function with dlt.source
rudolfix Feb 20, 2024
4444878
adds pytest-async and updates pytest to 7.x
rudolfix Feb 20, 2024
b3b70f6
fixes forked teardown issue 7.x
rudolfix Feb 20, 2024
f5d7a0a
bumps deps for py 3.12
rudolfix Feb 21, 2024
83dc38a
adds py 12 common tests
rudolfix Feb 21, 2024
21ebfee
fixes typings after deps bump
rudolfix Feb 21, 2024
7985f9d
bumps airflow, yanks duckdb to 0.9.2
rudolfix Feb 21, 2024
07f285e
fixes tests
rudolfix Feb 21, 2024
06e441e
fixes pandas version
rudolfix Feb 21, 2024
3e846a1
adds 3.12 duckdb dep
rudolfix Feb 22, 2024
37b4a31
Merge branch 'devel' into rfix/enables-async-source
rudolfix Feb 22, 2024
934c167
Merge branch 'devel' into rfix/enables-async-source
rudolfix Feb 24, 2024
7fa574d
adds right hand pipe operator
rudolfix Feb 24, 2024
8c7942d
fixes docker ci build
rudolfix Feb 24, 2024
f951fc0
adds docs on async sources and resources
rudolfix Feb 24, 2024
387a7c7
normalizes default hints and preferred types in schema
rudolfix Feb 25, 2024
88728e1
defines pipeline state table in utils, column normalization in simple…
rudolfix Feb 25, 2024
1a53425
normalizes all identifiers used by relational normalizer, fixes other…
rudolfix Feb 25, 2024
8835023
fixes sql job client to use normalized identifiers in queries
rudolfix Feb 25, 2024
f4c504f
runs state sync tests for lower and upper case naming conventions
rudolfix Feb 25, 2024
874cc29
fixes weaviate to use normalized identifiers in queries
rudolfix Feb 25, 2024
c4e9f35
partially fixes qdrant incorrect state and version retrieval queries
rudolfix Feb 25, 2024
6345377
initial sql uppercase naming convention
rudolfix Feb 25, 2024
96a02ff
Merge branch 'devel' into rfix/allows-naming-conventions
rudolfix Mar 8, 2024
aef8cc2
adds native df readers to databricks and bigquery
rudolfix Mar 9, 2024
a53c00b
adds casing identifier capability to support different casing in nami…
rudolfix Mar 9, 2024
91f5780
cleans typing for relational normalizer
rudolfix Mar 9, 2024
5984824
renames escape functions
rudolfix Mar 18, 2024
3458441
destination capabilities for case fold and case sensitivity
rudolfix Mar 18, 2024
55362b0
drops supports naming module and allows naming to be instance in conf…
rudolfix Mar 18, 2024
b836dfe
checks all tables in information schema in one go, observes case fold…
rudolfix Mar 18, 2024
e50bfaa
moves schema verification to destination utils
rudolfix Mar 18, 2024
42d149f
adds method to remove processing hints from schema, helper functions …
rudolfix Mar 18, 2024
c53808f
accepts naming convention instances when resolving configs
rudolfix Mar 18, 2024
b97ae53
fixes the cloning of schema in decorator, removes processing hints
rudolfix Mar 18, 2024
0132c2f
removes processing hints when saving imported schema
rudolfix Mar 18, 2024
2a7c5dd
adds docs on naming conventions, removes technical docs
rudolfix Mar 18, 2024
d502c7c
Merge branch 'devel' into rfix/allows-naming-conventions
rudolfix Jun 4, 2024
3e7504b
Merge branch 'devel' into rfix/allows-naming-conventions
rudolfix Jun 6, 2024
3bb929f
adds casing info to databrick caps, makes caps an instance attr
rudolfix Jun 11, 2024
9f0920c
Merge branch 'devel' into rfix/allows-naming-conventions
rudolfix Jun 11, 2024
724dc15
adjusts destination casing in caps from schema naming and config
rudolfix Jun 11, 2024
b58a118
raises detailed schema identifier clash exceptions
rudolfix Jun 11, 2024
d190ea1
adds is_case_sensitive and name to NamingConvention
rudolfix Jun 11, 2024
b445654
adds sanity check if _dlt prefix is preserved
rudolfix Jun 11, 2024
ee8a95b
finds genric types in non generic classes deriving from generic
rudolfix Jun 11, 2024
eb30838
uses casefold INSERT VALUES job column names
rudolfix Jun 11, 2024
558db91
adds a method make_qualified_table_name_path that calculates componen…
rudolfix Jun 11, 2024
dea9669
adds casing info to destinations, caps as instance attrs, custom tabl…
rudolfix Jun 11, 2024
b1e2b09
adds naming convention to restore state tests, make them essential
rudolfix Jun 11, 2024
210be70
fixes table builder tests
rudolfix Jun 11, 2024
95b703d
removes processing hints when exporting schema to import folder, warn…
rudolfix Jun 12, 2024
4b72b77
allows to subclass INFO SCHEMA query generation and uses specialized …
rudolfix Jun 12, 2024
ab39e06
uses correct schema escaping function in sql jobs
rudolfix Jun 12, 2024
2ae3ad2
passes pipeline state to package state via extract
rudolfix Jun 12, 2024
09b7731
fixes optional normalizers module
rudolfix Jun 12, 2024
cfd3e5f
excludes version_hash from pipeline state SELECT
rudolfix Jun 12, 2024
0edbbfd
passes pipeline state to package state pt.2
rudolfix Jun 12, 2024
5769ba1
re-enables sentry tests
rudolfix Jun 12, 2024
1f17a44
bumps qdrant client, makes test running for local version
rudolfix Jun 12, 2024
71e418b
makes weaviate running
rudolfix Jun 12, 2024
ce414e1
uses schemata to find databases on athena
rudolfix Jun 13, 2024
bde61a9
uses api get_table for hidden dataset on bigquery to reflect schemas,…
rudolfix Jun 13, 2024
036e3dd
adds naming conventions to two restore state tests
rudolfix Jun 13, 2024
8546763
fixes escape identifiers to column escape
rudolfix Jun 13, 2024
f57e286
fix conflicts in docs
rudolfix Jun 13, 2024
cf50bd4
adjusts capabilities in capabilities() method, uses config and naming…
rudolfix Jun 15, 2024
72969ce
allows to add props to classes without vectorizer in weaviate
rudolfix Jun 15, 2024
656d5fc
moves caps function into factories, cleansup adapters and custom dest…
rudolfix Jun 15, 2024
bbd7fe6
sentry_dsn
rudolfix Jun 15, 2024
a671508
adds basic destination reference tests
rudolfix Jun 15, 2024
81e0db9
fixes table builder tests
rudolfix Jun 15, 2024
8a32793
fix deps and docs
rudolfix Jun 15, 2024
0dc6dc8
fixes more tests
rudolfix Jun 16, 2024
4a39795
case sensitivity docs stubs
rudolfix Jun 17, 2024
43d6d5f
fixes drop_pipeline fixture
rudolfix Jun 17, 2024
e3d998c
improves partial config generation for capabilities
rudolfix Jun 17, 2024
3aef3fd
adds snowflake csv support
rudolfix Jun 17, 2024
6df7a34
creates separate csv tests
rudolfix Jun 17, 2024
57aec2e
allows to import files into extract storage, adds import file writer …
rudolfix Jun 19, 2024
fee7af5
handles ImportFileMeta in extractor
rudolfix Jun 19, 2024
96c7222
adds import file item normalizer and router to normalize
rudolfix Jun 19, 2024
116add0
supports csv format config for snowflake
rudolfix Jun 19, 2024
42eacaf
removes realpath wherever possible and adds fast make_full_path to Fi…
rudolfix Jun 20, 2024
3793d06
adds additional methods to load_package storage to make listings faster
rudolfix Jun 20, 2024
88eec9c
adds file_format to dlt.resource, uses preferred file format for dlt …
rudolfix Jun 21, 2024
8e0f0a8
docs for importing files, file_format
rudolfix Jun 21, 2024
b1c095c
code improvements and tests
rudolfix Jun 21, 2024
46ec732
docs hard links note
rudolfix Jun 21, 2024
2194b18
Merge pull request #1479 from dlt-hub/feat/snowflake-csv-support
rudolfix Jun 21, 2024
1384ed3
Merge branch 'devel' into rfix/allows-naming-conventions
rudolfix Jun 21, 2024
b00cbb2
moves loader parallelism test to pipeliens, solves duckdb ci test err…
rudolfix Jun 23, 2024
a530345
fixes tests
rudolfix Jun 23, 2024
4271895
moves drop_pipeline fixture level up
rudolfix Jun 23, 2024
abd02df
drops default naming convention from caps so naming in saved schema p…
rudolfix Jun 24, 2024
14b4b0e
unifies all representations of pipeline state
rudolfix Jun 24, 2024
60e45b1
tries to decompress text file first in fs_client
rudolfix Jun 24, 2024
a84be2a
tests get stored state in test_job_client
rudolfix Jun 24, 2024
1dc7a09
removes credentials from dlt.attach, addes destination and staging fa…
rudolfix Jun 24, 2024
ab69b76
cleans up env variables and pipeline dropping fixutere precedence
rudolfix Jun 24, 2024
0eeb21d
Merge branch 'devel' into rfix/allows-naming-conventions
rudolfix Jun 24, 2024
f1097d8
removes dev_mode from dlt.attach
rudolfix Jun 24, 2024
3855fcc
adds missing arguments to filesystem factory
rudolfix Jun 24, 2024
651412e
fixes tests
rudolfix Jun 24, 2024
aab36e1
updates destination and naming convention docs
rudolfix Jun 25, 2024
7294aae
removes is_case_sensitive from naming convention initializer
rudolfix Jun 26, 2024
dc10473
simplifies with_file_import mark
rudolfix Jun 26, 2024
727a35e
adds case sensitivity tests
rudolfix Jun 26, 2024
4cb2646
uses dev_mode everywhere
rudolfix Jun 26, 2024
f098e5a
improves csv docs
rudolfix Jun 26, 2024
1521778
fixes encodings in fsspec
rudolfix Jun 26, 2024
796483e
improves naming convention docs
rudolfix Jun 26, 2024
534c7f8
fixes tests and renames clash to collision
rudolfix Jun 26, 2024
5f4cb4c
fixes getting original bases from instance
rudolfix Jun 26, 2024
Files changed
3 changes: 2 additions & 1 deletion dlt/destinations/impl/athena/__init__.py
@@ -11,7 +11,8 @@ def capabilities() -> DestinationCapabilitiesContext:
caps.preferred_staging_file_format = "parquet"
caps.supported_staging_file_formats = ["parquet", "jsonl"]
caps.escape_identifier = escape_athena_identifier
caps.case_identifier = str.lower
caps.casefold_identifier = str.lower
caps.has_case_sensitive_identifiers = False
caps.decimal_precision = (DEFAULT_NUMERIC_PRECISION, DEFAULT_NUMERIC_SCALE)
caps.wei_precision = (DEFAULT_NUMERIC_PRECISION, 0)
caps.max_identifier_length = 255
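The hunk above shows the pattern this PR applies to every destination module: the old caps.case_identifier attribute is replaced by caps.casefold_identifier, and a new caps.has_case_sensitive_identifiers flag records whether the engine distinguishes identifier casing. A minimal sketch of that pattern, using only the attributes visible in this diff (the function name is hypothetical):

from dlt.common.destination import DestinationCapabilitiesContext

def athena_like_capabilities() -> DestinationCapabilitiesContext:
    # sketch only: mirrors the capability fields set in the hunk above
    caps = DestinationCapabilitiesContext()
    caps.casefold_identifier = str.lower          # how the engine folds unquoted identifiers
    caps.has_case_sensitive_identifiers = False   # folded identifiers cannot collide by case
    return caps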
8 changes: 6 additions & 2 deletions dlt/destinations/impl/bigquery/__init__.py
@@ -1,4 +1,4 @@
from dlt.common.data_writers.escape import escape_bigquery_identifier
from dlt.common.data_writers.escape import escape_hive_identifier
from dlt.common.destination import DestinationCapabilitiesContext
from dlt.common.arithmetics import DEFAULT_NUMERIC_PRECISION, DEFAULT_NUMERIC_SCALE

@@ -9,8 +9,11 @@ def capabilities() -> DestinationCapabilitiesContext:
caps.supported_loader_file_formats = ["jsonl", "parquet"]
caps.preferred_staging_file_format = "parquet"
caps.supported_staging_file_formats = ["parquet", "jsonl"]
caps.escape_identifier = escape_bigquery_identifier
caps.escape_identifier = escape_hive_identifier
caps.escape_literal = None
caps.has_case_sensitive_identifiers = (
True # there are case insensitive identifiers but dlt does not use them
)
caps.decimal_precision = (DEFAULT_NUMERIC_PRECISION, DEFAULT_NUMERIC_SCALE)
caps.wei_precision = (76, 38)
caps.max_identifier_length = 1024
@@ -21,5 +24,6 @@ def capabilities() -> DestinationCapabilitiesContext:
caps.is_max_text_data_type_length_in_bytes = True
caps.supports_ddl_transactions = False
caps.supports_clone_table = True
caps.schema_supports_numeric_precision = False # no precision information in BigQuery

return caps
43 changes: 10 additions & 33 deletions dlt/destinations/impl/bigquery/bigquery.py
@@ -43,17 +43,18 @@
from dlt.destinations.job_impl import NewReferenceJob
from dlt.destinations.sql_jobs import SqlMergeJob
from dlt.destinations.type_mapping import TypeMapper
from dlt.destinations.utils import parse_db_data_type_str_with_precision


class BigQueryTypeMapper(TypeMapper):
sct_to_unbound_dbt = {
"complex": "JSON",
"text": "STRING",
"double": "FLOAT64",
"bool": "BOOLEAN",
"bool": "BOOL",
"date": "DATE",
"timestamp": "TIMESTAMP",
"bigint": "INTEGER",
"bigint": "INT64",
"binary": "BYTES",
"wei": "BIGNUMERIC", # non-parametrized should hold wei values
"time": "TIME",
@@ -66,11 +67,11 @@ class BigQueryTypeMapper(TypeMapper):

dbt_to_sct = {
"STRING": "text",
"FLOAT": "double",
"BOOLEAN": "bool",
"FLOAT64": "double",
"BOOL": "bool",
"DATE": "date",
"TIMESTAMP": "timestamp",
"INTEGER": "bigint",
"INT64": "bigint",
"BYTES": "binary",
"NUMERIC": "decimal",
"BIGNUMERIC": "decimal",
@@ -89,9 +90,10 @@ def to_db_decimal_type(self, precision: Optional[int], scale: Optional[int]) ->
def from_db_type(
self, db_type: str, precision: Optional[int], scale: Optional[int]
) -> TColumnType:
if db_type == "BIGNUMERIC" and precision is None:
# precision is present in the type name
if db_type == "BIGNUMERIC":
return dict(data_type="wei")
return super().from_db_type(db_type, precision, scale)
return super().from_db_type(*parse_db_data_type_str_with_precision(db_type))


class BigQueryLoadJob(LoadJob, FollowupJob):
@@ -231,7 +233,7 @@ def start_file_load(self, table: TTableSchema, file_path: str, load_id: str) ->
reason = BigQuerySqlClient._get_reason_from_errors(gace)
if reason == "notFound":
# google.api_core.exceptions.NotFound: 404 – table not found
raise UnknownTableException(table["name"]) from gace
raise UnknownTableException(self.schema.name, table["name"]) from gace
elif (
reason == "duplicate"
): # google.api_core.exceptions.Conflict: 409 PUT – already exists
@@ -337,31 +339,6 @@ def _get_column_def_sql(self, column: TColumnSchema, table_format: TTableFormat
column_def_sql += " OPTIONS (rounding_mode='ROUND_HALF_AWAY_FROM_ZERO')"
return column_def_sql

def get_storage_table(self, table_name: str) -> Tuple[bool, TTableSchemaColumns]:
schema_table: TTableSchemaColumns = {}
try:
table = self.sql_client.native_connection.get_table(
self.sql_client.make_qualified_table_name(table_name, escape=False),
retry=self.sql_client._default_retry,
timeout=self.config.http_timeout,
)
partition_field = table.time_partitioning.field if table.time_partitioning else None
for c in table.schema:
schema_c: TColumnSchema = {
"name": c.name,
"nullable": c.is_nullable,
"unique": False,
"sort": False,
"primary_key": False,
"foreign_key": False,
"cluster": c.name in (table.clustering_fields or []),
"partition": c.name == partition_field,
**self._from_db_type(c.field_type, c.precision, c.scale),
}
schema_table[c.name] = schema_c
return True, schema_table
except gcp_exceptions.NotFound:
return False, schema_table

def _create_load_job(self, table: TTableSchema, file_path: str) -> bigquery.LoadJob:
# append to table for merge loads (append to stage) and regular appends.
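In the BigQueryTypeMapper hunk above, from_db_type now delegates to parse_db_data_type_str_with_precision (imported from dlt.destinations.utils), because reflected BigQuery types can carry precision inside the type name, e.g. BIGNUMERIC(76, 38). The helper's actual behavior is not shown in this diff; the sketch below is only an assumption about what such a parser does:

import re
from typing import Optional, Tuple

def parse_type_with_precision_sketch(db_type: str) -> Tuple[str, Optional[int], Optional[int]]:
    # "BIGNUMERIC(76, 38)" -> ("BIGNUMERIC", 76, 38); "STRING" -> ("STRING", None, None)
    match = re.match(r"^\s*([A-Za-z0-9_]+)\s*(?:\((\d+)(?:\s*,\s*(\d+))?\))?\s*$", db_type)
    if not match:
        return db_type, None, None
    name, precision, scale = match.groups()
    return name, int(precision) if precision else None, int(scale) if scale else None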
4 changes: 2 additions & 2 deletions dlt/destinations/impl/bigquery/sql_client.py
@@ -234,8 +234,8 @@ def execute_query(self, query: AnyStr, *args: Any, **kwargs: Any) -> Iterator[DB
conn.close()

def fully_qualified_dataset_name(self, escape: bool = True) -> str:
project_id = self.capabilities.case_identifier(self.credentials.project_id)
dataset_name = self.capabilities.case_identifier(self.dataset_name)
project_id = self.capabilities.casefold_identifier(self.credentials.project_id)
dataset_name = self.capabilities.casefold_identifier(self.dataset_name)
if escape:
project_id = self.capabilities.escape_identifier(project_id)
dataset_name = self.capabilities.escape_identifier(dataset_name)
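fully_qualified_dataset_name now folds the project and dataset names with casefold_identifier before optionally escaping them. The order matters: escaping produces a quoted, case-preserving identifier, so folding has to happen first to match how the engine stores unquoted names. A small illustration of that two-step, with stand-in callables for the two capabilities:

def qualify(name: str, casefold=str.lower, escape=lambda s: f'"{s}"', should_escape: bool = True) -> str:
    folded = casefold(name)                             # 1. fold casing first
    return escape(folded) if should_escape else folded  # 2. only then quote for SQL

# qualify("My_Dataset") == '"my_dataset"'; escaping before folding would keep the mixed
# case inside the quotes and could miss the dataset the engine actually created.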
3 changes: 1 addition & 2 deletions dlt/destinations/impl/databricks/__init__.py
@@ -2,8 +2,6 @@
from dlt.common.data_writers.escape import escape_databricks_identifier, escape_databricks_literal
from dlt.common.arithmetics import DEFAULT_NUMERIC_PRECISION, DEFAULT_NUMERIC_SCALE

from dlt.destinations.impl.databricks.configuration import DatabricksClientConfiguration


def capabilities() -> DestinationCapabilitiesContext:
caps = DestinationCapabilitiesContext()
@@ -13,6 +11,7 @@ def capabilities() -> DestinationCapabilitiesContext:
caps.supported_staging_file_formats = ["jsonl", "parquet"]
caps.escape_identifier = escape_databricks_identifier
caps.escape_literal = escape_databricks_literal
caps.has_case_sensitive_identifiers = False
caps.decimal_precision = (DEFAULT_NUMERIC_PRECISION, DEFAULT_NUMERIC_SCALE)
caps.wei_precision = (DEFAULT_NUMERIC_PRECISION, 0)
caps.max_identifier_length = 255
2 changes: 1 addition & 1 deletion dlt/destinations/impl/databricks/databricks.py
@@ -316,7 +316,7 @@ def _get_column_def_sql(self, c: TColumnSchema, table_format: TTableFormat = Non

def _get_storage_table_query_columns(self) -> List[str]:
fields = super()._get_storage_table_query_columns()
fields[1] = ( # Override because this is the only way to get data type with precision
fields[2] = ( # Override because this is the only way to get data type with precision
"full_data_type"
)
return fields
4 changes: 2 additions & 2 deletions dlt/destinations/impl/databricks/sql_client.py
@@ -134,8 +134,8 @@ def execute_query(self, query: AnyStr, *args: Any, **kwargs: Any) -> Iterator[DB
yield DatabricksCursorImpl(curr) # type: ignore[abstract]

def fully_qualified_dataset_name(self, escape: bool = True) -> str:
catalog = self.capabilities.case_identifier(self.credentials.catalog)
dataset_name = self.capabilities.case_identifier(self.dataset_name)
catalog = self.capabilities.casefold_identifier(self.credentials.catalog)
dataset_name = self.capabilities.casefold_identifier(self.dataset_name)
if escape:
catalog = self.capabilities.escape_identifier(catalog)
dataset_name = self.capabilities.escape_identifier(dataset_name)
1 change: 1 addition & 0 deletions dlt/destinations/impl/duckdb/__init__.py
@@ -11,6 +11,7 @@ def capabilities() -> DestinationCapabilitiesContext:
caps.supported_staging_file_formats = []
caps.escape_identifier = escape_postgres_identifier
caps.escape_literal = escape_duckdb_literal
caps.has_case_sensitive_identifiers = False
caps.decimal_precision = (DEFAULT_NUMERIC_PRECISION, DEFAULT_NUMERIC_SCALE)
caps.wei_precision = (DEFAULT_NUMERIC_PRECISION, 0)
caps.max_identifier_length = 65536
1 change: 1 addition & 0 deletions dlt/destinations/impl/dummy/__init__.py
@@ -28,6 +28,7 @@ def capabilities() -> DestinationCapabilitiesContext:
caps.supported_loader_file_formats = additional_formats + [config.loader_file_format]
caps.preferred_staging_file_format = None
caps.supported_staging_file_formats = additional_formats + [config.loader_file_format]
caps.has_case_sensitive_identifiers = True
caps.max_identifier_length = 127
caps.max_column_identifier_length = 127
caps.max_query_length = 8 * 1024 * 1024
1 change: 1 addition & 0 deletions dlt/destinations/impl/motherduck/__init__.py
@@ -9,6 +9,7 @@ def capabilities() -> DestinationCapabilitiesContext:
caps.supported_loader_file_formats = ["parquet", "insert_values", "jsonl"]
caps.escape_identifier = escape_postgres_identifier
caps.escape_literal = escape_duckdb_literal
caps.has_case_sensitive_identifiers = False
caps.decimal_precision = (DEFAULT_NUMERIC_PRECISION, DEFAULT_NUMERIC_SCALE)
caps.wei_precision = (DEFAULT_NUMERIC_PRECISION, 0)
caps.max_identifier_length = 65536
2 changes: 1 addition & 1 deletion dlt/destinations/impl/motherduck/sql_client.py
@@ -31,7 +31,7 @@ def __init__(self, dataset_name: str, credentials: MotherDuckCredentials) -> Non

def fully_qualified_dataset_name(self, escape: bool = True) -> str:
dataset_name = super().fully_qualified_dataset_name(escape)
database_name = self.capabilities.case_identifier(self.database_name)
database_name = self.capabilities.casefold_identifier(self.database_name)
if escape:
database_name = self.capabilities.escape_identifier(database_name)
return f"{database_name}.{dataset_name}"
4 changes: 2 additions & 2 deletions dlt/destinations/impl/mssql/sql_client.py
@@ -95,14 +95,14 @@ def drop_dataset(self) -> None:
# Drop all views
rows = self.execute_sql(
"SELECT table_name FROM information_schema.views WHERE table_schema = %s;",
self.capabilities.case_identifier(self.dataset_name),
self.capabilities.casefold_identifier(self.dataset_name),
)
view_names = [row[0] for row in rows]
self._drop_views(*view_names)
# Drop all tables
rows = self.execute_sql(
"SELECT table_name FROM information_schema.tables WHERE table_schema = %s;",
self.capabilities.case_identifier(self.dataset_name),
self.capabilities.casefold_identifier(self.dataset_name),
)
table_names = [row[0] for row in rows]
self.drop_tables(*table_names)
3 changes: 1 addition & 2 deletions dlt/destinations/impl/postgres/__init__.py
@@ -1,6 +1,5 @@
from dlt.common.data_writers.escape import escape_postgres_identifier, escape_postgres_literal
from dlt.common.destination import DestinationCapabilitiesContext
from dlt.common.destination.reference import JobClientBase, DestinationClientConfiguration
from dlt.common.arithmetics import DEFAULT_NUMERIC_PRECISION, DEFAULT_NUMERIC_SCALE
from dlt.common.wei import EVM_DECIMAL_PRECISION

@@ -14,7 +13,7 @@ def capabilities() -> DestinationCapabilitiesContext:
caps.supported_staging_file_formats = []
caps.escape_identifier = escape_postgres_identifier
caps.escape_literal = escape_postgres_literal
caps.case_identifier = str.lower
caps.has_case_sensitive_identifiers = True
caps.decimal_precision = (DEFAULT_NUMERIC_PRECISION, DEFAULT_NUMERIC_SCALE)
caps.wei_precision = (2 * EVM_DECIMAL_PRECISION, EVM_DECIMAL_PRECISION)
caps.max_identifier_length = 63
2 changes: 1 addition & 1 deletion dlt/destinations/impl/qdrant/__init__.py
@@ -6,7 +6,7 @@ def capabilities() -> DestinationCapabilitiesContext:
caps = DestinationCapabilitiesContext()
caps.preferred_loader_file_format = "jsonl"
caps.supported_loader_file_formats = ["jsonl"]

caps.has_case_sensitive_identifiers = True
caps.max_identifier_length = 200
caps.max_column_identifier_length = 1024
caps.max_query_length = 8 * 1024 * 1024
9 changes: 7 additions & 2 deletions dlt/destinations/impl/qdrant/qdrant_client.py
@@ -3,7 +3,11 @@

from dlt.common import json, pendulum, logger
from dlt.common.schema import Schema, TTableSchema, TSchemaTables
from dlt.common.schema.utils import get_columns_names_with_prop, pipeline_state_table
from dlt.common.schema.utils import (
get_columns_names_with_prop,
normalize_table_identifiers,
pipeline_state_table,
)
from dlt.common.destination import DestinationCapabilitiesContext
from dlt.common.destination.reference import TLoadJobState, LoadJob, JobClientBase, WithStateSync
from dlt.common.storages import FileStorage
@@ -152,7 +156,8 @@ def __init__(self, schema: Schema, config: QdrantClientConfiguration) -> None:
)
# get definition of state table (may not be present in the schema)
state_table = schema.tables.get(
schema.state_table_name, schema.normalize_table_identifiers(pipeline_state_table())
schema.state_table_name,
normalize_table_identifiers(pipeline_state_table(), schema.naming),
)
# column names are pipeline properties
self.pipeline_state_properties = list(state_table["columns"].keys())
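In the Qdrant client hunk above, normalize_table_identifiers is now imported from dlt.common.schema.utils and receives the naming convention explicitly (schema.naming) instead of being called as a Schema method. A sketch of the call site as this diff uses it; anything beyond the two arguments shown here is an assumption:

from dlt.common.schema.utils import normalize_table_identifiers, pipeline_state_table

def state_table_definition(schema):
    # fall back to a freshly normalized state table when the schema does not define one
    return schema.tables.get(
        schema.state_table_name,
        normalize_table_identifiers(pipeline_state_table(), schema.naming),
    )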
3 changes: 2 additions & 1 deletion dlt/destinations/impl/redshift/__init__.py
@@ -11,7 +11,8 @@ def capabilities() -> DestinationCapabilitiesContext:
caps.supported_staging_file_formats = ["jsonl", "parquet"]
caps.escape_identifier = escape_redshift_identifier
caps.escape_literal = escape_redshift_literal
caps.case_identifier = str.lower
caps.casefold_identifier = str.lower
caps.has_case_sensitive_identifiers = False
caps.decimal_precision = (DEFAULT_NUMERIC_PRECISION, DEFAULT_NUMERIC_SCALE)
caps.wei_precision = (DEFAULT_NUMERIC_PRECISION, 0)
caps.max_identifier_length = 127
3 changes: 2 additions & 1 deletion dlt/destinations/impl/snowflake/__init__.py
@@ -10,7 +10,8 @@ def capabilities() -> DestinationCapabilitiesContext:
caps.preferred_staging_file_format = "jsonl"
caps.supported_staging_file_formats = ["jsonl", "parquet"]
caps.escape_identifier = escape_snowflake_identifier
caps.case_identifier = str.upper
caps.casefold_identifier = str.upper
caps.has_case_sensitive_identifiers = True
caps.decimal_precision = (DEFAULT_NUMERIC_PRECISION, DEFAULT_NUMERIC_SCALE)
caps.wei_precision = (DEFAULT_NUMERIC_PRECISION, 0)
caps.max_identifier_length = 255
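Snowflake is the one destination in this diff that both folds unquoted identifiers up (casefold_identifier = str.upper) and reports case-sensitive identifiers, whereas Redshift and Athena fold down and are case insensitive. A tiny illustration of how those folding directions affect dlt's internal table names (the system table names below are the usual dlt ones and are only used as examples):

# Snowflake: fold up, quoted identifiers stay case sensitive
# Redshift/Athena: fold down, no case sensitivity
snowflake_fold, redshift_fold = str.upper, str.lower
print(snowflake_fold("_dlt_pipeline_state"))  # _DLT_PIPELINE_STATE
print(redshift_fold("_DLT_PIPELINE_STATE"))   # _dlt_pipeline_state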
1 change: 1 addition & 0 deletions dlt/destinations/impl/synapse/__init__.py
@@ -18,6 +18,7 @@ def capabilities() -> DestinationCapabilitiesContext:

caps.escape_identifier = escape_postgres_identifier
caps.escape_literal = escape_mssql_literal
caps.has_case_sensitive_identifiers = False

# Synapse has a max precision of 38
# https://learn.microsoft.com/en-us/sql/t-sql/statements/create-table-azure-sql-data-warehouse?view=aps-pdw-2016-au7#DataTypes
2 changes: 1 addition & 1 deletion dlt/destinations/impl/weaviate/__init__.py
@@ -6,7 +6,7 @@ def capabilities() -> DestinationCapabilitiesContext:
caps = DestinationCapabilitiesContext()
caps.preferred_loader_file_format = "jsonl"
caps.supported_loader_file_formats = ["jsonl"]

caps.has_case_sensitive_identifiers = False
caps.max_identifier_length = 200
caps.max_column_identifier_length = 1024
caps.max_query_length = 8 * 1024 * 1024
15 changes: 12 additions & 3 deletions dlt/destinations/impl/weaviate/weaviate_client.py
@@ -29,7 +29,11 @@
from dlt.common.time import ensure_pendulum_datetime
from dlt.common.schema import Schema, TTableSchema, TSchemaTables, TTableSchemaColumns
from dlt.common.schema.typing import TColumnSchema, TColumnType
from dlt.common.schema.utils import get_columns_names_with_prop, pipeline_state_table
from dlt.common.schema.utils import (
get_columns_names_with_prop,
normalize_table_identifiers,
pipeline_state_table,
)
from dlt.common.destination import DestinationCapabilitiesContext
from dlt.common.destination.reference import TLoadJobState, LoadJob, JobClientBase, WithStateSync
from dlt.common.data_types import TDataType
@@ -243,7 +247,8 @@ def __init__(self, schema: Schema, config: WeaviateClientConfiguration) -> None:
)
# get definition of state table (may not be present in the schema)
state_table = schema.tables.get(
schema.state_table_name, schema.normalize_table_identifiers(pipeline_state_table())
schema.state_table_name,
normalize_table_identifiers(pipeline_state_table(), schema.naming),
)
# column names are pipeline properties
self.pipeline_state_properties = list(state_table["columns"].keys())
@@ -453,7 +458,11 @@ def _execute_schema_update(self, only_tables: Iterable[str]) -> None:
for table_name in only_tables or self.schema.tables:
exists, existing_columns = self.get_storage_table(table_name)
# TODO: detect columns where vectorization was added or removed and modify it. currently we ignore change of hints
new_columns = self.schema.get_new_table_columns(table_name, existing_columns)
new_columns = self.schema.get_new_table_columns(
table_name,
existing_columns,
case_sensitive=self.capabilities.has_case_sensitive_identifiers,
)
logger.info(f"Found {len(new_columns)} updates for {table_name} in {self.schema.name}")
if len(new_columns) > 0:
if exists:
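The last hunk passes case_sensitive=self.capabilities.has_case_sensitive_identifiers into schema.get_new_table_columns, so a schema column that differs from an existing Weaviate property only by case is not treated as new. The actual comparison lives inside Schema; the sketch below only illustrates what a case-insensitive column diff of that kind could look like:

from typing import Any, Dict

def new_columns_sketch(
    schema_columns: Dict[str, Any],
    existing_columns: Dict[str, Any],
    case_sensitive: bool,
) -> Dict[str, Any]:
    # fold names with str.casefold when the destination cannot tell casings apart
    fold = (lambda name: name) if case_sensitive else str.casefold
    existing = {fold(name) for name in existing_columns}
    return {name: col for name, col in schema_columns.items() if fold(name) not in existing}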