AzureStorageFileSystem Directory Exists not implemented #50
Comments
Hello, yes, at the moment there are some features that are not yet available.
Hello, to keep you updated on this issue: the long story is available here; the short one is that the DuckDB team will change the API of the DuckDB FileSystem class, which has a big impact on a lot of extensions. It will take some time, but it will arrive :)
I am getting the same error trying to write a hive-partitioned GeoParquet to Azure Blob. Is this currently not possible, or am I missing something?
Azure writes are not yet supported, unfortunately.
@samansmink this comment and the following one on another issue made it seem like it works; that's what got me confused.
@samansmink In the meanwhile, I am considering using rclone to first generate the hive-partitioned Parquet locally and then sync it over. However, we are working with many TBs worth of data that we have to keep updated. Is there any way, while writing the hive locally, to get a progress callback as each partition is written, so I can sync just that partition over? In theory I could sync the entire directory structure, but with the volume of the data I will never have the entire hive locally (space constraints). Here's what I want to achieve.
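One way to get that per-partition granularity without ever holding the whole hive locally is to enumerate the partitions first and then write and sync them one at a time. A rough sketch (the source glob, output layout, and the rclone remote name "azureblob" are all hypothetical, and it rescans the source once per partition, so it trades speed for disk space):

import os
import shutil
import subprocess
import duckdb

con = duckdb.connect()

SRC = "ais-2019-*.csv.zst"  # hypothetical source glob

# Enumerate the partitions up front, then write and upload them one at a time,
# so only a single partition has to exist on local disk at any moment.
partitions = con.sql(f"""
    SELECT DISTINCT year(base_date_time) AS y, month(base_date_time) AS m
    FROM read_csv('{SRC}', ignore_errors = true)
""").fetchall()

for y, m in partitions:
    local_dir = f"out/year={y}/month={m}"
    os.makedirs(local_dir, exist_ok=True)

    # Write this partition to a local Parquet file.
    con.sql(f"""
        COPY (
            SELECT *
            FROM read_csv('{SRC}', ignore_errors = true)
            WHERE year(base_date_time) = {y} AND month(base_date_time) = {m}
        ) TO '{local_dir}/data.parquet' (FORMAT PARQUET, COMPRESSION ZSTD)
    """)

    # Sync just this partition to blob storage, then free the local space.
    subprocess.run(
        ["rclone", "sync", local_dir, f"azureblob:ais/parquet/year={y}/month={m}"],
        check=True,
    )
    shutil.rmtree(local_dir)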
@shaunakv1 the comment you link uses fsspec, which is separate from the DuckDB Azure Extension and is Python-only.
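For reference, a minimal sketch of that fsspec route (assumes the adlfs package is installed; the connection string and container name are placeholders):

import duckdb
from fsspec import filesystem

# Register an fsspec AzureBlobFileSystem (from adlfs) with DuckDB so that
# abfs:// paths can be read and written through fsspec.
duckdb.register_filesystem(filesystem("abfs", connection_string="<connection-string>"))

duckdb.sql(
    "COPY (SELECT 42 AS answer) TO 'abfs://mycontainer/answer.parquet' (FORMAT PARQUET)"
)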
@samansmink I am using the same. Here's my full code, and I still get the same error:
import duckdb
from dotenv import load_dotenv
import os
from fsspec import filesystem
load_dotenv()
AIS_SRC_CONNECTION_STRING = os.getenv("AIS_SRC_CONNECTION_STRING")
AIS_DEST_CONNECTION_STRING = os.getenv("AIS_DEST_CONNECTION_STRING")
duckdb.register_filesystem(
filesystem("abfs", connection_string=AIS_DEST_CONNECTION_STRING)
)
con = duckdb.connect()
con.install_extension("azure")
con.load_extension("azure")
con.install_extension("spatial")
con.load_extension("spatial")
con.install_extension("h3", repository="community")
con.load_extension("h3")
### Create secret
create_secret = f"""
CREATE SECRET ais_src (
TYPE AZURE,
CONNECTION_STRING '{AIS_SRC_CONNECTION_STRING}'
);
"""
con.sql(create_secret)
### Configure DuckDB performance params for Azure
con.sql("SET azure_http_stats = true;")
con.sql("SET azure_read_transfer_concurrency = 8;")
con.sql("SET azure_read_transfer_chunk_size = 1_048_576;")
con.sql("SET azure_read_buffer_size = 1_048_576;")
count_query = f"""
SELECT *
FROM 'az://<redacted>/ais-2019-01-01.csv.zst'
LIMIT 10
"""
con.sql(count_query).show()
print(f"Writing to parquet...")
write_query = f"""
COPY
(
SELECT *,
ST_Point(longitude, latitude) AS geom,
year(base_date_time) AS year,
month(base_date_time) AS month
FROM read_csv('az://<redacted>/ais-2019-01-*.csv.zst', ignore_errors = true)
)
TO 'abfs://ais/parquet' (
FORMAT PARQUET,
COMPRESSION ZSTD,
ROW_GROUP_SIZE 122_880,
PARTITION_BY (year, month)
);
"""
con.sql(write_query).show()
What happens?
duckdb.duckdb.NotImplementedException: Not implemented Error: AzureStorageFileSystem: DirectoryExists is not implemented!
This occurs while copying a DuckDB table to Azure.
To Reproduce
Copying the table to Azure is enough to produce the error.
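For reference, a minimal script along the lines of the snippets above reproduces it (secret name, connection string, and container are placeholders); the partitioned COPY to an az:// target is what triggers the unimplemented directory check:

import duckdb

con = duckdb.connect()
con.install_extension("azure")
con.load_extension("azure")

# Placeholder secret; substitute a real connection string.
con.sql("""
    CREATE SECRET az_secret (
        TYPE AZURE,
        CONNECTION_STRING '<connection-string>'
    );
""")

# A partitioned COPY to an az:// target raises:
#   NotImplementedException: AzureStorageFileSystem: DirectoryExists is not implemented!
con.sql("""
    COPY (SELECT 2019 AS year, 1 AS month, 42 AS value)
    TO 'az://mycontainer/out' (FORMAT PARQUET, PARTITION_BY (year, month));
""")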
OS:
Ubuntu
DuckDB Version:
0.10.0
DuckDB Client:
Python
Full Name:
Tejinderpal Singh
Affiliation:
Atlan
Have you tried this on the latest nightly build?
I have not tested with any build
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?