AzureStorageFileSystem Directory Exists not implemented #50

Open
patialashahi31 opened this issue Mar 8, 2024 · 8 comments

@patialashahi31

What happens?

duckdb.duckdb.NotImplementedException: Not implemented Error: AzureStorageFileSystem: DirectoryExists is not implemented!

This happens while copying a DuckDB table to Azure.

To Reproduce

Copying the table to Azure reproduces the error.
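
The report does not include the exact statement; a hypothetical minimal reproduction (connection string and container are placeholders, and the partitioned COPY mirrors the one shown later in this thread) might look like:

import duckdb

con = duckdb.connect()
con.install_extension("azure")
con.load_extension("azure")

# Placeholder connection string for the target account
con.sql("CREATE SECRET az_secret (TYPE AZURE, CONNECTION_STRING '<connection-string>');")

con.sql("CREATE TABLE t AS SELECT 2019 AS year, 1 AS month, 42 AS value;")

# The partitioned write has to check/create target directories and fails with:
# duckdb.duckdb.NotImplementedException: Not implemented Error:
# AzureStorageFileSystem: DirectoryExists is not implemented!
con.sql("COPY t TO 'az://<container>/t' (FORMAT PARQUET, PARTITION_BY (year, month));")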

OS:

Ubuntu

DuckDB Version:

0.10.0

DuckDB Client:

Python

Full Name:

Tejinderpal Singh

Affiliation:

Atlan

Have you tried this on the latest nightly build?

I have not tested with any build

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have
@patialashahi31 patialashahi31 changed the title AzureFileSystem Directory Exists not implemented AzureStorageFileSystem Directory Exists not implemented Mar 8, 2024
@szarnyasg szarnyasg transferred this issue from duckdb/duckdb Mar 11, 2024
@quentingodeau
Contributor

Hello, yes, at the moment some features are not yet available.
This one, for example, is not implemented because the method's signature does not include the FileOpener, so we cannot access the context information that the extension requires. I will see if I can make that change.
That said, the notion of a directory doesn't really make sense for a blob storage account. It does for DFS, but for blob I think it will always return false :(

@quentingodeau
Contributor

Hello, to keep you updated on this issue: the long story is available here; the short one is that the DuckDB team will change the API of the DuckDB FileSystem class, which affects a lot of extensions. It will take some time, but it will arrive :)

@shaunakv1

I am getting the same error trying to write a Hive-partitioned GeoParquet to Azure Blob Storage. Is this currently not possible, or am I missing something?

write_query = f"""
    COPY
        (
            SELECT *,
                    ST_Point(longitude, latitude) AS geom,
                    year(base_date_time) AS year,
                    month(base_date_time) AS month
            FROM read_csv('az://ais/ais2019/csv2/ais-2019-01-*.csv.zst', ignore_errors = true)
        )
    TO 'abfs://ais/parquet' (
            FORMAT PARQUET, 
            COMPRESSION ZSTD, 
            ROW_GROUP_SIZE 122_880, 
            PARTITION_BY (year, month)
    );
"""

@samansmink
Collaborator

Azure writes are not yet supported, unfortunately.

@shaunakv1

@samansmink this comment and the following one on another issue made it seem like it works; that's what got me confused.

#44 (comment)

@shaunakv1

@samansmink In the meantime, I am considering using rclone: first generate the Hive-partitioned Parquet locally, then sync it over. However, we are working with many TBs of data that we have to keep updated.

Is there any way, while writing the Hive partitions locally, to get progress or a callback as each partition is written so I can sync just that partition over? In theory I could sync the entire directory structure, but given the volume of data I will never have the entire Hive tree locally (space constraints). Here's what I want to achieve (see the sketch after this list):

  1. Write partition ( CSVs are in glob pattern and one can generate multiple parquet files)
  2. Sync it over
  3. Delete it from local
  4. Loop to next partition write
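
A rough sketch of that loop, assuming DuckDB writes one partition at a time locally and rclone pushes it (the CSV glob, local path, connection string, and rclone remote name are all placeholders):

import duckdb
import shutil
import subprocess
from pathlib import Path

con = duckdb.connect()
con.install_extension("azure")
con.load_extension("azure")
# Placeholder connection string for the source account
con.sql("CREATE SECRET ais_src (TYPE AZURE, CONNECTION_STRING '<connection-string>');")

local_root = Path("/tmp/ais_parquet")
# Hypothetical list of (year, month) partitions, processed one at a time
partitions = [(2019, month) for month in range(1, 13)]

for year, month in partitions:
    part_dir = local_root / f"year={year}" / f"month={month}"
    part_dir.mkdir(parents=True, exist_ok=True)

    # 1. Write just this partition locally
    con.sql(f"""
        COPY (
            SELECT *
            FROM read_csv('az://ais/ais2019/csv2/ais-{year}-{month:02d}-*.csv.zst',
                          ignore_errors = true)
        ) TO '{part_dir}/data.parquet' (FORMAT PARQUET, COMPRESSION ZSTD);
    """)

    # 2. Sync it over (the rclone remote name "azremote" is a placeholder)
    subprocess.run(
        ["rclone", "sync", str(part_dir),
         f"azremote:ais/parquet/year={year}/month={month}"],
        check=True,
    )

    # 3. Delete it from local before moving on to the next partition
    shutil.rmtree(part_dir)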

@samansmink
Collaborator

@shaunakv1 the comment you link uses fsspec, which is separate from the DuckDB Azure extension and is Python-only.

@shaunakv1

shaunakv1 commented Jan 10, 2025

@samansmink I am using the same. Here's my full code and I still get the same error:

import duckdb
from dotenv import load_dotenv
import os
from fsspec import filesystem

load_dotenv()

AIS_SRC_CONNECTION_STRING = os.getenv("AIS_SRC_CONNECTION_STRING")
AIS_DEST_CONNECTION_STRING = os.getenv("AIS_DEST_CONNECTION_STRING")

duckdb.register_filesystem(
    filesystem("abfs", connection_string=AIS_DEST_CONNECTION_STRING)
)
con = duckdb.connect()

con.install_extension("azure")
con.load_extension("azure")

con.install_extension("spatial")
con.load_extension("spatial")

con.install_extension("h3", repository="community")
con.load_extension("h3")


### Create secret
create_secret = f"""    
    CREATE SECRET ais_src (
    TYPE AZURE,
    CONNECTION_STRING '{AIS_SRC_CONNECTION_STRING}'
    );
"""
con.sql(create_secret)

### configure Duckdb performance params for azure
con.sql("SET azure_http_stats = true;")
con.sql("SET azure_read_transfer_concurrency = 8;")
con.sql("SET azure_read_transfer_chunk_size = 1_048_576;")
con.sql("SET azure_read_buffer_size = 1_048_576;")

count_query = f"""
    SELECT *
    FROM 'az://<redacted>/ais-2019-01-01.csv.zst'
    LIMIT 10
"""
con.sql(count_query).show()

print(f"Writing to parquet...")

write_query = f"""
    COPY
        (
            SELECT *,
                    ST_Point(longitude, latitude) AS geom,
                    year(base_date_time) AS year,
                    month(base_date_time) AS month
            FROM read_csv('az://<redacted>/ais-2019-01-*.csv.zst', ignore_errors = true)
        )
    TO 'abfs://ais/parquet' (
            FORMAT PARQUET, 
            COMPRESSION ZSTD, 
            ROW_GROUP_SIZE 122_880, 
            PARTITION_BY (year, month)
    );
"""

con.sql(write_query).show()
