Spectrum / Athena Support #22

Open
norton120 opened this issue Apr 27, 2019 · 1 comment
Labels
enhancement New feature or request

Comments

@norton120
Contributor

Description

During the publish phase we have everything we need to create an external schema in Redshift or register the metadata for Athena. Since we know the data is in AWS, this would be a hugely powerful addition to current functionality.

Pseudocode

parq.register(target="Redshift")
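
For the Redshift side, one way such a `register` call could work is to emit the `CREATE EXTERNAL SCHEMA` statement that points Spectrum at a Glue Data Catalog database. This is only a sketch under stated assumptions: the function name `build_external_schema_ddl`, the catalog database name and the IAM role ARN below are hypothetical placeholders, not part of s3parq.

```python
def build_external_schema_ddl(schema: str, catalog_database: str, iam_role_arn: str) -> str:
    """Build Redshift DDL that maps an external schema onto a Data Catalog database.

    Hypothetical helper for illustration; all three arguments are supplied by the caller.
    """
    # Only accept simple identifiers, since the schema name is interpolated unquoted.
    if not schema or not schema.replace("_", "").isalnum():
        raise ValueError(f"not a simple identifier: {schema!r}")
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG\n"
        f"DATABASE '{catalog_database}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


# Example with placeholder values (the role ARN is made up).
ddl = build_external_schema_ddl(
    "bananabucket_this_is_a_prefix",
    "bananabucket_this_is_a_prefix",
    "arn:aws:iam::123456789012:role/example-spectrum-role",
)
print(ddl)
```

The statement would still need to be executed against the cluster by whatever Redshift connection the caller already has; that part is left out here.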

Why?

Registering a schema at publish makes the written data immediately queryable from any SQL workbench tool. We should standardize on this convention: the external schema is everything in the path leading up to the dataset name, and the table is the dataset name. So for the path
s3://bananabucket/this/is/a/prefix/dataset/id=123/name=steve/asf809dg8jkljsd12.parquet
the external schema to register would be bananabucket_this_is_a_prefix and the table would be dataset. Querying it via Spectrum / Athena would then be
SELECT * FROM bananabucket_this_is_a_prefix.dataset WHERE id > 122 ... WOAH.
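
The naming convention above can be sketched in Python. This assumes the dataset name is the path segment immediately before the first `key=value` partition directory, which holds for the example path but is not stated as a general rule in this issue. The function name `schema_and_table` is hypothetical.

```python
from typing import Tuple
from urllib.parse import urlparse


def schema_and_table(s3_path: str) -> Tuple[str, str]:
    """Derive (external_schema, table) from a partitioned S3 object path.

    Assumption: the dataset name is the segment right before the first
    key=value partition segment. Unpartitioned paths are rejected.
    """
    parsed = urlparse(s3_path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// path: {s3_path!r}")

    segments = [s for s in parsed.path.split("/") if s]
    # Index of the first partition directory, e.g. "id=123".
    first_partition = next((i for i, s in enumerate(segments) if "=" in s), None)
    if first_partition is None or first_partition == 0:
        raise ValueError("cannot locate the dataset name without a partition segment")

    table = segments[first_partition - 1]
    # Schema is the bucket plus every prefix segment before the dataset name.
    schema = "_".join([parsed.netloc] + segments[: first_partition - 1])
    return schema, table


schema, table = schema_and_table(
    "s3://bananabucket/this/is/a/prefix/dataset/id=123/name=steve/asf809dg8jkljsd12.parquet"
)
print(schema, table)  # bananabucket_this_is_a_prefix dataset
```

Bucket names may contain hyphens or dots, which are not valid in unquoted SQL identifiers, so a real implementation would probably need to sanitize or quote the resulting schema name.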

@norton120 norton120 added the enhancement New feature or request label Apr 27, 2019
RyanAdalbert pushed a commit to RyanAdalbert/s3parq that referenced this issue Sep 16, 2019
@arogozhnikov
arogozhnikov commented Jan 31, 2022

Hi @norton120, I'm very interested in this feature. Did you find any Python-focused solutions for managing Parquet files that can register metadata in AWS Athena?

Update: never mind, it seems awswrangler can handle this.
