Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(filesystem): implement Google Drive source #932

Merged
merged 36 commits into from
Feb 12, 2024
Merged

Conversation

IlyaFaer
Copy link
Contributor

@IlyaFaer IlyaFaer commented Feb 5, 2024

Towards: dlt-hub/verified-sources#308

I took the original implementation, moved it into the dlt repo and fixed several small problems, which were preventing the driver from working with Google Drive. Most of the original driver code is left untouched - even though I think some parts of it are not needed (file uploading, deleting, etc.)

Copy link

netlify bot commented Feb 5, 2024

Deploy Preview for dlt-hub-docs canceled.

Name Link
🔨 Latest commit 35ba3c7
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/65ca28c4deede200083f9cc2

Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the biggest missing piece are tests:
the minimum option: we need to test glob functionality
the tests are in tests/load/filesystem

  1. add gdrive bucket to test_filesystem_dict
  2. buckets are defined in tests/load/utils
  3. bucket url is defined in tests/.dlt/config.toml, you can use https://drive.google.com/drive/u/0/folders/15eC3e5MNew2XAIefWNlG8VlEa0ISnnaG as the bucket url (contains standard_source inside)
    NOTE: if you modify DEFAULT_BUCKETS to include gdrive, all other tests (that test write files) will pick it up.

in theory they should all pass so you may try it out, to pass them may be our maximum plan (and maybe that's the easiest). otherwise you need to create another fixture...

dlt/common/configuration/specs/__init__.py Outdated Show resolved Hide resolved
dlt/common/configuration/specs/gdrive_credentials.py Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
Copy link
Contributor Author

@IlyaFaer IlyaFaer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, at the first gaze, there is one suspicious item about split().
Also, there is a bunch of changes to Athena, BigQuery and Snowflake.

Trying to run tests now to make sure it works.

dlt/common/storages/fsspecs/google_drive.py Outdated Show resolved Hide resolved
@IlyaFaer IlyaFaer marked this pull request as ready for review February 12, 2024 06:02
Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@rudolfix rudolfix merged commit bbbef3c into devel Feb 12, 2024
58 of 60 checks passed
@rudolfix rudolfix deleted the gdrive_new_source branch February 12, 2024 18:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants