[sc-27604] Parse TLL for UnityCatalog query logs #919

Merged

@usefulalgorithm usefulalgorithm commented Jul 21, 2024

🤔 Why?

UnityCatalog does not compute the sources and targets of each query log, so we need to do it ourselves.

The TLL parser is largely based on our in-house lineage parser, with the incompatible methods stripped out.

🤓 What?

  • Implemented a TLL (table-level lineage) parser to find the sources and targets of UC queries.
  • When a parsed source or target is not fully qualified, look up the matching fully qualified dataset.
  • Added a poetry export step to the CD Docker job. The step generates requirements.txt during CD, and that file is then used to build the image.
  • Modified the Dockerfile to install the dependencies first and then the module, which drastically reduces image build time.
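
To illustrate the second bullet, here is a minimal sketch of resolving a partially qualified table name against a set of known `catalog.schema.table` names. It is not the code from this PR: the function name `qualify`, the suffix-matching rule, and the choice to return `None` on ambiguity are all assumptions made for illustration.

```python
from typing import Iterable, Optional


def qualify(name: str, known_datasets: Iterable[str]) -> Optional[str]:
    """Resolve a possibly partial table name to a fully qualified one.

    Hypothetical helper: returns the unique `catalog.schema.table` whose
    trailing parts equal `name`, or None if nothing matches or the match
    is ambiguous. Quoted identifiers containing dots are not handled.
    """
    parts = name.lower().split(".")
    if len(parts) > 3:
        return None

    matches = set()
    for candidate in known_datasets:
        candidate_parts = candidate.lower().split(".")
        # Only three-part names count as fully qualified here.
        if len(candidate_parts) == 3 and candidate_parts[-len(parts):] == parts:
            matches.add(".".join(candidate_parts))

    # Refuse to guess when more than one dataset fits.
    return next(iter(matches)) if len(matches) == 1 else None
```

With known datasets `main.sales.orders`, `main.hr.orders` and `main.sales.customers`, the name `customers` resolves to `main.sales.customers`, while a bare `orders` is ambiguous and resolves to `None`.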

🧪 Tested?

  • Ran locally
  • Tested end-to-end: processing 233k queries on andy.dev.metaphor.io took about 14 minutes.
[Screenshot, 2024-07-22 8:50:36 AM]

☑️ Checks

  • My PR contains actual code changes, and I have updated the version number in pyproject.toml.


github-actions bot commented Jul 21, 2024

☂️ Python Coverage

current status: ✅

Overall Coverage

| Lines | Covered | Coverage | Threshold | Status |
|-------|---------|----------|-----------|--------|
| 19559 | 18336   | 94%      | 85%       | 🟢     |

New Files

| File | Coverage | Status |
|------|----------|--------|
| metaphor/common/sql/dialect.py | 100% | 🟢 |
| metaphor/common/sql/table_level_lineage/helpers/expression_handlers.py | 88% | 🟢 |
| metaphor/common/sql/table_level_lineage/helpers/find_select_in_expression.py | 97% | 🟢 |
| metaphor/common/sql/table_level_lineage/result.py | 100% | 🟢 |
| metaphor/common/sql/table_level_lineage/table.py | 100% | 🟢 |
| metaphor/common/sql/table_level_lineage/table_level_lineage.py | 94% | 🟢 |
| TOTAL | 96% | 🟢 |

Modified Files

| File | Coverage | Status |
|------|----------|--------|
| metaphor/snowflake/config.py | 100% | 🟢 |
| metaphor/snowflake/extractor.py | 85% | 🟢 |
| metaphor/unity_catalog/extractor.py | 97% | 🟢 |
| metaphor/unity_catalog/utils.py | 90% | 🟢 |
| TOTAL | 93% | 🟢 |

updated for commit: fc43048 by action🐍


codecov bot commented Jul 21, 2024

Codecov Report

Attention: Patch coverage is 94.98069% with 13 lines in your changes missing coverage. Please review.

Project coverage is 93.74%. Comparing base (e21418d) to head (fc43048).

| Files | Patch % | Lines |
|-------|---------|-------|
| ...table_level_lineage/helpers/expression_handlers.py | 87.50% | 5 Missing ⚠️ |
| ...mon/sql/table_level_lineage/table_level_lineage.py | 93.90% | 5 Missing ⚠️ |
| ...level_lineage/helpers/find_select_in_expression.py | 96.87% | 2 Missing ⚠️ |
| metaphor/unity_catalog/utils.py | 97.22% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #919      +/-   ##
==========================================
+ Coverage   93.55%   93.74%   +0.19%     
==========================================
  Files         216      177      -39     
  Lines       19498    19559      +61     
==========================================
+ Hits        18241    18336      +95     
+ Misses       1257     1223      -34     
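
As a sanity check on the figures in the diff above, the short sketch below recomputes the percentages from the reported hit and line counts. The helper name `coverage_pct` is made up for illustration, and the 259-line patch size is inferred from the 13 missing lines and the 94.98069% patch coverage; the report does not state it.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage: covered lines over total lines."""
    return 100.0 * hits / lines


base = coverage_pct(18241, 19498)  # main (e21418d)
head = coverage_pct(18336, 19559)  # this PR (fc43048)

# Hits plus misses should account for every line in each column.
assert 18241 + 1257 == 19498
assert 18336 + 1223 == 19559

print(f"base {base:.3f}%  head {head:.3f}%  delta {head - base:+.2f}")

# Patch coverage: the per-file missing lines add up to 13. The patch size
# of 259 lines is an inference, not a number taken from the report.
patch_missing = 5 + 5 + 2 + 1
patch_lines = 259
patch = coverage_pct(patch_lines - patch_missing, patch_lines)
print(f"patch {patch:.5f}%")
```

The head figure works out to roughly 93.747%, so the report's 93.74% appears to be truncated to two decimals rather than rounded.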


@usefulalgorithm usefulalgorithm requested a review from mars-lan July 21, 2024 13:30
@mars-lan

Please also measure how long it takes with max_workers=2, to simulate the typical crawler environment.
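
One way to take such a measurement is sketched below. It is only an assumption about the setup: `max_workers` is the parameter name used by the executors in Python's `concurrent.futures`, but the PR does not show whether the extractor uses threads or processes, and `timed_run` and the `parse` callable are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def timed_run(
    queries: Sequence[str],
    parse: Callable[[str], T],
    max_workers: int,
) -> Tuple[float, List[T]]:
    """Parse every query with a bounded worker pool and time the whole run.

    Returns (elapsed_seconds, results), with results in input order.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the input sequence.
        results = list(pool.map(parse, queries))
    elapsed = time.perf_counter() - start
    return elapsed, results
```

Calling `timed_run(queries, parse, max_workers=2)` and again with a larger pool gives a direct comparison. For CPU-bound parsing, threads contend for the GIL, so a `ProcessPoolExecutor` (same `max_workers` parameter) may be the more representative choice.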

Review thread on .pre-commit-config.yaml (outdated, resolved)
@usefulalgorithm usefulalgorithm requested a review from mars-lan July 22, 2024 04:42
Review thread on .github/workflows/ci.yml (outdated, resolved)
@usefulalgorithm usefulalgorithm requested a review from mars-lan July 22, 2024 05:17
@usefulalgorithm usefulalgorithm enabled auto-merge (squash) July 22, 2024 05:17
@usefulalgorithm usefulalgorithm merged commit cc45ce0 into main Jul 22, 2024
6 checks passed
@usefulalgorithm usefulalgorithm deleted the tsung-julii/sc-27604/uc-query-logs-not-converted-to-parquet branch July 22, 2024 11:29