Optimize queries used in Snowflake lineage connector #797

mars-lan · 2024-03-14T16:51:06Z

🤔 Why?

Running the Snowflake lineage crawler against a large SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view takes an exceedingly long time (> 1.5 hours).

It turns out that, as with ACCESS_HISTORY, we must also specify a filter on START_TIME in order to query QUERY_HISTORY efficiently.

🤓 What?

Specify the START_TIME filter for QUERY_HISTORY and use an inner JOIN when joining with ACCESS_HISTORY in the Snowflake lineage connector.
Qualify the filters in the base Snowflake connector to avoid confusion.
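
As a rough illustration of the idea (not the connector's actual code), the sketch below builds a lineage query that bounds both views by time and joins them with an inner JOIN. The helper name `build_lineage_query`, the selected columns, the table aliases, and the 7-day lookback are all assumptions made for this example; the `%s` placeholders assume a pyformat-style driver.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

Params = Tuple[datetime, datetime, datetime, datetime]


def build_lineage_query(start: datetime, end: datetime) -> Tuple[str, Params]:
    """Return (sql, params) for a time-bounded lineage query.

    Hypothetical helper for illustration only. Both views get their own
    time filter, and every column is qualified with its table alias.
    """
    sql = """
    SELECT a.QUERY_ID, q.QUERY_TEXT, a.OBJECTS_MODIFIED
    FROM SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY a
    INNER JOIN SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY q
      ON a.QUERY_ID = q.QUERY_ID
    WHERE a.QUERY_START_TIME > %s AND a.QUERY_START_TIME <= %s
      AND q.START_TIME > %s AND q.START_TIME <= %s
    """
    # The same window is bound twice: once per view.
    return sql, (start, end, start, end)


# Example: an assumed 7-day lookback window ending now.
end = datetime.now(timezone.utc)
start = end - timedelta(days=7)
sql, params = build_lineage_query(start, end)
```

Without the second pair of predicates, only ACCESS_HISTORY would be restricted by time, which is the situation the description above identifies as slow.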

🧪 Tested?

Verified that the MCEs before & after the changes are identical. Observed a nearly 8x improvement in run time (217.1s down to 27.6s):

Before:

Ended running with RunStatus.SUCCESS at 2024-03-14 08:54:28.949943, fetched 78 entities, took 217.1s

After:

Ended running with RunStatus.SUCCESS at 2024-03-14 08:47:54.532785, fetched 78 entities, took 27.6s
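
For reference, the speedup implied by the two log lines above can be worked out with a short script. This is only a sketch: the regular expression assumes the `took <seconds>s` suffix seen in these two lines, and `parse_duration` is a hypothetical helper name.

```python
import re

BEFORE = "Ended running with RunStatus.SUCCESS at 2024-03-14 08:54:28.949943, fetched 78 entities, took 217.1s"
AFTER = "Ended running with RunStatus.SUCCESS at 2024-03-14 08:47:54.532785, fetched 78 entities, took 27.6s"


def parse_duration(line: str) -> float:
    """Extract the run time in seconds from a 'took <seconds>s' suffix."""
    match = re.search(r"took ([\d.]+)s", line)
    if match is None:
        raise ValueError(f"no duration found in: {line!r}")
    return float(match.group(1))


speedup = parse_duration(BEFORE) / parse_duration(AFTER)
print(f"speedup: {speedup:.1f}x")  # prints "speedup: 7.9x"
```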

☑️ Checks

My PR contains actual code changes, and I have updated the version number in pyproject.toml.

shortcut-integration · 2024-03-14T16:51:09Z

This pull request has been linked to Shortcut Story #25106: Snowflake lineage crawler timing out due to inefficient queries.

alyiwang

LGTM

github-actions · 2024-03-14T16:57:08Z

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines	Covered	Coverage	Threshold	Status
16051	14848	93%	85%	🟢

New Files

No new covered files...

Modified Files

File	Coverage	Status
metaphor/snowflake/extractor.py	86%	🟢
metaphor/snowflake/lineage/extractor.py	68%	🟢
TOTAL	77%	🟢

updated for commit: 61d2a7a by action🐍

Optimize queries used in Snowflake lineage connector

61d2a7a

mars-lan requested review from alyiwang, elic-eon and usefulalgorithm March 14, 2024 16:51

mars-lan enabled auto-merge (squash) March 14, 2024 16:52

alyiwang approved these changes Mar 14, 2024

mars-lan merged commit 7ef524a into main Mar 14, 2024
4 checks passed

mars-lan deleted the marslan/sc-25106/snowflake-lineage-crawler-timing-out-due branch March 14, 2024 16:57
