🐛 correct tracking of producer views #3854

Open
wants to merge 41 commits into
base: master
from

Conversation

lucasrodes
Member

@lucasrodes lucasrodes commented Jan 16, 2025

Fixes #3855

Producer analytics does not accurately count the views of data from a given producer. It may be overestimating them.

This PR drops the usage of VersionTracker in favor of regular queries to our database. There are two benefits:

  • Numbers will be more accurate.
  • Performance is better: loading the steps dataframe with VersionTracker is significantly slower than querying the database directly.
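
To illustrate the general idea of counting views with a direct query, here is a minimal sketch against an in-memory SQLite database. Everything in it is an assumption made for illustration: the table names (`producer_charts`, `chart_views`), the column names, and the sample rows are hypothetical and do not reflect the actual schema or the queries in this PR. The duplicated mapping row is only one assumed way an overestimate could arise.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database with hypothetical tables and rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE producer_charts (producer TEXT, chart_slug TEXT);
        CREATE TABLE chart_views (chart_slug TEXT, views INTEGER);
        -- The first mapping row appears twice on purpose (hypothetical duplicate).
        INSERT INTO producer_charts VALUES
            ('Global Carbon Budget', 'co2-emissions'),
            ('Global Carbon Budget', 'co2-emissions'),
            ('Global Carbon Budget', 'co2-per-capita'),
            ('Other', 'life-expectancy');
        INSERT INTO chart_views VALUES
            ('co2-emissions', 100),
            ('co2-per-capita', 50),
            ('life-expectancy', 30);
        """
    )
    return conn


def naive_views_per_producer(conn):
    """Join mapping rows to views as-is; duplicate mapping rows are counted twice."""
    query = """
        SELECT m.producer, SUM(v.views)
        FROM producer_charts AS m
        JOIN chart_views AS v ON v.chart_slug = m.chart_slug
        GROUP BY m.producer
        ORDER BY m.producer
    """
    return dict(conn.execute(query).fetchall())


def views_per_producer(conn):
    """Count each chart at most once per producer before summing its views."""
    query = """
        SELECT m.producer, SUM(v.views)
        FROM (SELECT DISTINCT producer, chart_slug FROM producer_charts) AS m
        JOIN chart_views AS v ON v.chart_slug = m.chart_slug
        GROUP BY m.producer
        ORDER BY m.producer
    """
    return dict(conn.execute(query).fetchall())


demo = make_demo_db()
print(naive_views_per_producer(demo))  # {'Global Carbon Budget': 250, 'Other': 30}
print(views_per_producer(demo))        # {'Global Carbon Budget': 150, 'Other': 30}
```

In this toy data the naive join reports 250 views for the first producer because one chart is linked twice, while the de-duplicated query reports 150.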

@lucasrodes lucasrodes changed the base branch from master to enhance-producer-analytics-layout January 16, 2025 17:18
@owidbot
Contributor

owidbot commented Jan 16, 2025

Quick links (staging server):

Site Dev Site Preview Admin Wizard Docs

Login: ssh owid@staging-site-producer-analytics-bug-views

chart-diff: ✅ No charts for review.
data-diff: ✅ No differences found
Legend: +New  ~Modified  -Removed  =Identical  Details
Hint: Run this locally with etl diff REMOTE data/ --include yourdataset --verbose --snippet

Automatically updated datasets matching weekly_wildfires|excess_mortality|covid|fluid|flunet|country_profile|garden/ihme_gbd/2019/gbd_risk are not included

Edited: 2025-01-16 17:21:06 UTC
Execution time: 14.56 seconds

@lucasrodes lucasrodes marked this pull request as ready for review January 16, 2025 17:27
@lucasrodes
Member Author

lucasrodes commented Jan 16, 2025

Update: the bug might be coming from VersionTracker.steps_df. In particular, from the list "all_chart_slugs".

Example:

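from etl.version_tracker import VersionTracker

# Load the steps dataframe without excluding any steps.
df = VersionTracker(exclude_steps=[]).steps_df
# Inspect the chart slugs attributed to the Global Carbon Budget grapher step.
df.loc[df.step.str.contains("grapher/gcp/2024-11-21/global_carbon_budget"), "all_chart_slugs"].item()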
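
One caveat when reproducing this: `Series.item()` raises a `ValueError` unless exactly one row matches the pattern. A slightly more defensive way to inspect the column is sketched below on a toy dataframe. Only the column names `step` and `all_chart_slugs` come from the example above; the helper name `chart_slugs_for_step`, the rows, and the slugs are made up, and the sketch assumes each cell holds a list of slugs.

```python
import pandas as pd


def chart_slugs_for_step(steps_df, pattern):
    """Return the sorted, de-duplicated slugs of all steps whose name contains `pattern`."""
    # regex=False treats the pattern as a literal substring.
    mask = steps_df["step"].str.contains(pattern, regex=False)
    slugs = set()
    for cell in steps_df.loc[mask, "all_chart_slugs"]:
        slugs.update(cell)
    return sorted(slugs)


# Toy data (hypothetical): two steps match the pattern and share one slug.
toy = pd.DataFrame(
    {
        "step": [
            "grapher/demo/2024-01-01/a",
            "grapher/demo/2024-01-01/a_extra",
            "grapher/demo/2023-01-01/b",
        ],
        "all_chart_slugs": [["chart-1", "chart-2"], ["chart-2"], ["chart-3"]],
    }
)

print(chart_slugs_for_step(toy, "grapher/demo/2024-01-01/a"))  # ['chart-1', 'chart-2']
```

Because the helper works for zero, one, or several matching rows, it also makes it easy to spot slugs that look like they do not belong to the step.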

@lucasrodes
Member Author

On my computer, using regular queries to the DB improves performance significantly.

Old code (with VersionTracker): takes ~8 seconds.
New code (regular queries): takes ~1 second.
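
For anyone who wants to reproduce a comparison like this locally, a minimal timing sketch follows. The two loader functions are hypothetical stand-ins that only sleep; they would need to be replaced by the old and new loading code, and the numbers they produce say nothing about the real timings.

```python
import time


def time_call(func, *args, repeats=3, **kwargs):
    """Return the best wall-clock duration in seconds over `repeats` calls to `func`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best


def slow_loader():
    """Hypothetical stand-in for the slower loading path."""
    time.sleep(0.05)


def fast_loader():
    """Hypothetical stand-in for the faster loading path."""
    time.sleep(0.01)


print(f"slow: {time_call(slow_loader):.3f} s")
print(f"fast: {time_call(fast_loader):.3f} s")
```

Taking the best of a few runs reduces noise from caching and other processes, although a cold first run may be the more relevant number for an interactive page.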

Base automatically changed from enhance-producer-analytics-layout to master January 16, 2025 22:20