Added pipeline analysis for OP Admin Dashboard #42

Open
TeachMeTW wants to merge 4 commits into master from Add-Pipeline-Analysis

Conversation

TeachMeTW

Add Pipeline Analysis for OP Admin Dashboard

Description

This PR introduces pipeline analysis notebooks for the OP Admin Dashboard. Two versions are included to comply with repository guidelines on data privacy and output handling:

  1. Pipeline Analysis with Masked Outputs

    • File: pipeline_analysis_with_output.ipynb
    • Description: Contains masked outputs to protect sensitive information. Suitable for public viewing and aggregate analyses.
  2. Pipeline Analysis without Outputs

    • File: pipeline_analysis_no_output.ipynb
    • Description: All outputs have been cleared to ensure no sensitive or individual-specific data is exposed. Suitable for individual analyses.

Changes

  • Added pipeline_analysis_with_output.ipynb with masked outputs.
  • Added pipeline_analysis_no_output.ipynb with outputs cleared.

@shankari
Contributor

@TeachMeTW why do you only have 8 entries?

@TeachMeTW
Author

TeachMeTW commented Oct 31, 2024

@shankari What do you mean by 8 entries?
For me it shows:
Total documents in Stage_timeseries with metadata.key 'stats/pipeline_time': 215954

Oh, I see now. On the aggregate, for August 20, 2023 (Row 1), there were 8 entries.

What is the expected count of entries?
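A quick way to sanity-check per-day counts is to bucket the timing documents by calendar day. This is a minimal pure-Python sketch; the document shape and the `"ts"` field name are assumptions, not the actual Stage_timeseries schema.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical documents mimicking Stage_timeseries entries with
# metadata.key == 'stats/pipeline_time'; the "ts" field name is an
# assumption for illustration.
docs = [
    {"metadata": {"key": "stats/pipeline_time"}, "data": {"ts": 1692489600}},  # 2023-08-20
    {"metadata": {"key": "stats/pipeline_time"}, "data": {"ts": 1692493200}},  # 2023-08-20
    {"metadata": {"key": "stats/pipeline_time"}, "data": {"ts": 1692576000}},  # 2023-08-21
]

# Count entries per UTC calendar day; a surprisingly low count for one
# day is a red flag that the query was truncated somewhere upstream.
per_day = Counter(
    datetime.fromtimestamp(d["data"]["ts"], tz=timezone.utc).date().isoformat()
    for d in docs
    if d["metadata"]["key"] == "stats/pipeline_time"
)
print(per_day)  # Counter({'2023-08-20': 2, '2023-08-21': 1})
```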

@shankari
Contributor

shankari commented Oct 31, 2024

More seriously, in cell 13 we see only 8 entries, which makes me think that the rest of the stats (all the visualizations) are based on 8 entries.
Also, it is not clear what the units on the y-axis are.

@TeachMeTW
Author

@shankari I figured it out -- I had accidentally put a limit on the query. It should be fixed now, and the results look much better. I also added axis labels.
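The bug class here is worth spelling out: a leftover result-set limit (e.g. a pymongo-style `cursor.limit(8)` or a list slice) silently caps the query, so every downstream statistic is computed over just those few documents. A toy sketch, not the notebook's actual query code:

```python
# Stand-in for the full stats/pipeline_time result set (215954 docs).
docs = list(range(215954))

def fetch(docs, limit=None):
    # limit=None (or 0, matching pymongo's limit(0) convention) means
    # "no limit"; any positive value truncates the results.
    return docs[:limit] if limit else docs

assert len(fetch(docs, limit=8)) == 8    # the accidental cap
assert len(fetch(docs)) == 215954        # after removing the limit
```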

@shankari
Contributor

shankari commented Oct 31, 2024

Are you sure the reading values are milliseconds? Where did you get that from?
Also, I don't believe your later graphs (e.g. cell 51) - I am 99% sure from ad-hoc investigations that there is no way that section segmentation takes more time than trip segmentation. Maybe the average is masking the outliers? No, cell 50 shows that trip_segmentation is the new bottleneck.
I would drop output_gen since it has already been removed from the pipeline and is no longer a target for optimization:
e-mission/e-mission-server@fac1cb2

@TeachMeTW
Author

Are you sure the reading values are milliseconds? Where did you get that from? Also, I don't believe your later graphs - I am 99% sure from ad-hoc investigations that there is no way that section segmentation takes more time than trip segmentation.

I am suspicious of it being milliseconds as well -- when I discussed this with Jack, we came to the conclusion that reading is in ms and ts is in seconds. I agree that seconds make a lot more sense based on my personal testing experience.

@TeachMeTW
Author

@shankari It is indeed ms; the average is just skewed. As for the graphs, they were right; the labels were just shifted due to their orientation. Trip segmentation does indeed take more time than section segmentation.

@shankari
Contributor

shankari commented Nov 1, 2024

It is indeed ms

  1. How did you verify this?
  2. Not sure how the table from cell 95 and the table from cell 89 are related - where are the 1000+ (secs/ms) entries from cell 89 in cell 95?

@TeachMeTW
Author

It is indeed ms

  1. How did you verify this?
  2. Not sure how the table from cell 95 and the table from cell 89 are related - where are the 1000+ (secs/ms) entries from cell 89 in cell 95?

  1. I re-verified it by checking the pipeline intake stage and saw I was incorrect; it is seconds. The confusion stems from the fact that the dashboard readings are in ms.

  2. They are aggregated and averaged; the 210k entries collapse to 150 unique users with averaged values.
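The aggregation step described above (many per-run entries collapsing to one averaged row per user) can be sketched as a pandas groupby. The column names here are assumptions for illustration, not the notebook's actual schema:

```python
import pandas as pd

# Hypothetical per-run timing entries; in the real data, ~210k rows
# reduce to ~150 unique users after averaging.
entries = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u2", "u2"],
    "stage":   ["TRIP_SEGMENTATION"] * 5,
    "reading": [10.0, 20.0, 30.0, 40.0, 50.0],  # seconds
})

# One averaged row per (user, stage): the large raw readings from one
# table end up folded into per-user means in the other.
per_user = (
    entries.groupby(["user_id", "stage"], as_index=False)["reading"]
    .mean()
)
print(per_user)  # u1 -> 15.0, u2 -> 40.0
```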

@shankari
Contributor

shankari commented Nov 1, 2024

I re-verified it by checking the pipeline intake stage and saw I was incorrect; it is seconds. The confusion stems from the fact that the dashboard readings are in ms.

When we clean up the dashboard readings after this report (e.g. rename etc) we should also convert the readings to seconds for consistency.

For the record, why are the dashboard readings in ms instead of seconds?
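The cleanup suggested above amounts to a divide-by-1000 at the point where dashboard readings are recorded or read back. A minimal sketch; the function and field names are assumptions, not existing code:

```python
# Hypothetical helper: normalize dashboard timing readings (ms) to
# seconds so they use the same unit as the pipeline-time stats.
def ms_to_seconds(reading_ms: float) -> float:
    return reading_ms / 1000.0

assert ms_to_seconds(1500.0) == 1.5   # 1500 ms -> 1.5 s
assert ms_to_seconds(0.0) == 0.0
```

Doing the conversion once, at write time, avoids every downstream consumer having to remember which unit a given stat uses.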

@TeachMeTW force-pushed the Add-Pipeline-Analysis branch from ff10cf1 to dd2af1f on December 7, 2024