Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance _get_and_store_range Function to Include Trip Statistics and Last API Call Tracking #993

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

TeachMeTW
Copy link
Contributor

Overview

Enhances _get_and_store_range by adding total_trips, labeled_trips, and last_call fields to the user profile, enabling better insights into user activities and API usage.

Changes

  1. Extended User Profile Updates:

    • total_trips: Total number of trips recorded for the user.
    • labeled_trips: Number of labeled/annotated trips.
  2. Integrated Last API Call Tracking:

    • Accesses stats/server_api_time to retrieve timestamps of the last GET and PUT API calls.
    • Determines the latest call and stores it as last_call in the user's profile in UNIX format.

Testing

  1. Picked a user, cleared pipeline, re-ran pipeline, verified user profile was updated.

Please review and provide feedback or approval.

@TeachMeTW
Copy link
Contributor Author

TeachMeTW commented Nov 8, 2024

Now I only have one concern: the current iteration does not use UUID in the api_call query, so I’m not too sure if the last call is accurate. If I do include the UUID, the last call is NULL for that UUID.

docs_cursor = edb.get_timeseries_db().find({
    "metadata.key": "stats/server_api_time",
    "user_id" : user_id
})

Thoughts?

emission/pipeline/intake_stage.py Outdated Show resolved Hide resolved
emission/pipeline/intake_stage.py Outdated Show resolved Hide resolved
emission/pipeline/intake_stage.py Outdated Show resolved Hide resolved
emission/pipeline/intake_stage.py Outdated Show resolved Hide resolved
@TeachMeTW
Copy link
Contributor Author

@JGreenlee Comments addressed, I split the function into a different file as suggested -- though it may be overkill. Seeking thoughts

Copy link
Contributor

@JGreenlee JGreenlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good refactor! I agree that the smaller functions are overkill.
Sometimes it's just a matter of finding the balance between tidiness and complexity.

emission/analysis/result/user_stat.py Outdated Show resolved Hide resolved
emission/analysis/result/user_stat.py Outdated Show resolved Hide resolved
emission/analysis/result/user_stat.py Show resolved Hide resolved
emission/analysis/result/user_stat.py Outdated Show resolved Hide resolved
emission/analysis/result/user_stat.py Outdated Show resolved Hide resolved
@TeachMeTW
Copy link
Contributor Author

@JGreenlee Comments addressed

@TeachMeTW TeachMeTW force-pushed the Extend-Get-Store-Range branch from e4e8ed9 to 8faafe1 Compare November 9, 2024 17:33
@TeachMeTW
Copy link
Contributor Author

@JGreenlee Addressed and rebased

Copy link
Contributor

@JGreenlee JGreenlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good to me now, but I think we should include a unit test for this pipeline stage since it now stores a full set of stats.

I also didn't see a breakdown of manual testing. (Have you tested this locally in conjunction with e-mission/op-admin-dashboard#153 to verify that the new stats show up in the table?)

@TeachMeTW TeachMeTW force-pushed the Extend-Get-Store-Range branch from dfb5afc to f4c57f0 Compare December 4, 2024 20:43
Copy link
Contributor

@JGreenlee JGreenlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if you have questions

emission/tests/analysisTests/intakeTests/TestUserStat.py Outdated Show resolved Hide resolved
emission/tests/analysisTests/intakeTests/TestUserStat.py Outdated Show resolved Hide resolved
emission/tests/analysisTests/intakeTests/TestUserStat.py Outdated Show resolved Hide resolved
emission/tests/analysisTests/intakeTests/TestUserStat.py Outdated Show resolved Hide resolved
emission/tests/analysisTests/intakeTests/TestUserStat.py Outdated Show resolved Hide resolved
self.assertIn("last_call_ts", profile, "User profile should contain 'last_call_ts'.")

expected_total_trips = -
expected_labeled_trips = -
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 if you don't load any user inputs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JGreenlee load any user inputs how so? Like loading from the example or what?

Copy link
Contributor

@JGreenlee JGreenlee Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The real_examples files like shankari_2015-aug-27 only have raw sensed data; i.e. what the phone automatically picked up while traveling

For some of the days, e.g. shankari_2016-06-20 we have an additional file shankari_2016-06-20.user_inputs, which does not contain sensed data but it has "manual inputs" or "user inputs". These entries represent labels or survey responses that the user has recorded while using the app.

For a more in-depth test you could load both shankari_2016-06-20 and shankari_2016-06-20.user_inputs and expected_labeled_trips would be something other than 0

emission/tests/analysisTests/intakeTests/TestUserStat.py Outdated Show resolved Hide resolved
emission/tests/analysisTests/intakeTests/TestUserStat.py Outdated Show resolved Hide resolved
emission/tests/analysisTests/intakeTests/TestUserStat.py Outdated Show resolved Hide resolved
@TeachMeTW
Copy link
Contributor Author

@JGreenlee I realized my code was NOT pushed and you may have reviewed the wrong thing.... my bad, I will take your notes into account with the latest.

@TeachMeTW TeachMeTW force-pushed the Extend-Get-Store-Range branch 3 times, most recently from 179594b to 965cc17 Compare December 13, 2024 20:10
@TeachMeTW
Copy link
Contributor Author

@JGreenlee Comments addressed, ready for re-review

Copy link
Contributor

@JGreenlee JGreenlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there!

emission/tests/analysisTests/intakeTests/TestUserStat.py Outdated Show resolved Hide resolved
emission/tests/analysisTests/intakeTests/TestUserStat.py Outdated Show resolved Hide resolved
Comment on lines 51 to 55
# Retrieve the user profile
profile = edb.get_profile_db().find_one({"user_id": self.UUID})
if profile is None:
# Initialize the profile if it does not exist
edb.get_profile_db().insert_one({"user_id": self.UUID})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if you take this out? Does something not work without it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I take this out, the profile is empty.

Traceback (most recent call last):
  File "/Users/rsimpson/e-mission-server/emission/tests/analysisTests/intakeTests/TestUserStat.py", line 81, in testGetAndStoreUserStats
    self.assertIsNotNone(profile, "User profile should exist after storing stats.")
AssertionError: unexpectedly None : User profile should exist after storing stats.

Meaning for some reason the entry is not being inserted. Does that hint to an issue with my additions or existing?

emission/tests/analysisTests/intakeTests/TestUserStat.py Outdated Show resolved Hide resolved
emission/tests/analysisTests/intakeTests/TestUserStat.py Outdated Show resolved Hide resolved
@TeachMeTW TeachMeTW force-pushed the Extend-Get-Store-Range branch from b01bd17 to 965cc17 Compare December 13, 2024 22:38
@TeachMeTW
Copy link
Contributor Author

@JGreenlee Addressed, see response -- in need of clarification

@TeachMeTW TeachMeTW force-pushed the Extend-Get-Store-Range branch 2 times, most recently from a7008f8 to 3ef0517 Compare December 14, 2024 04:33

TIME_FORMAT = 'YYYY-MM-DD HH:mm:ss'

def count_trips(ts: esta.TimeSeries, key_list: list, extra_query_list: Optional[list] = None) -> int:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: I had an earlier comment here that was resolved but I don't think was addressed. I don't see the point of this function when it is just a wrapper around ts.find_entries_count with the exact same params and no added functionality

In the event that a suggested change doesn't work for you or you have a good reason to do otherwise, that's fine; just reply and we can discuss

Or, if you just missed my comment, make sure not to resolve conversations until you have made a change that addresses them. When I have my code reviewed I like to react with thumbs up and reply "TODO" so I can keep track of what I have seen vs what I have actually implemented and pushed up

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I did address it, but I've been trying to rebase and do all this github branch magic and maybe it got reverted -- will do react and/or reply on future comments before resolving

OR I believe a better option is for the reviewer to resolve the comment? what are your thoughts on that approach in addition?

Copy link
Contributor

@JGreenlee JGreenlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TeachMeTW Code looks good but the new tests appear to be failing in GH Actions
Do they pass for you locally?

@TeachMeTW
Copy link
Contributor Author

@JGreenlee Yes the tests pass locally for me. I believe the error is not related to my code changes.
image

@JGreenlee
Copy link
Contributor

I don't know why it's passing locally and not here, but it is related to this PR because the test that's failing is the one you added. It's complaining about timestamps that look a little bit off.

@TeachMeTW
Copy link
Contributor Author

I don't know why it's passing locally and not here, but it is related to this PR because the test that's failing is the one you added. It's complaining about timestamps that look a little bit off.

Shall I add delta/assert if almost equal?

@JGreenlee
Copy link
Contributor

I don't know why it's passing locally and not here, but it is related to this PR because the test that's failing is the one you added. It's complaining about timestamps that look a little bit off.

Shall I add delta/assert if almost equal?

No we should investigate why it's different.
The pipeline is supposed to be deterministic, i.e. 100% reproducible

What happens if you checkout master and run that test data through the pipeline?
Is it the same as the CI or the same as you got on your branch?

@TeachMeTW
Copy link
Contributor Author

@JGreenlee I checkout master and run the test data through pipeline and its the same as the branch NOT the CI. I believe running ALL the tests must affect the ts. I tried to test runAllTest.sh but it crashes my console/vscode, perhaps you could verify

@JGreenlee
Copy link
Contributor

I will try to verify tomorrow. I will also pass this along in the review process for Shankari's input.

If runAllTests performs differently than running an individual test file, it is concerning as it means we do not have proper isolation between the tests. Maybe one of the tests does not tearDown properly. It may be difficult to track down the culprit.

@JGreenlee
Copy link
Contributor

If we can confirm that the issue is with one of the other tests, this looks good to me to merge.

@shankari
Copy link
Contributor

I tried to test runAllTest.sh but it crashes my console/vscode, perhaps you could verify

You should resolve the crash and then test. We already have one CI failing for "unknown reasons", having the second also fail makes the CI not be meaningful.

@TeachMeTW
Copy link
Contributor Author

TeachMeTW commented Dec 16, 2024

@shankari @JGreenlee

I discovered two things.

Crashing

The first one in regards to the crashing runAllTests.sh

When the terminal started crashing during script execution, I redirected the output to a log file to figure out what was going on. That’s when I saw the error: ValueError: I/O operation on closed file. Initially, the code opened the file without using a with statement and didn’t explicitly close it causing a different ValueError where the file was never closed. This was causing unpredictable behavior, as the file could be closed unexpectedly by the system or Python's garbage collector.

I updated the code to use a with statement to ensure the file was properly closed. However, the error persisted because there was still a redundant line after the with block trying to read the file, even though it had already been closed. Once I removed that redundant line and ensured the file was read entirely within the with block, the issue was resolved, and the terminal stopped crashing.

I could make a new issue in regards to this fix.

Timing differences

There is indeed a difference with single testing and runAll. The single test has the ts normal but the runAll brings the same error as the gitlab runner where it is off. I thought fixing the crash would fix this discrepancy but it seems deeper than that.

My hunch is that a teardown is broken somewhere and this UUID is being reused. Perhaps I can use a different uuid as a bandaid fix or can dig deeper into the root cause.

@shankari
Copy link
Contributor

My hunch is that a teardown is broken somewhere and this UUID is being reused. Perhaps I can use a different uuid as a bandaid fix or can dig deeper into the root cause.

Feel free to use either option. I will not merge until this is resolved.

@JGreenlee
Copy link
Contributor

We do not understand why the pipeline start ts comes up as 1440688707.867 during runAllTests.sh.

1440688707.867 is the ts of the first background/location on that day. The pipeline start ts is supposed to be the start_ts of the first trip (specifically the first composite_trip)

I am confident that 1440688739.672 is the correct start ts because when I run:

opcode = 'nrelop_dev-emulator-study_test'
day = 'shankari_2015-aug-27'

!./e-mission-py.bash bin/debug/purge_user.py -e $opcode
!./e-mission-py.bash bin/debug/load_timeline_for_day_and_user.py emission/tests/data/real_examples/$day $opcode
!./e-mission-py.bash bin/debug/intake_single_user.py -e $opcode

import emission.storage.timeseries.abstract_timeseries as esta
import emission.core.wrapper.user as ecwu
import pandas as pd

pd.set_option('display.float_format', str)

user_id = ecwu.User.fromEmail(opcode).uuid
ts = esta.TimeSeries.get_time_series(user_id)
ts.get_data_df("analysis/confirmed_trip")

I can see that the first trip has start_ts of 1440688739.672. This is consistent on master and this branch

@JGreenlee
Copy link
Contributor

JGreenlee commented Dec 17, 2024

This specific problem has happened once before.

I used Github's search feature with "org:e-mission 1440688707.867" and got a hit: #509 (comment)

Reading through that PR, we still cannot pinpoint what the cause was.

@TeachMeTW
Copy link
Contributor Author

TeachMeTW commented Dec 17, 2024

Commenting out all the test cases in TestTripSegmentation and replacing it with a dummy test with a print statment still invokes the same errors. Might be something with setup and teardown, might be not as I cleared everything and still the same error. Maybe its the pipeline -- not sure for now.

I also discovered TestSectionSegmentation with TestGetUserStats has a similar error result.

@TeachMeTW
Copy link
Contributor Author

TeachMeTW commented Dec 17, 2024

However, if one runs TestTripSegmentation or TestSectionSegmentation AFTER TestGetUserStats there is NO ERROR

Test User Stats as First




image

image

Test User Stats as Second



image image

@TeachMeTW
Copy link
Contributor Author

TeachMeTW commented Dec 18, 2024

Alternatively, using emission/tests/data/real_examples/shankari_2015-aug-21 rather than emission/tests/data/real_examples/shankari_2015-aug-27 resulted in NO ERRORS when running multiple tests.

image



We have a couple options.

  1. We could do above and use Aug 21 dataset for UserStats, call it a day for now.
  2. We could dig deeper as to why this is happening.

Thoughts?

@shankari
Copy link
Contributor

We should do both.
We should change the test to Aug 21 for UserStats so that we can merge this and continue to make progress.
We should then file a new issue to dig deeper into why this is happening

shankari and others added 3 commits December 17, 2024 19:55
Add per-operation timing to segment_current_trips using ect.Timer

Added total, labeled, and last call

Modified total and labeled trips to match op-admin implementation

Modified last call to mirror op-admin implementation
Addressed comments, reduced overkill on refactor

Forgot to add last_call_ts
- Introduced comprehensive test suite to validate functionality of the `get_and_store_user_stats` method.
- Covered core scenarios including:
  - Correct data aggregation and storage.
  - Handling cases with no trips or missing data.
  - Verifying behavior with partial data and multiple invocations.
  - handling of invalid UUID inputs.
- setup and teardown to ensure test isolation and clean up user data.
- Included data insertion for confirmed trips, composite trips, and server API timestamps to simulate realistic scenarios.

This initial test suite establishes a baseline for ensuring reliability of the `get_and_store_user_stats` function.

remove

Modified test

Refactored the test and simplified it; validated all new user stats

added store to intake

Updated Based on requested changes

Removed unnecessary wrapper

Changed to Aug 21 for the time being
@TeachMeTW TeachMeTW force-pushed the Extend-Get-Store-Range branch from 8022a1f to cadb13e Compare December 18, 2024 03:55
@TeachMeTW
Copy link
Contributor Author

@shankari Modified and pushed accordingly. Will make an issue as suggested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Ready for review by Shankari
Development

Successfully merging this pull request may close these issues.

3 participants