Modeling and functions #829
Conversation
@corinne-hcr can you clarify that this is the most recent version of the code? Note that this is different from #826
Yes, the files in this PR are copied from the private data repo. They are what I have and am working on now.
Note that I plan to significantly refactor this code as part of my cleanup. I would suggest importing directly from this repo instead of making a copy, because you will not be able to simply copy your changes over.
…on_pipeline, modify a parameter in the plot code
- Replace the participant filter and the direct call to `edb` with the standard call to the abstract timeseries
- Change the imports to the standard format, with standard imports first and our imports later
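For illustration, a minimal sketch of the pattern, assuming the usual `esta.TimeSeries` abstraction and an illustrative data key (not necessarily the one this commit touches):

```python
# Standard imports first, then our imports (per the new convention).
import logging

import emission.storage.timeseries.abstract_timeseries as esta

def get_user_trips_df(user_id):
    # Go through the abstract timeseries instead of calling `edb`
    # collections directly, so storage details stay encapsulated.
    ts = esta.TimeSeries.get_time_series(user_id)
    # "analysis/confirmed_trip" is an illustrative key; the real code
    # may read a different stream.
    return ts.get_data_df("analysis/confirmed_trip", time_query=None)
```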
You are reading all the user trips twice while determining the parameters.
I use
@corinne-hcr please do not make any additional changes. I am refactoring the code much more substantially, and if you push changes as well, I am going to get a bunch of merge conflicts. I should have taken over the codebase a lot earlier, but I waited for you to push all your changes so that we would not be working on the same codebase at the same time. At this point, I'm going to have to pull an all-nighter to clean up this code, so please don't make my life harder.
…cr/e-mission-server into modeling_and_functions
@corinne-hcr Please work on the unit tests instead, so that I can verify that my refactoring is correct.
The previous implementation dated from a time when we didn't know for sure whether there would be UUID entries for each user. But that has been true by default for years now, and if we ever got into a situation in which we didn't have the uuid_db, we would just create the missing entries. Switching to the UUID DB avoids weird UUIDs like `None` and `null`, and is also significantly faster.

```
%timeit edb.get_uuid_db().distinct("uuid")
1.29 ms ± 12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

```
%timeit edb.get_timeseries_db().distinct("user_id")
3.68 s ± 105 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
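The swap itself is essentially one line; sketched:

```python
import emission.core.get_database as edb

# Fast: the UUID DB has exactly one entry per registered user.
uuid_list = edb.get_uuid_db().distinct("uuid")

# Slow: scans the whole timeseries for distinct user ids, and can
# surface odd values such as None or "null".
# uuid_list = edb.get_timeseries_db().distinct("user_id")
```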
- Enable logging
- Add logging for each stage of the pipeline
- Change all `for i in range(len(x))` to `for y in x`, or `for i, y in enumerate(x)` as needed
- Change the logging in `map_labels_mode` to log all the rows at once instead of one row at a time, to be consistent with the log message
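For reference, the loop rewrite in question (toy data):

```python
xs = ["walk", "bike", "bus"]

# Before: index-based iteration
for i in range(len(xs)):
    print(xs[i])

# After: iterate directly over the elements...
for x in xs:
    print(x)

# ...or use enumerate() when the index is genuinely needed
for i, x in enumerate(xs):
    print(i, x)
```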
+ Enable logging
+ Replace `count` with `uniqcount` so it doesn't conflict with the `count()` function (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.count.html)

Without this fix, we run into: #2 (comment)
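The collision is easy to reproduce; a toy example showing why the rename helps:

```python
import pandas as pd

df = pd.DataFrame({"label": ["a", "a", "b"], "count": [1, 2, 3]})

# Attribute access returns the DataFrame.count *method*, not the column:
print(df.count)       # <bound method DataFrame.count of ...>

# After renaming, attribute access behaves as expected:
df = df.rename(columns={"count": "uniqcount"})
print(df.uniqcount)   # the actual column
```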
Instead of running on only the first one
This required some changes to the intake pipeline code:
- Add a new algorithm type
- Add a new non-placeholder function that calls the actual load prediction code

This also required changes to the load_predict code to match the intake pipeline:
- Accept a single trip instead of an array as input
- Change the trip access functions accordingly
- Enable logging + add additional log statements so we know what we are doing
- Convert the `print` statements in the ad-hoc unit testing to `assert` statements so we can actually see if they fail
- Improve exception handling by catching and printing the exception

Testing done:
- Ran the intake pipeline; it passed
- Connected a UI client from master and checked that there were no errors
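A sketch of the print-to-assert conversion together with the catch-and-print exception handling (names are hypothetical):

```python
def check_prediction(predicted, expected):
    # Before: mismatches were only printed and could scroll by
    # unnoticed. An assert makes a failure impossible to miss.
    assert predicted == expected, f"expected {expected}, got {predicted}"

try:
    check_prediction({"mode_confirm": "walk"}, {"mode_confirm": "bike"})
except Exception as e:
    # Catch and print the exception so we can see what failed.
    print(f"prediction check failed: {e}")
```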
At this point, everything seems to work end-to-end for the minipilot dataset.
We have a cold start problem with demos: we want to be able to showcase the labeling, but it won't work until we have sufficient labels. This commit introduces a new placeholder that will help with demos. It pulls the unique set of existing labels and randomly returns one of them 9/10 of the time. The remaining 1/10 of the time, it returns no inference. This lets us prototype our goal, where most of the trips are labeled and the user only occasionally has to input labels.

However, if the user does have sufficiently good labels, we do want to use them. So we use our built-in ensemble method to generate two labels, pick the "real", non-placeholder one if it exists, and fall back to the placeholder if it doesn't. Note that if we enhanced the placeholder to use the actual distribution of the existing labels, this could arguably be a very simple randomized algorithm.

+ Added a super simple script to reset only the label inference stage and the confirmed trip stage, so that we can experiment with this without having to re-run the complete pipeline
+ Added a super simple script to print out label stats (how many of each kind of trip) to debug pipeline operation

Testing done:
- Reset the inference pipeline
- Ran the intake pipeline

The placeholder was invoked:
```
2021-07-20 10:54:54,758:DEBUG:4457897408:In placeholder_predictor_demo: found 85 for user 1d292b85-c549-409a-a10d-746e957582a0, returning value {'mode_confirm': 'shared_ride', 'purpose_confirm': 'meal', 'replaced_mode': 'same_mode'}
2021-07-20 10:54:54,866:DEBUG:4457897408:In placeholder_predictor_demo: found 85 for user 1d292b85-c549-409a-a10d-746e957582a0, returning value {'mode_confirm': 'walk', 'purpose_confirm': 'exercise', 'replaced_mode': 'bus'}
2021-07-20 10:54:54,972:DEBUG:4457897408:In placeholder_predictor_demo: found 85 for user 1d292b85-c549-409a-a10d-746e957582a0, returning value {'mode_confirm': 'skateboard', 'purpose_confirm': 'exercise', 'replaced_mode': 'same_mode'}
```

The final result was a combination of real and placeholder predictions:
```
2021-07-20 10:54:55,182:DEBUG:4457897408:Found real prediction Labelprediction({'trip_id': ObjectId('60f70def49a6d6de8e8ef803'), 'algorithm_id': 4, 'prediction': [{'labels': {'mode_confirm': 'not_a_trip', 'purpose_confirm': 'home', 'replaced_mode': 'not_a_trip'}, 'p': 1.0}], 'start_ts': 1612036947.149, 'end_ts': 1612037455.46}), using that preferentially
2021-07-20 10:54:55,286:DEBUG:4457897408:No real prediction found, using placeholder prediction Labelprediction({'trip_id': ObjectId('60f70def49a6d6de8e8ef807'), 'algorithm_id': 5, 'prediction': [{'labels': {'mode_confirm': 'walk', 'replaced_mode': 'same_mode', 'purpose_confirm': 'meal'}, 'p': 0.9517281073260278}], 'start_ts': 1612132424.3783095, 'end_ts': 1612133524.871})
```

- Reset confirmed trips
```
all inferred trips 215
all confirmed trips 0
confirmed trips with inferred labels 0
confirmed trips without inferred labels 0
confirmed trips with expectation 0
```

- Reran the pipeline
```
all confirmed trips 215
confirmed trips with inferred labels 215
confirmed trips without inferred labels 0
confirmed trips with expectation 215
```
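A minimal sketch of the placeholder's behavior, assuming the unique labels have already been pulled from the database (the real function also logs what it returns):

```python
import random

def placeholder_predictor_demo(unique_labels):
    """unique_labels: list of the user's existing label dicts, e.g.
    {'mode_confirm': 'walk', 'purpose_confirm': 'exercise', ...}."""
    # 1/10 of the time (or with no labels at all), return no inference
    if not unique_labels or random.random() < 0.1:
        return []
    # 9/10 of the time, return one existing label tuple at random,
    # with an illustrative sub-1.0 confidence
    return [{"labels": random.choice(unique_labels),
             "p": random.uniform(0.8, 1.0)}]
```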
Before fix: #2 (comment)

After fix:
- Before reset
```
Inferred trips with inferences 215
All expected trips 430
Expected trips with inferences 430
all inferred trips 215
all confirmed trips 215
confirmed trips with inferred labels 215
confirmed trips without inferred labels 0
confirmed trips with expectation 215
```
- After reset
```
$ ./e-mission-py.bash bin/debug/reset_partial_label_testing.py -c
storage not configured, falling back to sample, default configuration
Connecting to database URL localhost
{'n': 430, 'ok': 1.0}
{'n': 215, 'ok': 1.0}
{'n': 1, 'ok': 1.0}
storage not configured, falling back to sample, default configuration
Connecting to database URL localhost
All inferred trips 215
Inferred trips with inferences 215
All expected trips 0
Expected trips with inferences 0
...
confirmed trips with inferred labels 0
confirmed trips without inferred labels 0
confirmed trips with expectation 0
```
- After re-run
```
All expected trips 215
Expected trips with inferences 215
all inferred trips 215
all confirmed trips 215
confirmed trips with inferred labels 215
confirmed trips without inferred labels 0
confirmed trips with expectation 215
```
+ Add the ability to select an ensemble function like primary_algorithms does for normal inference algorithms
+ Prediction algorithms now go in inferrers.py
+ Ensemble algorithms now go in ensembles.py
+ Just needed to update the fully qualified names of the predictors
- Note that the ensemble test is weak enough that it did not need to be changed; this may or may not be something to work on in the future
We copy them instead of modifying the existing models so that we can run them side by side for a comparison. TODO: Unify with parameters (at least for the auxiliary files) instead of making copies. This is currently unchanged so that we can have a clean diff when we do apply the changes.
Configuration changes include:
- `data_preprocessing.py`: don't drop user input rows where any field is None; only filter out rows where no field is defined
- `evaluation_pipeline.py`: turn off filtering and cutoffs for the similarity code
- `get_users.py`: don't require that 50% of the trips are labeled

Then make significant changes to `build_save_model`. Concretely:
- Save the models after the first round, which significantly simplifies the prediction code
- Pull out the location map generation and user input generation code into separate functions to simplify the code invocation

Testing done:
- Ran it against the staging database
- Several users did not have any labeled trips
```
$ grep "enough valid trips" /tmp/staging_run.log
2021-07-27 01:31:02,224:DEBUG:Total: 0, labeled: 0, user 9cfc3133-3505-4923-b6fb-feb0e50a5aba doesn't have enough valid trips for further analysis.
2021-07-27 01:31:02,246:DEBUG:Total: 0, labeled: 0, user cc114670-ef99-4247-a3a2-47529247d8ac doesn't have enough valid trips for further analysis.
2021-07-27 01:31:02,267:DEBUG:Total: 0, labeled: 0, user 2bc8ca71-7d0f-4930-ba2c-cf97f7dceaea doesn't have enough valid trips for further analysis.
2021-07-27 01:31:05,340:DEBUG:Total: 171, labeled: 6, user 6373dfb8-cb9b-47e8-8e8f-76adcfadde20 doesn't have enough valid trips for further analysis.
```
- They were filtered correctly
```
$ grep "enough valid trips" /tmp/staging_run.log | wc -l
21
```
- Models for 20 users were created
```
$ ls locations_first_round_* | wc -l
20
```
- No errors
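If the preprocessing uses pandas (an assumption here), the filter change corresponds to switching `dropna` from `how="any"` to `how="all"` on the label columns:

```python
import pandas as pd

trips = pd.DataFrame({
    "mode_confirm":    ["walk", None,   None],
    "purpose_confirm": ["home", "work", None],
    "replaced_mode":   [None,   None,   None],
})
label_cols = ["mode_confirm", "purpose_confirm", "replaced_mode"]

# Before: how="any" dropped every row with at least one missing field.
# After: drop only rows where *all* label fields are missing.
labeled = trips.dropna(subset=label_cols, how="all")
print(labeled)  # keeps the first two rows, drops the third
```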
If the user has never submitted an input, we can't infer anything. So we return early. This fixes e-mission#829 (comment)
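The guard is a simple early return; a sketch with hypothetical names:

```python
def run_model(trip, user_inputs):
    # Stand-in for the real matching/prediction logic.
    return [{"labels": user_inputs[0], "p": 1.0}]

def predict_labels(trip, user_inputs):
    # If the user has never submitted an input, there is nothing to
    # learn from, so return "no inference" instead of failing later.
    if not user_inputs:
        return []
    return run_model(trip, user_inputs)
```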
Concretely:
- Pull out the matching code into `find_bin`
- Load only locations and user inputs
- Go from trip -> bin -> user_labels

Change filenames for consistency. Added additional log statements for clarity.

Testing done:
- Ran the intake pipeline
- Ran `label_stats` for a single user
- Ran the intake pipeline again
- Confirmed that the confirmed trips had inferences
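A rough sketch of what `find_bin` might look like; the distance check and trip representation are simplified (the real code works on trip objects loaded from the model files and uses geodesic distance):

```python
import math

def _dist(p1, p2):
    # Euclidean stand-in for the real geodesic distance.
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])

def find_bin(trip, bin_locations, radius):
    """trip and bin entries are (start_point, end_point) tuples here.
    Return the index of the first bin whose representative start and
    end are both within `radius`, or -1 if none match."""
    for i, rep in enumerate(bin_locations):
        if _dist(trip[0], rep[0]) <= radius and _dist(trip[1], rep[1]) <= radius:
            return i
    return -1
```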
Previous commit: 71e299e. Fixes: model filenames, the load process, and a variable name. Testing done: really works, even with reset.
- Remove it from the list of algorithms
- Switch the ensemble to the one that returns the first value only
Since this is a normal use case and not an error.
Simplified and reconfigured model building and application
@shankari it is not successful. Is it affected by the previous PR?
If you look at the logs, all the errors are gone, both because of f2060cc, which handled the corner case, and because of e666d22, which removed the random placeholder. There is still a failure due to
Since that is in the storage tests, it must be because of 7fac452. Let me take a look
We currently have hard-coded radius and other parameters for the first round. So the instructions for generating labels are:
So that we can collapse the different categories and increase the percentage (e-mission/e-mission-docs#656 (comment)).

+ Also check whether the replacement mode actually exists before performing the mapping, since we are checking this into master and it may not always exist. Even on CanBikeCO, we may have users who have never filled in a replaced mode.
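A sketch of the existence check before mapping; the mapping itself is illustrative, not the real category collapse:

```python
# Illustrative collapse of fine-grained replaced modes into coarser
# categories; the real mapping is described in the linked issue.
REPLACED_MODE_MAP = {"drove_alone": "car", "shared_ride": "car"}

def map_replaced_mode(user_input):
    # Users (even on CanBikeCO) may never have filled in a replaced
    # mode, so check for the key before mapping.
    if "replaced_mode" not in user_input:
        return user_input
    rm = user_input["replaced_mode"]
    user_input["replaced_mode"] = REPLACED_MODE_MAP.get(rm, rm)
    return user_input
```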
And a launch script that takes multiple possible arguments. This allows us to build individual models instead of building them all at the same time.
…cr/e-mission-server into modeling_and_functions
@GabrielKS I have now enabled the
Can you see if this makes a difference? New instruction to build the model:
This ensures that we can match trips properly (for fix, see e-mission/e-mission-docs#656 (comment))
Deployed this to staging.
Note that because of the large number of expected trips, the deletion takes about 2 minutes and involves several thousand expected trips.
As a performance improvement, in 7fac452 we switched to reading the uuids from the UUID database instead of looking at the unique entries in the timeseries. This resulted in an order of magnitude improvement:

```
%timeit edb.get_uuid_db().distinct("uuid")
1.29 ms ± 12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

```
%timeit edb.get_timeseries_db().distinct("user_id")
3.68 s ± 105 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

But it required that all users have an entry in the UUID DB to show up in the `get_all_uuids()` call. So we change this test as follows:
- Change the existing test loading code to also register users if a `registerEmail` is filled in; this required some refactoring
- For this test, pass in a registerEmail in the test object

Testing done:
```
$ ./e-mission-py.bash emission/tests/storageTests/TestTimeSeries.py TestTimeSeries.testGetUUIDList
storage not configured, falling back to sample, default configuration
Connecting to database URL localhost:27017
2021-08-10 12:38:33,424:INFO:4669529536:Before loading from emission/tests/data/real_examples/shankari_2015-aug-21, timeseries db size = 4527962
2021-08-10 12:38:33,471:INFO:4669529536:registering email = user1
Setting up real example for 4405c6ce-f8c8-4e37-b5b7-c69c0072135e
2021-08-10 12:38:37,001:INFO:4669529536:After loading, timeseries db size = 4530087
2021-08-10 12:38:37,019:DEBUG:4669529536:First few entries = ['2015-08-21T07:55:13.408000-07:00', '2015-08-21T07:55:13.408000-07:00', '2015-08-21T07:55:01.638000-07:00', '2015-08-21T07:55:07.414000-07:00', '2015-08-21T07:55:01.638000-07:00', '2015-08-21T07:54:51.095000-07:00', '2015-08-21T07:54:51.095000-07:00', '2015-08-21T07:54:51.095000-07:00', '2015-08-21T07:54:52.153000-07:00', '2015-08-21T07:55:07.414000-07:00']
2021-08-10 12:38:37,030:INFO:4669529536:Before loading from emission/tests/data/real_examples/shankari_2015-aug-27, timeseries db size = 4530087
2021-08-10 12:38:37,063:INFO:4669529536:registering email = user1
Setting up real example for 4405c6ce-f8c8-4e37-b5b7-c69c0072135e
2021-08-10 12:38:39,418:INFO:4669529536:After loading, timeseries db size = 4531569
2021-08-10 12:38:39,433:DEBUG:4669529536:First few entries = ['2015-08-21T07:55:13.408000-07:00', '2015-08-21T07:55:13.408000-07:00', '2015-08-21T07:55:01.638000-07:00', '2015-08-21T07:55:07.414000-07:00', '2015-08-21T07:55:01.638000-07:00', '2015-08-21T07:54:51.095000-07:00', '2015-08-21T07:54:51.095000-07:00', '2015-08-21T07:54:51.095000-07:00', '2015-08-21T07:54:52.153000-07:00', '2015-08-21T07:55:07.414000-07:00']
.
----------------------------------------------------------------------
Ran 1 test in 6.193s

OK
```
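A sketch of the register-if-email refactoring, with a dict standing in for the UUID DB and hypothetical helper names:

```python
import logging
import uuid

def register_user(email, uuid_db):
    # Hypothetical helper: ensure the email has a UUID DB entry,
    # creating one if missing, and return the uuid.
    if email not in uuid_db:
        uuid_db[email] = uuid.uuid4()
    return uuid_db[email]

def load_test_entries(test_spec, uuid_db):
    # Register first if the spec carries a registerEmail, so the user
    # appears in get_all_uuids(); otherwise use the raw user_id.
    email = test_spec.get("registerEmail")
    if email is not None:
        logging.info("registering email = %s", email)
        return register_user(email, uuid_db)
    return test_spec["user_id"]
```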
+ Implements the algorithm described in e-mission/e-mission-docs#663 (comment)
+ Uses constant values `A=0.01`, `B=0.75`, `C=0.25`
+ Changes eacilp.primary_algorithms to use the new algorithm

No unit tests yet (working on a tight timeline), but tested as follows:
1. Run `[eacili.n_to_confidence_coeff(n) for n in [1,2,3,4,5,7,10,15,20,30,1000]]`, check for reasonableness, and compare to results from plugging the formula into a calculator
2. Run the modeling and intake pipeline with `eacilp.primary_algorithms` set to the old algorithm
3. Run the first few cells of the "Explore label inference confidence" notebook for a user with many inferrable trips to get a list of unique probabilities and counts for that user
4. Set `eacilp.primary_algorithms` to the new algorithm and rerun the intake pipeline (the modeling pipeline hasn't changed)
5. Rerun the notebook as above and examine how the list of probabilities and counts has changed
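The actual formula is specified in e-mission/e-mission-docs#663; purely as an illustration of the constants' apparent roles (A as the n=0 floor, B as the asymptote, C as the rate — my assumption, not the confirmed formula), a saturating confidence curve could look like:

```python
import math

A, B, C = 0.01, 0.75, 0.25

def n_to_confidence_coeff(n, a=A, b=B, c=C):
    # Illustrative saturating curve, NOT the confirmed formula:
    # starts near `a` at n=0 and approaches `b` as n grows.
    return b - (b - a) * math.exp(-c * n)

print([round(n_to_confidence_coeff(n), 3)
       for n in [1, 2, 3, 4, 5, 7, 10, 15, 20, 30, 1000]])
```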
+ See comments in e-mission/e-mission-docs#663
+ Useful for playing around with other constants in notebooks
With input:
```
A, B, C
D, E, F
G, N/A, N/A
```

Before (doesn't add up to 1!!):
```
A, B, C 0.3
D, E, F 0.3
```

After (adds up to 1!!):
```
A, B, C 0.5
D, E, F 0.5
```

Hopefully this doesn't happen very often.
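A sketch of the fix: count and normalize over the fully labeled rows only, so the probabilities sum to 1 (data shapes are illustrative):

```python
from collections import Counter

rows = [("A", "B", "C"), ("D", "E", "F"), ("G", None, None)]

# Keep only fully labeled rows; "N/A" rows are not real label tuples.
valid = [r for r in rows if all(v is not None for v in r)]

counts = Counter(valid)
total = len(valid)  # normalize over valid rows, not all rows
probs = {labels: n / total for labels, n in counts.items()}
print(probs)  # {('A','B','C'): 0.5, ('D','E','F'): 0.5} -- sums to 1
```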
This makes it compatible with https://github.com/e-mission/e-mission-server/releases/tag/v3.0.0 in particular, by merging e-mission/e-mission-server#829, e-mission/e-mission-server#828, and e-mission/e-mission-server#825
Move code from the private data repo to here
This is what I have for now.
When I am done with refactoring, I will update the code here.