📖 Update data request documentation #1038
base: master
Conversation
Added details on how to request and load data. TODO: fill in the extra link, confirm data loading instructions.
Once the info in this new readme is OK'd, this branch should be merged before PR 41 -- I want to point to some of the instructions here in this branch, and will be unable to link until the changes are merged.
1. Data formats for the JSON objects are at `emission/core/wrapper` (e.g. `emission/core/wrapper/location.py` and `emission/core/wrapper/cleanedtrip.py`)
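For illustration, here is a minimal sketch of checking a raw entry against those wrapper definitions. The document shape below (`metadata.key`, plus `ts`/`latitude`/`longitude` under `data`) is an assumption inferred from the wrapper file names above; `emission/core/wrapper/location.py` remains the authoritative schema.

```python
import json

# Hypothetical single entry, shaped like the documents described by
# emission/core/wrapper/location.py. The field names here are assumptions;
# consult the wrapper source for the authoritative list.
raw_entry = json.loads("""
{
  "metadata": {"key": "background/location", "write_ts": 1609459200.0},
  "data": {"ts": 1609459200.0, "latitude": 39.74, "longitude": -104.99}
}
""")

# Check that the fields we expect from the wrapper definition are present.
expected_data_fields = {"ts", "latitude", "longitude"}
missing = expected_data_fields - raw_entry["data"].keys()
print(f"key = {raw_entry['metadata']['key']}, missing fields: {missing or 'none'}")
```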
## Data analysis ##
## Data Analysis - Server ##
This is actually the deprecated method now. We sometimes internally use the user-specific dumps to reproduce errors, but external users either get the mongodump or download CSV files from their admin dashboard.
Made a change to specify this is for internal testing only! Let me know if I should be more specific about it being a deprecated method.
Should I add a footnote about working with CSVs? I've only worked with the mongodump format, but could ask around for help writing a section on that process.
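In the meantime, here is a minimal sketch of loading such a CSV export with pandas. The filename and column names are hypothetical placeholders, since the admin dashboard's actual export schema is not described here.

```python
import pandas as pd

# Hypothetical CSV export from the admin dashboard; the filename and
# column names below are placeholders, not the real export schema.
trips = pd.read_csv("trips_export.csv", parse_dates=["start_ts", "end_ts"])

# Basic sanity checks before analysis: row count and date coverage.
print(f"{len(trips)} trips from {trips['start_ts'].min()} to {trips['end_ts'].max()}")
```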
- More information on this approach can be found in the public dashboard [ReadMe](https://github.com/e-mission/em-public-dashboard/blob/main/README.md#large-dataset-workaround).
In general, it is best to follow the instructions of the repository you are working with. There are subtle differences between them, and these instructions are intended as general guidance only.
We should unify these but obviously we should keep this documentation until we do.
- Made the Docker-style analysis the main data analysis method
- Emphasized that the server method was for internal debugging purposes
## Working With Data ##
After requesting data from TSDC, you should receive a "mongodump" file -- a collection of data, archived in `.tar.gz` format. Here are the broad steps to work with this data:
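As a rough sketch of those steps -- extract the archive, restore it into a local MongoDB instance, and query the result -- assuming a standard `mongodump` layout. The `Stage_database`/`Stage_timeseries` names follow e-mission's usual conventions, but verify them against your actual dump.

```python
import subprocess
import tarfile

import pymongo

# 1. Extract the archive. The "dump/" subdirectory layout is an assumption
#    about how the archive was packaged; adjust the path to match yours.
with tarfile.open("mongodump.tar.gz", "r:gz") as archive:
    archive.extractall("restored_dump")

# 2. Restore into a locally running MongoDB instance via the mongorestore CLI.
subprocess.run(["mongorestore", "restored_dump/dump"], check=True)

# 3. Query the restored data. The database and collection names follow
#    e-mission's usual naming but should be checked against the dump itself.
client = pymongo.MongoClient("localhost", 27017)
db = client["Stage_database"]
print("timeseries entries:", db["Stage_timeseries"].count_documents({}))
```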
The TSDC will not provide mongodumps. The TSDC will provide access to the data in CSV files or a Postgres database. The mongodump is currently only available for internal use.
Shankari and I discussed this page of the documentation and found it was out of date -- as such, we're updating it! I'm also adding some documentation on how to work with the data, building off of Abby's work with the dashboard (link)