Export data as CSV and JSON #9
@SelmaGuedidi can you please remove the |
Thanks for the PR! Looks mostly good! But unfortunately you used the wrong source data. Sorry for the confusion!
export.py
Outdated
all_boycott_data = read_yaml(yaml_all_boycott)
all_data = read_yaml(yaml_all)
altenatives_data = read_yaml(yaml_alternatives)
Sorry, the data in the output/yaml directory is an old format that should have been deleted.
You need to read the data from data/, and follow the schemas there.
For the output, I think CSV should output two files: one for companies, one for brands. The JSON should output one file, which looks like:
{
"brands": [ {...}, {...} ],
"companies": [ {...}, {...} ]
}
Also, please add an id to each brand in both the CSV and JSON. The id is just the filename (without the extension).
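A minimal sketch of the requested output shape, assuming the files under data/ have already been parsed into dicts keyed by filename. The sample paths, the helper names (with_id, build_export, to_csv), and adding an id to companies as well as brands are all assumptions for illustration, not the code in this PR.

```python
import csv
import io
import json
from pathlib import Path


def with_id(filename, record):
    """Return a copy of record with an 'id' taken from the filename stem."""
    return {"id": Path(filename).stem, **record}


def build_export(brand_files, company_files):
    """Combine parsed brand and company records into the single JSON layout."""
    return {
        "brands": [with_id(n, r) for n, r in sorted(brand_files.items())],
        # Assumption: companies get an id too; the review only asks for brands.
        "companies": [with_id(n, r) for n, r in sorted(company_files.items())],
    }


def to_csv(rows):
    """Render a list of flat dicts as CSV text, one column per key seen."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data standing in for parsed files under data/.
brand_files = {"data/brands/sabra.yaml": {"name": "Sabra"}}
company_files = {"data/companies/pepsico.yaml": {"name": "PepsiCo"}}

export = build_export(brand_files, company_files)
json_text = json.dumps(export, indent=2)     # one JSON file
brands_csv = to_csv(export["brands"])        # first CSV file
companies_csv = to_csv(export["companies"])  # second CSV file
```

Keeping the combined dict as the single source for both formats means the CSV and JSON cannot drift apart.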
python export.py
git diff --exit-code
Nice! Thanks for adding this check
.pre-commit-config.yaml
Outdated
name: Export YAML to CSV and JSON
language: python
entry: python export.py
additional_dependencies: ['pyaml']
FYI: requirements.txt uses pyyaml, not pyaml
@idris I made the changes, hope it's fixed :)
Can we add timestamps to the exports? createdAt and updatedAt, in UTC maybe. For now they can be the same, and let's raise a new issue to set updatedAt correctly.
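For reference, a rough sketch of what the timestamp request could look like. The add_timestamps helper and the choice to put the fields at the top level of the JSON export are assumptions; nothing in this PR implements them.

```python
from datetime import datetime, timezone


def add_timestamps(export, now=None):
    """Return a copy of export with createdAt and updatedAt set to the same UTC instant."""
    now = now or datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).isoformat()
    # Both fields share one value for now, as suggested in the comment above.
    return {**export, "createdAt": stamp, "updatedAt": stamp}


# Fixed instant so the result is reproducible.
stamped = add_timestamps(
    {"brands": [], "companies": []},
    now=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
```

One thing to watch: a timestamp that changes on every run would make the git diff --exit-code check fail each time, so it would need to be derived from the data rather than from the clock.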
if isinstance(value, list):
    return ', '.join(map(str, value))
The format of dictionaries in the CSV was weird, so I figure JSON is a good format, and it works well for both lists and dicts:
- if isinstance(value, list):
-     return ', '.join(map(str, value))
+ if isinstance(value, list) or isinstance(value, dict):
+     return json.dumps(value)
if isinstance(value, list):
    return ', '.join(map(str, value))
brands.csv:
if isinstance(value, list) or isinstance(value, dict):
    return json.dumps(value)
brands.csv:
I used ', '.join(map(str, value)) to remove the [] and "" from the CSV files.
I also think the dict case never occurs here, because isinstance(value, dict) is always false.
Hm, I think you're right that the ', '.join(...) version is better for lists.
If you look at the sabra row, there is a dict for the stakeholders column, which shows up like this:
"{'id': 'pepsico', 'type': 'owner', 'percent': 50}, {'id': 'strauss-group', 'type': 'owner', 'percent': 50}"
We're not really sure how the stakeholders bit will pan out, so I'm just going to leave your join as-is, and remove stakeholders from the CSV export for now.
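To make the trade-off concrete, here is a small sketch comparing the two cell formats on the sabra stakeholders value quoted above, plus one way to skip that column in the CSV. The function names and the categories field are hypothetical, not taken from export.py.

```python
import json


def format_cell(value):
    """Flatten a list for a CSV cell; leave other values untouched."""
    if isinstance(value, list):
        return ", ".join(map(str, value))
    return value


def format_cell_json(value):
    """Alternative: serialize lists and dicts as JSON text."""
    if isinstance(value, (list, dict)):
        return json.dumps(value)
    return value


def csv_row(record, skip=("stakeholders",)):
    """Build a CSV row, leaving out columns that do not flatten cleanly."""
    return {k: format_cell(v) for k, v in record.items() if k not in skip}


stakeholders = [
    {"id": "pepsico", "type": "owner", "percent": 50},
    {"id": "strauss-group", "type": "owner", "percent": 50},
]

joined = format_cell(stakeholders)        # Python repr of each dict, comma-joined
as_json = format_cell_json(stakeholders)  # valid JSON that can be parsed back
row = csv_row({"id": "sabra", "categories": ["food", "dips"], "stakeholders": stakeholders})
```

The join reads well for lists of plain strings, but for a list of dicts it produces Python reprs that are not valid JSON, which is why dropping the column is the simpler option for now.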
oooh the plot thickens.. it's an array of objects. Even more reason to ignore that for now.
Thanks @SelmaGuedidi! I added one suggestion above, but let me know if you disagree. @THM222 I think we can handle both timestamps in a separate issue. I created that here: #10
Closes #5