Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export data as CSV and JSON #9

Merged
merged 5 commits into from
Jan 21, 2024
Merged

Export data as CSV and JSON #9

merged 5 commits into from
Jan 21, 2024

Conversation

SelmaGuedidi
Copy link
Contributor

@SelmaGuedidi SelmaGuedidi commented Jan 13, 2024

Closes #5

@idris
Copy link
Contributor

idris commented Jan 15, 2024

@SelmaGuedidi can you please remove the output/yaml directories? Those are old and not tied to the actual data. I think it was just a sample format from before that's no longer relevant.

Copy link
Contributor

@idris idris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Looks mostly good! But unfortunately you used the wrong source data. Sorry for the confusion!

export.py Outdated
Comment on lines 50 to 52
all_boycott_data = read_yaml(yaml_all_boycott)
all_data = read_yaml(yaml_all)
altenatives_data = read_yaml(yaml_alternatives)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, the data in the output/yaml directory is an old format that should have been deleted.

You need to read the data from data/, and follow the schemas there.

For the output, I think CSV should output two files: one for companies, one for brands. The JSON should output one file, which looks like:

{
  "brands": [ {...}, {...} ],
  "companies": [ {...}, {...} ]
}

Also, please add an id to each brand in both the CSV and JSON. The id is just the filename (without the extension).

Comment on lines +20 to +21
python export.py
git diff --exit-code
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Thanks for adding this check

name: Export YAML to CSV and JSON
language: python
entry: python export.py
additional_dependencies: ['pyaml']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: requirements.txt uses pyyaml, not pyaml

@idris idris mentioned this pull request Jan 15, 2024
@SelmaGuedidi
Copy link
Contributor Author

SelmaGuedidi commented Jan 16, 2024

@idris I made the changes , hope it's fixed :)

  • Removed data in output/yaml
  • Updated data retrieval to follow schemas in data/
  • CSV output generates separate files for companies and brands
  • JSON output consolidated into a single file with "brands" and "companies" keys
  • Updated .pre-commit-config.yaml to use pyyaml instead of pyaml

@SelmaGuedidi SelmaGuedidi requested a review from idris January 17, 2024 12:19
@THM222
Copy link
Contributor

THM222 commented Jan 17, 2024

Can we add timestamps on the exports? createdAt, updatedAt, utc timezone maybe yyyy-mm-ddTHH:mm:ss.SSSZ?

For now they can be the same, and lets raise a new issue to edit the updateAt correctly.

Comment on lines +13 to +14
if isinstance(value, list):
return ', '.join(map(str, value))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The format of dictionaries in the CSV was weird, so I figure JSON is a good format, and it works well for both lists and dicts:

Suggested change
if isinstance(value, list):
return ', '.join(map(str, value))
if isinstance(value, list) or isinstance(value, dict):
return json.dumps(value)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@idris

   if isinstance(value, list):
        return ', '.join(map(str, value))

brands.csv:

Screenshot 2024-01-18 at 7 15 18 PM
   if isinstance(value, list) or isinstance(value, dict):
        return json.dumps(value)

brands.csv:

Screenshot 2024-01-18 at 7 19 48 PM

I used ', '.join(map(str, value)) to remove [] and "" in the csv files.
and I think dict format does not exist here because isinstance(value, dict) is always false

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I think you're right that the ', '.join(...) version is better for lists.

If you look at the sabra row, there is a dict for the stakeholders column, which shows up like this:
"{'id': 'pepsico', 'type': 'owner', 'percent': 50}, {'id': 'strauss-group', 'type': 'owner', 'percent': 50}"

We're not really sure how the stakeholders bit will pan out, so I'm just going to leave your join as-is, and remove stakeholders from the CSV export for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oooh the plot thickens.. it's an array of objects. Even more reason to ignore that for now.

@idris
Copy link
Contributor

idris commented Jan 18, 2024

Thanks @SelmaGuedidi ! I added one suggestion above, but let me know if you disagree.

@THM222 I think we can handle both timestamps in a separate issue. I created that here: #10

idris
idris previously approved these changes Jan 21, 2024
@idris idris merged commit 07207e3 into TechForPalestine:main Jan 21, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Export data as CSV and JSON
3 participants