Import data into new format #13

Merged


greencloudysky
Contributor

Addressing #8.

I have made the assumption that entries in the input dataset which mention owners are brands, while entries that don't mention an owner are companies.

The script only populates the following output fields from the input data; the rest are left as TBD:

  • name (from input name field)
  • description (from input proof field)
  • stakeholders (with id extracted from proof field using a regex, and type defaulting to owner)
  • logo_url (from input imageUrl field)
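
The field mapping listed above can be sketched roughly as follows. This is a minimal illustration, not the code from import_new_schema.py: the function name `convert_entry`, the `OWNER_PATTERN` regex, and the way the owner id is turned into a slug are all assumptions, since the actual regex is not shown in this thread.

```python
import re

# Hypothetical pattern: the real regex is not shown in this PR. This one
# assumes the proof text names the owner as "owned by <Name>".
OWNER_PATTERN = re.compile(r"owned by ([A-Za-z0-9&' -]+)", re.IGNORECASE)


def convert_entry(entry):
    """Map one input record to the new schema, returning (kind, record)."""
    proof = entry.get("proof", "")
    match = OWNER_PATTERN.search(proof)

    record = {
        "name": entry.get("name", ""),
        "description": proof,
        "logo_url": entry.get("imageUrl", ""),
        "stakeholders": [],
    }
    if match:
        # Assumed id format: lowercase, with spaces replaced by hyphens.
        owner_id = match.group(1).strip().lower().replace(" ", "-")
        record["stakeholders"].append({"id": owner_id, "type": "owner"})

    # Entries that mention an owner are treated as brands, the rest as companies.
    kind = "brand" if match else "company"
    return kind, record
```

Under these assumptions, an entry whose proof text contains "owned by Example Holdings" would come out as a brand with a single stakeholder of type owner, and an entry with no such phrase would come out as a company with no stakeholders.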

The script can be run from the scripts directory like: python3 import_new_schema.py ../raw/boycott_list_formatted.json.
It will not overwrite existing files, since the generated files don't contain much information.
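
One way to get the "do not overwrite" behaviour is to open the target in exclusive-creation mode. The sketch below is an assumption about how this could be done, not necessarily how the script does it; `write_if_absent` is a hypothetical helper name.

```python
from pathlib import Path


def write_if_absent(path, text):
    """Write text to path only when no file exists there; return True if written."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # Mode "x" raises FileExistsError instead of truncating an existing file.
        with path.open("x", encoding="utf-8") as handle:
            handle.write(text)
    except FileExistsError:
        return False
    return True
```

Using mode "x" avoids a separate existence check, so a hand-edited file is never truncated even if it appears between the check and the write.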

thm and others added 2 commits January 28, 2024 13:52
- update reasons field
- update alternatives fields
- add alternatives data
- use unidecode to remove accents
- add countries (default of global)
- remove TBDs
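
For readers unfamiliar with the accent-removal step, here is a rough standard-library stand-in. It is not the unidecode call used in the commit: it only strips combining marks after Unicode decomposition, whereas unidecode also transliterates characters that have no decomposition. The name `strip_accents` is hypothetical.

```python
import unicodedata


def strip_accents(text):
    """Drop combining marks after NFKD decomposition (for example, 'é' becomes 'e')."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Nestlé"))  # prints "Nestle"
```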
@greencloudysky
Contributor Author

@THM222 thank you for your PR to this PR! I think I can resolve the description formatting so will try and push that soon as well.

@greencloudysky
Contributor Author

@THM222 in my original changes I didn't overwrite any of the pre-existing files, but your PR to this PR did. Just wanted to confirm if that was intentional or if we want to restore the handful of files that will be overwritten by this PR before merging?

@THM222
Contributor

THM222 commented Jan 29, 2024

Good catch! That was an accident.
Will fix and update.
Thank you!

@THM222
Contributor

THM222 commented Jan 29, 2024

> @THM222 thank you for your PR to this PR! I think I can resolve the description formatting so will try and push that soon as well.

Nice! When I was working on it I had a quick look, and I think it might have to do with the apostrophe (')?

@greencloudysky
Contributor Author

> Good catch! That was an accident. Will fix and update. Thank you!

@THM222 have just pushed another commit that restores/slightly updates some pre-existing files that were lost.

@THM222
Contributor

THM222 commented Jan 30, 2024

@greencloudysky looks good, but the build is failing :(

YAML validation seems to be failing on the kiehls logo URL. It should be easy enough to add one ourselves.

The export script (which exports the data to CSV and JSON) is also failing. I couldn't find the exact error in the CI log, but it may be due to the multiline fields.

Will take a look tomorrow. Apologies for more delays!
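
As a way to test the multiline-field hypothesis in isolation, the sketch below round-trips a record with an embedded newline through Python's csv module. It is not the project's export script and not a diagnosis of the CI failure; `rows_to_csv` and `csv_to_rows` are hypothetical helper names, and the sample row is made up.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialise dict rows to CSV text; the csv module quotes embedded newlines."""
    # newline="" stops the text layer from translating line endings.
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_rows(text):
    """Parse CSV text back into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


rows = [{"name": "Example", "description": "first line\nsecond line"}]
assert csv_to_rows(rows_to_csv(rows, ["name", "description"])) == rows
```

If a round trip like this passes on the real data, the multiline fields are probably not the cause by themselves, and the failure is more likely elsewhere in the export.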

@greencloudysky
Contributor Author

greencloudysky commented Jan 30, 2024

> @greencloudysky looks good, but the build is failing :(
>
> YAML validation seems to be failing on the kiehls logo URL. It should be easy enough to add one ourselves.
>
> The export script (which exports the data to CSV and JSON) is also failing. I couldn't find the exact error in the CI log, but it may be due to the multiline fields.
>
> Will take a look tomorrow. Apologies for more delays!

Ah damn, looks like more than 100 files are missing a logo. I just pushed a change to help with compliance of alternative/stakeholder names at least.

@THM222 THM222 merged commit ff805a4 into TechForPalestine:main Feb 3, 2024
3 checks passed
4 participants