Externalise some parsers in specific repos #197

Open
RouxRC opened this issue Dec 11, 2022 · 1 comment

Comments

@RouxRC
Member

RouxRC commented Dec 11, 2022

Some directories in batch, such as hemicycle, commission and jo, which can't be replaced by AN's open data, could each become their own repository and remain included in the old ND as git sub-repositories.

This would make it possible to:

  • also include and track revisions of all source/parsed/corrected JSON, and reuse them in the future database started by @implicitdef with @eraviart's Tricoteuses
  • have a second branch with changes for the new DB, for instance using AN's timestamps
  • run serverless within GitHub Actions
@implicitdef

Hi,
some notes:

  • I agree that it would be nice if the stuff you parse, especially the complicated stuff like the "comptes rendus", was stored and made easily available somewhere.
  • But it won't be in my new Postgres DB (the one I call "nosdeputes_releve" for now and that I use for my new frontend), because I want this one to always be "stateless", i.e. I always want to be able to drop it and rebuild it from primary sources.
  • But we can totally, at some point, make a different DB (or maybe use a storage service like S3, etc.) where we do store data forever. It could serve as a primary source for building the "nosdeputes_releve" DB. I can't focus on that right now, but maybe later, in 6 months or a year? Definitely before 2027.
  • I mentioned to you the idea of storing the data in a git repository, like Tricoteuses does. I started some code, but you were right: the data is way too big, at least for the "comptes rendus". We can split files to avoid GitHub's limits, but I think GitHub would still detect us at some point and could block us or something like that. I still think it's a good idea for smaller data (like NosDeputes's slugs), and I might use it for that, but not for the whole database.
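
To make the "split files" idea concrete, here is a minimal sketch, not the code mentioned above. It assumes the parsed data is a list of JSON-serialisable records written as JSON Lines; the `chunk_records` helper and the 50 MB budget are hypothetical choices for illustration (GitHub blocks individual files larger than 100 MB, so the budget stays well below that).

```python
import json

# Hypothetical per-file budget, chosen to stay well under GitHub's
# 100 MB per-file limit.
MAX_BYTES = 50 * 1024 * 1024


def chunk_records(records, max_bytes=MAX_BYTES):
    """Group records into lists whose JSON Lines encoding fits in max_bytes.

    A single record larger than max_bytes still gets a chunk of its own.
    """
    chunks, current, size = [], [], 0
    for record in records:
        line = json.dumps(record, ensure_ascii=False) + "\n"
        line_size = len(line.encode("utf-8"))
        # Start a new chunk when adding this record would exceed the budget.
        if current and size + line_size > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(record)
        size += line_size
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be written to its own file. This only addresses the per-file limit; it does nothing about the total repository size, which is the concern raised above for the "comptes rendus".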
