Externalise some parsers in specific repos #197

RouxRC · 2022-12-11T23:20:45Z

Some directories in batch, such as hemicycle, commission and jo, cannot be replaced by the AN's open data. Each of them could become its own repository and remain included in the old ND as a git sub-repository.

This would allow us to:

also include and track revisions of all source/parsed/corrected JSON, and reuse them in the future database started by @implicitdef with @eraviart's Tricoteuses
have a second branch with changes for the new DB, such as using the AN's timestamps
run serverless within GitHub Actions

implicitdef · 2023-01-14T09:57:50Z

Hi,
Some notes:

I agree that it would be nice if the stuff you parse, especially the complicated stuff like the "comptes rendus", was stored and made easily available somewhere.
But it won't be in my new Postgres DB (the one I call "nosdeputes_releve" for now and that I use for my new frontend), because I want that one to always be "stateless", i.e. I always want to be able to drop it and rebuild it from primary sources.
But we can totally, at some point, make a different DB (or maybe use a storage service like S3), where we do store data forever. It could serve as a primary source for building the "nosdeputes_releve" DB. I can't focus on that right now, but maybe later, in 6 months or a year? Definitely before 2027.
I mentioned to you the idea of storing the data in a git repository, like Tricoteuses does. I started some code, but you were right: the data is way too big, at least for the "comptes rendus". We could split files to stay under GitHub's limits, but I think GitHub would still notice at some point and could block us or something like that. I still think it's a good idea for smaller data (like NosDeputes's slugs), and I might use it for that, but not for the whole database.
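
For illustration only, here is a minimal sketch of the "split files" idea mentioned above. It is not code from this project: the function name `chunk_records`, the record shape and the 50 MB budget are all assumptions, the budget being chosen only to stay well under GitHub's 100 MB per-file push limit. It groups records into JSON-lines chunks by encoded size, and it does nothing about the overall repository size concern raised above.

```python
import json
from typing import Iterable, Iterator

# Assumption: a per-file budget comfortably below GitHub's 100 MB push limit.
MAX_CHUNK_BYTES = 50 * 1024 * 1024


def chunk_records(
    records: Iterable[dict], max_bytes: int = MAX_CHUNK_BYTES
) -> Iterator[list[str]]:
    """Yield lists of JSON lines whose total UTF-8 size stays within max_bytes.

    A single record larger than max_bytes is still emitted, alone in its chunk.
    """
    chunk: list[str] = []
    size = 0
    for record in records:
        # sort_keys keeps the output stable, so git diffs stay small between runs.
        line = json.dumps(record, ensure_ascii=False, sort_keys=True)
        line_bytes = len(line.encode("utf-8")) + 1  # +1 for the trailing newline
        if chunk and size + line_bytes > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += line_bytes
    if chunk:
        yield chunk
```

Each yielded chunk could then be written to its own numbered `.jsonl` file, one JSON line per record.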

RouxRC added amélioration scraping 2022 16ème législature labels Dec 11, 2022

RouxRC self-assigned this Dec 11, 2022
