Tracking issue: scaling data pages #2379
Comments
@larsyencken @danyx23 One thing that is missing (or not prominently featured) from this list is the work on 'tooling' needed to make it easier a) for data managers to fill out the metadata; b) for researchers and data managers to collaborate on the metadata. Essentially, this was one of our four big bets for the year ('technical text admin'). It feels like our ambition there has implicitly been cut down quite considerably, without us fully reflecting on it. Perhaps we could discuss this on Weds?
Hey Joe! I don't think an admin's off the table, but it's more that it might be in addition to, and after, many of the things above. In particular, the Related Research & Writing and Related Data blocks are still outstanding and need to be built. There's also the desire to see what pain points emerge from use, although if we waited for them then an admin would really be post-Porto. Perhaps chat next week with Daniel?
…2739) This PR implements #2379. It adds the missing link in our DB from WordPress posts to the charts used in them. It then uses this new `posts_links` table together with the existing `posts_gdocs_links` table to find the related writing for a data page by going from indicator id -> charts using this indicator -> articles using these charts.

The `posts_links` table was modelled on the `posts_gdocs_links` table, as I thought uniformity was more important than the optimal layout here.

Extracting the links is done a bit crudely at the moment: it just runs regexes over the raw HTML instead of parsing the HTML and querying for `a` tags. The latter would give us the text content of the element that establishes each link, which is probably often useful, but it would complicate and slow down the script. I'd like to hear your opinions on whether this should switch to proper parsing and store richer information in the DB.

The thumbnail rendering is also a bit ad hoc. We have an `Image` component, but it is built for use in gdocs, and we need to show thumbnails for both WP posts and Gdocs articles.

To rank related research and writing we use the `pageviews` table. This is empty by default in dev environments, so this PR adds a make command to refresh pageviews (fetched from datasette-private).

- [ ] ❗ After merging this to production, run the `db/syncPostsToGrapher.js` script to fill the new relationship table!
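
To make the regex-versus-parsing question above concrete, here is a minimal sketch of both approaches. It is written in Python with only the standard library, purely for illustration; it is not the code in `db/syncPostsToGrapher.js`. The chart URL pattern, the sample HTML, and all function names are assumptions made up for this example.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Assumed URL shape for chart links; the pattern used by the real script may differ.
GRAPHER_HREF = re.compile(r'href="(https://ourworldindata\.org/grapher/[^"?#]+)')

# Made-up post fragment used only to exercise the two functions below.
SAMPLE_HTML = (
    '<p>See <a href="https://ourworldindata.org/grapher/life-expectancy?tab=chart">'
    'this chart</a> and our <a href="/about">about page</a>.</p>'
)


def links_via_regex(html: str) -> List[str]:
    """Scan the raw HTML for chart URLs. Fast, but the link text is not available."""
    return GRAPHER_HREF.findall(html)


class _AnchorCollector(HTMLParser):
    """Collects an (href, text) pair for every <a> element that has an href."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only accumulate text while inside an <a> that had an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def links_via_parser(html: str) -> List[Tuple[str, str]]:
    """Parse the HTML and return (url, link text) for links that point at charts."""
    collector = _AnchorCollector()
    collector.feed(html)
    collector.close()
    return [(href, text) for href, text in collector.links if "/grapher/" in href]
```

On `SAMPLE_HTML`, `links_via_regex` returns only the chart URL (with the query string cut off by the pattern), while `links_via_parser` also returns the anchor text `"this chart"`. That anchor text is the richer information a parsing approach could store, at the cost of a slower and more involved script.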
There is a follow-up tracking issue for work that we decided not to do as part of the Sept/Oct/Nov cycles: #2949
Things we need to do to sustainably author data pages as part of our work, continuing from #1946.
- `VariableMeta` fields #2383
- `owid-content` data pages to the ETL #2741
- Enable baking of public data pages via metadata that replace grapher pages