Automate more breadcrumbs #3406

Open
larsyencken opened this issue Mar 26, 2024 · 5 comments · May be fixed by #4343

Comments

@larsyencken
Contributor

larsyencken commented Mar 26, 2024

Problem

We see it as our role to put work in context; when it comes to articles, that context is often the broader topic or area that the article sits in.

Currently, one way we provide context for an article is with breadcrumbs (posts_gdocs.breadcrumbs), but they need to be manually specified every time. This is a waste of human effort, given that every article is tagged with a topic.

We would like to do this automatically for each article instead.

Technical notes

If #3695 is merged, we will soon be able to construct a tag graph.

A simplified example:
tag graph example
⭐️= a topic

The top-level tags are currently all assumed to be areas and don't have an associated page. We only have areas at all because of the header nav (though there is no reason we couldn't make pages for them in the future).

As long as we ensure every article is tagged with at least one topic tag, we'll be able to use this graph to construct breadcrumbs for every article.

We'll do this by constructing a subgraph that only features topic tags and displaying the path to any given topic.

So an article about indoor air pollution would be:
Home > Air Pollution > Indoor Air Pollution

An article about cancer:
Home > Cancer

An article about nuclear energy:
Home > Energy > Nuclear Energy
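
A minimal TypeScript sketch of the subgraph-and-path idea above. The `TagEdge` shape, the function names, and the sample edges (including the "Health" and "Environment" areas) are assumptions for illustration, not the real tag graph schema. It also assumes each topic has at most one topic parent and that no non-topic tag sits between two topics; choosing between several parents is not handled here.

```typescript
// Hypothetical edge shape; the real tag graph schema may differ.
interface TagEdge {
    parent: string
    child: string
}

// Keep only edges whose two ends are both topic tags.
function topicSubgraph(edges: TagEdge[], topics: Set<string>): TagEdge[] {
    return edges.filter((e) => topics.has(e.parent) && topics.has(e.child))
}

// Walk up the topic-only edges from a topic and prepend "Home".
function pathToTopic(topic: string, topicEdges: TagEdge[]): string[] {
    const path = [topic]
    const seen = new Set([topic]) // guards against accidental cycles
    let current = topic
    for (;;) {
        const parentEdge = topicEdges.find((e) => e.child === current)
        if (!parentEdge || seen.has(parentEdge.parent)) break
        current = parentEdge.parent
        seen.add(current)
        path.unshift(current)
    }
    return ["Home", ...path]
}

// Made-up sample data standing in for the diagram.
const sampleTopics = new Set([
    "Air Pollution",
    "Indoor Air Pollution",
    "Cancer",
    "Energy",
    "Nuclear Energy",
])
const sampleEdges: TagEdge[] = [
    { parent: "Environment", child: "Air Pollution" },
    { parent: "Air Pollution", child: "Indoor Air Pollution" },
    { parent: "Health", child: "Cancer" },
    { parent: "Environment", child: "Energy" },
    { parent: "Energy", child: "Nuclear Energy" },
]

const topicEdges = topicSubgraph(sampleEdges, sampleTopics)
console.log(pathToTopic("Indoor Air Pollution", topicEdges).join(" > "))
// Home > Air Pollution > Indoor Air Pollution
console.log(pathToTopic("Cancer", topicEdges).join(" > "))
// Home > Cancer
```

Because areas are filtered out of the subgraph, a topic that sits directly under an area (Cancer here) becomes a root and its breadcrumb is just Home plus the topic.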

An article about fossil fuels has two possible breadcrumb paths, because the Fossil Fuels tag has two parents that are topics.

It could either be:
Home > Greenhouse Gas Emissions > Fossil Fuels
or
Home > Energy > Fossil Fuels

The tag graph has weighted edges, so that we can set one edge to always be preferred over the others, but there will likely be cases where we have an article that would prefer to have the other edge highlighted.

For example, if we give the Energy-Fossil Fuels edge a higher weight, this will be fine for articles that are about energy and fossil fuels: we can show the default breadcrumbs of Home > Energy > Fossil Fuels. But if we have an article about fossil fuels' contribution to GHG emissions, we'll want the breadcrumbs to be Home > Greenhouse Gas Emissions > Fossil Fuels.

We need to decide whether parent tags must be set manually ("parent implicitness"). Ideally, parent tags can be implicit the majority of the time and we can derive the breadcrumbs from the tag graph weights. Where we need to deviate from the default, authors can explicitly set, on the gdoc, the parent tag they want to take priority.
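
As a rough sketch of how an implicit parent with an optional explicit override could be resolved: the `WeightedEdge` shape, the `chooseParent` name, and the weights below are all hypothetical. It assumes a higher weight means a more preferred edge, and that an explicit parent is only honoured if that edge actually exists in the graph.

```typescript
// Hypothetical edge shape; weights here are made up.
interface WeightedEdge {
    parent: string
    child: string
    weight: number
}

// Pick the parent for a tag: the explicit one if it is a valid parent,
// otherwise the parent on the highest-weight edge.
function chooseParent(
    child: string,
    edges: WeightedEdge[],
    explicitParent?: string
): string | undefined {
    const candidates = edges.filter((e) => e.child === child)
    if (
        explicitParent !== undefined &&
        candidates.some((e) => e.parent === explicitParent)
    ) {
        return explicitParent
    }
    let best: WeightedEdge | undefined
    for (const edge of candidates) {
        if (!best || edge.weight > best.weight) best = edge
    }
    return best?.parent
}

const fossilFuelEdges: WeightedEdge[] = [
    { parent: "Energy", child: "Fossil Fuels", weight: 2 },
    { parent: "Greenhouse Gas Emissions", child: "Fossil Fuels", weight: 1 },
]

console.log(chooseParent("Fossil Fuels", fossilFuelEdges))
// Energy
console.log(
    chooseParent("Fossil Fuels", fossilFuelEdges, "Greenhouse Gas Emissions")
)
// Greenhouse Gas Emissions
```

Falling back to the weighted default when the explicit parent is not a real edge is one possible design choice; surfacing an error to the author would be another.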

We can render this in the admin UI to ensure that authors always understand what the breadcrumbs for a given article will be:

No possible breadcrumb ambiguity:
image

Breadcrumb ambiguity, but implicit tag is correct:
image

Breadcrumb ambiguity, implicit tag overridden:
image

An issue with implicit tag overrides is that the tag graph may get updated such that the overrides no longer make sense (e.g. if we delete the Greenhouse Gas Emissions-Fossil Fuels edge). This will probably be rare, but ideally we'd have a way to migrate articles affected by any update to the tag graph.
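
One way to find affected articles after a graph update, sketched under assumptions: the `ParentOverride` and `GraphEdge` shapes, the `findStaleOverrides` name, and the article IDs are hypothetical, and how overrides would actually be stored is not decided in this issue.

```typescript
// Hypothetical record of an author's explicit parent choice.
interface ParentOverride {
    articleId: string
    child: string
    parent: string
}

interface GraphEdge {
    parent: string
    child: string
}

// Return the overrides that point at an edge no longer in the graph.
function findStaleOverrides(
    overrides: ParentOverride[],
    edges: GraphEdge[]
): ParentOverride[] {
    const key = (parent: string, child: string) =>
        JSON.stringify([parent, child])
    const existing = new Set(edges.map((e) => key(e.parent, e.child)))
    return overrides.filter((o) => !existing.has(key(o.parent, o.child)))
}

// The Greenhouse Gas Emissions-Fossil Fuels edge has been deleted here.
const edgesAfterUpdate: GraphEdge[] = [
    { parent: "Energy", child: "Fossil Fuels" },
]
const storedOverrides: ParentOverride[] = [
    {
        articleId: "article-1",
        child: "Fossil Fuels",
        parent: "Greenhouse Gas Emissions",
    },
    { articleId: "article-2", child: "Fossil Fuels", parent: "Energy" },
]

console.log(
    findStaleOverrides(storedOverrides, edgesAfterUpdate).map(
        (o) => o.articleId
    )
)
// [ 'article-1' ]
```

A check like this could run whenever the tag graph is saved, so stale overrides are flagged before they reach a published article.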

One more case to consider is that we may have articles that don't neatly fall into a single path of the graph, e.g. an article about nuclear energy and cancer.

In such cases, i.e. when an article is tagged with two leaf nodes, we may want to not render breadcrumbs at all and instead show the tags in a list:
image

Another option could be to show multiple lanes of breadcrumbs:
two leaves example 1

We could try and merge them somehow 😬
two leaves example 3
two leaves example 2

Until we make a decision here, it's not clear how we should render this in the admin.

One final consideration is citations. Ideally, we're always happy to have the closest topic page for an article be its citation page. So an article about nuclear energy would be cited at https://ourworldindata.org/nuclear-energy

If this won't always be the case, we'll need a way to choose which parent tag should be cited instead:
citation override
A column to track this would have to be added in the gdocs_posts_x_tags table.

@danyx23
Contributor

danyx23 commented May 8, 2024

@JoeHasell we talked about this a bit today but it seems a bit tricky. Maybe a good one to chat about with you in the next site meeting on May 14.

@danyx23
Contributor

danyx23 commented Jun 5, 2024

@ikesau can you sketch out the body of this issue since Joe and Lars are busy this week?

@ikesau
Member

ikesau commented Jun 7, 2024

@danyx23 done!

I think we need a decision on how to handle multi-leaf node articles. Once we've got that we should be good to go.

@ikesau
Member

ikesau commented Jul 9, 2024

It seems we're okay with picking a single leaf node, tiebreaking with the tag graph weights.
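
A sketch of one possible reading of "tiebreaking with the tag graph weights": among the article's topic tags, pick the one whose strongest incoming edge has the highest weight. This interpretation, the `TopicEdge` shape, the `pickLeaf` name, and the weights are all assumptions. It also assumes weights are non-negative, treats a tag with no parent edge as weight 0, and breaks exact ties in favour of the tag listed first.

```typescript
// Hypothetical edge shape; weights here are made up.
interface TopicEdge {
    parent: string
    child: string
    weight: number
}

// Choose a single tag to drive the breadcrumbs for a multi-tag article.
function pickLeaf(
    articleTopics: string[],
    edges: TopicEdge[]
): string | undefined {
    let bestTag: string | undefined
    let bestWeight = -Infinity
    for (const tag of articleTopics) {
        const incoming = edges
            .filter((e) => e.child === tag)
            .map((e) => e.weight)
        // Tags with no parent edge count as weight 0.
        const weight = Math.max(0, ...incoming)
        if (weight > bestWeight) {
            bestWeight = weight
            bestTag = tag
        }
    }
    return bestTag
}

const leafEdges: TopicEdge[] = [
    { parent: "Energy", child: "Nuclear Energy", weight: 3 },
    { parent: "Health", child: "Cancer", weight: 1 },
]

console.log(pickLeaf(["Cancer", "Nuclear Energy"], leafEdges))
// Nuclear Energy
```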

We could also continue to support a way to manually override these breadcrumbs, in the cases where we want to defy the tag graph weights. We can continue to use the breadcrumbs column in posts_gdocs for this.

@ikesau
Member

ikesau commented Oct 8, 2024

Revisiting this for batch cycle consideration: it would be better to keep the existing gdoc breadcrumb system as the way we do overrides, and otherwise use the tag graph to populate breadcrumbs (with no UI for changing parents).

This way the system is simple: automatic breadcrumbs 95% of the time and completely manual overrides for the other 5%.
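
The resulting rule is small enough to sketch. The `BreadcrumbItem` shape and the `resolveBreadcrumbs` name are hypothetical, and the actual format of the `breadcrumbs` column in `posts_gdocs` may differ; this only illustrates "manual if present, otherwise automatic".

```typescript
// Hypothetical breadcrumb shape; the stored format may differ.
interface BreadcrumbItem {
    label: string
    href?: string
}

// Manually specified breadcrumbs win; otherwise use the tag graph result.
function resolveBreadcrumbs(
    manual: BreadcrumbItem[] | null | undefined,
    automatic: BreadcrumbItem[]
): BreadcrumbItem[] {
    return manual && manual.length > 0 ? manual : automatic
}

const fromTagGraph: BreadcrumbItem[] = [
    { label: "Energy" },
    { label: "Fossil Fuels" },
]
const fromGdoc: BreadcrumbItem[] = [
    { label: "Greenhouse Gas Emissions" },
    { label: "Fossil Fuels" },
]

console.log(resolveBreadcrumbs(null, fromTagGraph).map((b) => b.label))
// [ 'Energy', 'Fossil Fuels' ]
console.log(resolveBreadcrumbs(fromGdoc, fromTagGraph).map((b) => b.label))
// [ 'Greenhouse Gas Emissions', 'Fossil Fuels' ]
```

Treating an empty manual list the same as a missing one is an assumption here; it keeps an accidentally cleared field from suppressing breadcrumbs entirely.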

ikesau self-assigned this Dec 14, 2024
ikesau linked a pull request Dec 20, 2024 that will close this issue

3 participants