Proposed language for supporting multiple kinds in concept docs #23566

schrockn · 2024-08-09T23:51:54Z

Summary & Motivation

Per request, as a prerequisite to generalizing and pluralizing kinds in our APIs, I've written language that extends our "Metadata and tags" page into "metadata, kinds, and tags".

The relevant explanation:

"Kinds label and categorize definitions in Dagster."

"Tags annotate and organize definitions in your Dagster project."

The general idea is that "label" and "categorize" are more specific, stronger forms of "annotate" and "organize".

The text is also transparent about the fact that kinds are implemented as system tags.
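To make the "kinds are system tags" relationship concrete, here is a minimal Python sketch. It assumes each kind becomes its own tag key of the form `dagster/kind/<name>` with an empty value; the helpers `kinds_to_tags` and `kinds_from_tags` are hypothetical illustrations, not Dagster APIs.

```python
# Hypothetical sketch: kinds stored as system tags under a "dagster/kind/" prefix.
# Helper names and the empty tag value are assumptions for illustration only.
KIND_PREFIX = "dagster/kind/"


def kinds_to_tags(kinds: set[str]) -> dict[str, str]:
    """Encode each kind as its own prefixed tag key (value left empty)."""
    return {f"{KIND_PREFIX}{kind}": "" for kind in sorted(kinds)}


def kinds_from_tags(tags: dict[str, str]) -> set[str]:
    """Recover kinds by stripping the prefix from matching tag keys."""
    return {key[len(KIND_PREFIX):] for key in tags if key.startswith(KIND_PREFIX)}


# Kinds and ordinary user tags can coexist in one tag dictionary.
tags = {"team": "analytics", **kinds_to_tags({"dbt", "snowflake"})}
print(sorted(kinds_from_tags(tags)))  # ['dbt', 'snowflake']
```

Because the prefix is the only thing distinguishing a kind from any other tag, a UI or API can decide per prefix whether to hide, promote, or list these entries.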

How I Tested These Changes

Read https://kinds-docs.dagster.dagster-docs.io/concepts/metadata-tags

schrockn · 2024-08-09T23:52:08Z

Proposed language for supporting multiple kinds in concept docs #23566 👈
master

This stack of pull requests is managed by Graphite. Learn more about stacking.


github-actions · 2024-08-09T23:59:43Z

Deploy preview for dagster-docs ready!

Preview available at https://dagster-docs-eurcl3asm-elementl.vercel.app
https://kinds-docs.dagster.dagster-docs.io

Direct link to changed pages:

https://kinds-docs.dagster.dagster-docs.io/concepts/metadata-tags

PedramNavid · 2024-08-10T18:45:16Z

If you'll allow me a side note (one that we are looking to address, by the way): it is notable that this page is buried under Concepts > Advanced > Metadata & Tags. I am leaning toward naming these pages 'About Metadata, Kinds, and Tags' to emphasize that this is a conceptual explanation of how metadata works and not how to use metadata, kinds, or tags (which is where most users would want to start).

To that end, the whole page does feel somewhat incomplete. It lacks the expected depth of a conceptual page, and it doesn't provide easy access to instructions for accomplishing the specific task of applying metadata, kinds, and tags to your workflows.

Back to the topic at hand! In here, we say Kinds label and categorize definitions. Do they categorize anything other than assets?

Here's my suggested rewrite. I've tried to give each major point its own paragraph. I've only used the word asset, but if we mean that kinds apply to other types of definitions, we should emphasize that. I found the adjective language a little confusing, so I removed it.

In Dagster, kinds are a way to label and categorize assets within your data pipeline. These labels can appear in the UI as icons, and they can be used to search and filter within Dagster's data cataloging features.

Kinds serve as descriptive labels that help you and your team quickly identify the nature or purpose of an asset. They're particularly useful in visual representations of your pipeline, where they correspond to prominently displayed icons. Dagster uses a mapping you can find at XXX to map the name of the kind to an icon in the UI.
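The name-to-icon mapping described here can be pictured as a plain lookup with a fallback. This is a hypothetical sketch: the real mapping lives at the location elided as XXX, and the entries and icon identifiers below are invented for illustration.

```python
# Hypothetical kind-to-icon table; the real mapping (elided as "XXX") is not reproduced here.
KIND_ICONS: dict[str, str] = {
    "dbt": "icon-dbt",
    "databricks": "icon-databricks",
    "s3": "icon-s3",
}


def icon_for_kind(kind: str, default: str = "icon-generic") -> str:
    """Return the icon registered for a kind, or a generic icon if none is mapped."""
    return KIND_ICONS.get(kind, default)


print(icon_for_kind("dbt"))        # icon-dbt
print(icon_for_kind("snowflake"))  # icon-generic
```

A fallback icon keeps unmapped kinds visible rather than silently dropping them; whether Dagster's UI actually behaves this way is an assumption.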

We recommend using kinds to label assets in ways that help answer questions you may have, such as which external system is involved in materializing a particular asset. For example, you may label assets with the dbt, Databricks, or s3 kind, since it can be helpful both to visually identify assets by these labels and to filter for them within the Dagster Catalog.
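Filtering by kind to answer "which assets involve dbt?" reduces to a membership check. The sketch below uses a hypothetical `AssetRecord` stand-in rather than any Dagster Catalog API, which this thread does not describe.

```python
from dataclasses import dataclass, field


@dataclass
class AssetRecord:
    """Hypothetical stand-in for an asset entry in a catalog."""
    key: str
    kinds: set[str] = field(default_factory=set)


def filter_by_kind(assets: list[AssetRecord], kind: str) -> list[str]:
    """Return the keys of assets labeled with the given kind, preserving order."""
    return [asset.key for asset in assets if kind in asset.kinds]


catalog = [
    AssetRecord("orders", {"dbt", "snowflake"}),
    AssetRecord("raw_events", {"s3"}),
    AssetRecord("customers", {"dbt"}),
]
print(filter_by_kind(catalog, "dbt"))  # ['orders', 'customers']
```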

<AdmonitionInfoThing> 
💡 The Dagster Catalog is a Dagster+ feature. 
</AdmonitionInfoThing>

Within Dagster, kinds are implemented as system-defined tags with a special prefix: dagster/kind/. In Dagster, compute_kind is a specific instance of the generic kind that is available as a parameter when defining an asset using the @asset decorator. (Is it? I am making this up. I think we should say something about it.)

This implementation allows Dagster's user interfaces to treat these tags in unique ways, such as displaying or hiding them as appropriate to enhance the user experience.
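One way a UI could "treat these tags in unique ways" is to split an asset's tags into promoted kinds, visible user tags, and hidden system tags. This is a hypothetical display policy, not Dagster's actual UI logic; `split_tags_for_display` and the `dagster/internal_flag` tag are invented for illustration.

```python
SYSTEM_PREFIX = "dagster/"
KIND_PREFIX = "dagster/kind/"


def split_tags_for_display(tags: dict[str, str]) -> tuple[list[str], dict[str, str]]:
    """Hypothetical policy: promote kinds as icons, show user tags, hide other system tags."""
    kinds = sorted(key[len(KIND_PREFIX):] for key in tags if key.startswith(KIND_PREFIX))
    # Every key under "dagster/" (kinds included) is kept out of the plain tag list.
    user_tags = {k: v for k, v in tags.items() if not k.startswith(SYSTEM_PREFIX)}
    return kinds, user_tags


tags = {
    "dagster/kind/dbt": "",
    "dagster/internal_flag": "true",  # made-up system tag that stays hidden
    "team": "analytics",
}
kinds, visible = split_tags_for_display(tags)
print(kinds)    # ['dbt']
print(visible)  # {'team': 'analytics'}
```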

You can view how Dagster uses kinds internally by searching dagster-open-platform for the kind string.

For instructions on how to implement kinds, see the 'Guide on implementing kinds'.

schrockn · 2024-08-10T23:46:58Z

In Dagster, compute_kind is a specific instance of the generic kind that is available as a parameter when defining an asset using the @asset decorator. (Is it? I am making this up. I think we should say something about it.)

We will deprecate compute_kind and eventually delete it in favor of just kinds.
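During such a deprecation window, one plausible shim folds a legacy `compute_kind` value into the `kinds` set and emits a warning. This is a hypothetical sketch of that behavior, not Dagster's implementation; `normalize_kinds` is an invented name.

```python
import warnings
from typing import Optional


def normalize_kinds(
    kinds: Optional[set[str]] = None, compute_kind: Optional[str] = None
) -> set[str]:
    """Hypothetical shim: merge a deprecated compute_kind into kinds, with a warning."""
    result = set(kinds or ())
    if compute_kind is not None:
        warnings.warn(
            "compute_kind is deprecated; use kinds instead",
            DeprecationWarning,
            stacklevel=2,
        )
        result.add(compute_kind)
    return result


print(sorted(normalize_kinds({"s3"})))  # ['s3']
```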

yuhan

I'm supportive of the kinds direction, and of kinds being system-defined tags prefixed with "dagster/kind/"!

However, I think this particular doc page would need more polish or IA rewrite to become "jobs to be done" and more complete -- we need to map the use cases to different concepts and their usage; for example, "categorize assets visually" -> "kinds", "annotate assets with arbitrary info for better filtering and searching" -> "tags", etc.

Re: @PedramNavid 's comment

Back to the topic at hand! In here, we say Kinds label and categorize definitions. Do they categorize anything other than assets?

If the main goal of kinds is to visually categorize and enrich your definitions, I think for now it would apply to assets and ops (the ones that show up in visual graphs). While this is out of scope for this discussion, it'd be interesting to explore, as a follow-up, whether we can expand "kinds" to other definitions and how we'll explain that.

yuhan · 2024-08-11T20:15:05Z

docs/content/concepts/metadata-tags.mdx

+
+Kinds label and categorize definitions in Dagster. Notably, kinds correspond to prominently displayed icons in our visual tools (see XXX for supported visual kinds). An effective proxy for whether something should be a kind is whether or not it is an adjective in your day-to-day language and meaningfully identifies that definition. For example, if "Is this a dbt asset or a databricks asset?" is a question your team asks, that indicates that "dbt" and "spark" are good kinds for your team.
+
+In its implementation, kinds are system-defined tags, prefixed with the "dagster/kind/" prefix. As is true with all of our system tags, our UIs reserve the right to treat these tags specially, hiding them or promoting them as appropriate.


In its implementation, kinds are system-defined tags, prefixed with the "dagster/kind/" prefix.

👍 this is great! I liked the explicitness here and it's also good that we can gradually disclose the relation between kinds and tags.

schrockn · 2024-08-11T21:40:08Z

However, I think this particular doc page would need more polish or IA rewrite to become "jobs to be done" and more complete -- we need to map the use cases to different concepts and their usage; for example, "categorize assets visually" -> "kinds", "annotate assets with arbitrary info for better filtering and searching" -> "tags", etc.

My explicit goal was to minimally add kinds to this page rather than to completely revamp.

sryza · 2024-08-12T15:09:32Z

My high-level take here:

This framing leads me to believe that "kinds" and "tags" are too similar to justify two different concepts, and that this will add unnecessary confusion to organizing, labeling, categorizing, and annotating assets.
I think it’s worth digging in on the particulars of the relationship between kinds and tags.
I'm going to expand on these a bit below, but am ultimately ok with disagreeing and committing on this one. The fact that others aren't reacting to this as strongly as I am makes me wonder if I'm overly fixated on certain aspects of it?

Overlapping concepts

In order to decide whether to use tags or kinds, this proposal asks users to determine whether they're "labeling and categorizing" vs "annotating and organizing". I think this is a pretty difficult question to answer, given how similar these activities are.

As some evidence of the similarity between "tags" and "labeling", Dagster tags are modeled on Kubernetes "labels", and DataHub describes its tags as a kind of label. OpenMetadata has TagLabel.

I don’t think the fact that kinds are implemented on top of tags addresses this issue, because users still need to make a decision about which parameter to use and which concepts to interact with in the UI.

Particulars of the relationship between kinds and tags

Do we have a definitive answer to the question “Is a kind a tag?”

Mechanically:

Do kinds show up on the list of tags on the asset details page? If so, do we show the “dagster/kind/” prefix?
Do kinds show up in the tags filter in the catalog? If so, do we show the “dagster/kind/” prefix?
If someone hits the GraphQL API to get the set of tags for an asset, are the kinds included?
Will we add a “kind” filter to the catalog?
Will we add a “kind:” parameter to our asset selection syntax?

A couple final thoughts

I bet people would like the ability to associate icons with tags
If we had enough screen real estate on the asset nodes, I wonder if we would want to show all the asset tags.

OwenKephart · 2024-08-12T17:04:09Z

docs/content/concepts/metadata-tags.mdx

@@ -40,6 +41,14 @@ How metadata is defined depends on whether you're using assets or ops and jobs:

 ---

+## How kinds work
+
+Kinds label and categorize definitions in Dagster. Notably, kinds correspond to prominently displayed icons in our visual tools (see XXX for supported visual kinds). An effective proxy for whether something should be a kind is whether or not it is an adjective in your day-to-day language and meaningfully identifies that definition. For example, if "Is this a dbt asset or a databricks asset?" is a question your team asks, that indicates that "dbt" and "spark" are good kinds for your team.


Regarding this heuristic, I can imagine people using the following sentences:

"Is this a silver-quality asset or a gold-quality asset?"

"Is this a raw asset or a staging asset?"

"Is this a Team A asset or a Team B asset?"

I worry that encouraging users to construct a flat list of:

```python
@asset(kinds=["silver", "raw", "team_a", "s3", "snowflake"])
def foo(): ...
```

is error-prone (what if you use a tool called "silver", but also want to distinguish between "silver quality" and "gold quality", for example), and leaves out potentially-useful information for someone coming across this definition in the future.
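The collision described here can be shown in plain Python without any Dagster APIs. The category names (`quality`, `tool`, and so on) and the `has_tag` helper are hypothetical, chosen only to contrast a flat list with categorized key/value tags.

```python
# Flat list: "silver" the quality tier and "silver" the tool become indistinguishable.
flat_kinds = ["silver", "raw", "team_a", "s3", "snowflake", "silver"]
print(sorted(set(flat_kinds)))  # ['raw', 's3', 'silver', 'snowflake', 'team_a']

# Categorized tags (hypothetical keys): each value keeps its category.
categorized = {
    "quality": "silver",
    "layer": "raw",
    "team": "team_a",
    "source": "s3",
    "storage": "snowflake",
    "tool": "silver",
}


def has_tag(tags: dict[str, str], category: str, value: str) -> bool:
    """True if the given category carries exactly this value."""
    return tags.get(category) == value


print(has_tag(categorized, "quality", "silver"))  # True
print(has_tag(categorized, "quality", "gold"))    # False
```

The flat form can only answer "is silver present?", while the categorized form can answer "which silver?".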

To reiterate and simplify some comments I had on the original RFC, my overall opinion is that:

People are going to want / need to categorize their information in a more sophisticated way than a flat list, and we should encourage that (i.e. if you put in the work to map your kinds to categorical information, your searching experience will improve, as will your UI). If we use kinds, then there is no convenient way for you to add that information.

We should not be overly opinionated about which categorical groupings we allow, and in the long run users should have complete control over the mapping from tags to visual treatment in the UI (i.e. we should allow users to make any grouping they want, not just compute_kind / storage_kind). I consider "uncategorized" to be a valid category (sometimes it's just not worth the effort to think of an ontology that a particular tag might fit into; the flat list really is just easier), but again we shouldn't prevent people from being more explicit when they want to be.

Elevating a flat list of kinds to a top-level concept and directly correlating it with UI treatment will make it significantly more difficult to introduce a more flexible concept in the future.
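A user-controlled mapping from tag categories to visual treatment, with "uncategorized" as a valid default, might look like the following. This is a hypothetical sketch of the idea argued above, not an existing or planned Dagster configuration; `VISUAL_TREATMENT` and `render_hints` are invented names.

```python
# Hypothetical user-supplied config: which tag keys get which visual treatment.
VISUAL_TREATMENT: dict[str, str] = {
    "tool": "icon",
    "quality": "badge",
}


def render_hints(tags: dict[str, str], config: dict[str, str]) -> dict[str, list[str]]:
    """Group tags by configured treatment; unmapped keys fall back to 'plain' (uncategorized)."""
    hints: dict[str, list[str]] = {}
    for key, value in sorted(tags.items()):
        treatment = config.get(key, "plain")
        hints.setdefault(treatment, []).append(f"{key}={value}")
    return hints


print(render_hints({"tool": "dbt", "quality": "gold", "team": "a"}, VISUAL_TREATMENT))
```

Here the grouping lives in user configuration rather than in a fixed list of built-in kinds, so adding a new category requires no change to the tag data itself.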

benpankow · 2024-08-12T17:16:34Z

I don't have strong opposition to generalizing kinds, but I do still worry that introducing a third top-level descriptive concept for assets is going to cause confusion. The current binary between tags (used for categorization and filtering in the UI) and metadata (used for freeform information about an asset's properties) is fairly easy to navigate.

The nouns tags and metadata are commonly used in other tools and fairly easy for users to intuit. kind does not have the same level of recognition, and I wonder if a noun like label or visual label would more clearly indicate its more superficial, UI-forward role relative to tags.

It is going to be difficult to provide meaningful recommendations about when to use a kind versus a tag. As for the heuristic of

whether or not it is an adjective in your day-to-day language and meaningfully identifies that definition

different practitioners and teams coexisting in the same catalog will make this determination differently. I worry that the lack of guidance and lack of guardrails will result in kinds being underutilized or inconsistently utilized. Some teams may opt to embed quality information in tags and others in kinds, some teams might attach their team name as a kind.

schrockn · 2024-08-15T23:33:51Z

FYI I'm posting a final follow up here:

https://github.com/dagster-io/internal/discussions/10821

Take stab at kinds docs

2116a1a

fixes

600763a

cp

4937ef6

schrockn changed the title ~~Take stab at kinds docs~~ Proposed language for supporting multiple kinds in concept docs Aug 10, 2024

schrockn requested review from braunjj, PedramNavid, sryza and yuhan August 10, 2024 00:01

yuhan reviewed Aug 11, 2024


OwenKephart reviewed Aug 12, 2024


schrockn closed this Sep 27, 2024
