Proposed language for supporting multiple kinds in concept docs #23566

Closed
@schrockn wants to merge 3 commits


schrockn
Member

@schrockn schrockn commented Aug 9, 2024

Summary & Motivation

As requested, and as a prerequisite to generalizing and pluralizing kinds in our APIs, I've written language to extend our "Metadata and tags" page to cover "metadata, kinds, and tags".

The relevant explanation:

"Kinds label and categorize definitions in Dagster."

"Tags to annotate and organize definitions in your Dagster project."

The general idea is that "label" and "categorize" are more specific and stronger forms of "annotate" and "organize".

The language is also transparent about the fact that kinds are implemented as system tags.

How I Tested These Changes

Read https://kinds-docs.dagster.dagster-docs.io/concepts/metadata-tags



@schrockn changed the title from "Take stab at kinds docs" to "Proposed language for supporting multiple kinds in concept docs" on Aug 10, 2024
@PedramNavid
Contributor

If you'll allow me a side note (one that we are looking to address, by the way): it is notable that this page is buried under Concepts > Advanced > Metadata & Tags. I am leaning toward naming these pages 'About Metadata, Kinds, and Tags' to emphasize that this is a conceptual explanation of how metadata works and not a guide on how to use metadata, kinds, or tags (which is where most users would want to start).

To that end, the whole page does feel somewhat incomplete. It lacks the expected depth of a conceptual page, and it doesn't provide easy access on how to accomplish the specific task of applying metadata, kinds, and tags to your workflows.

Back to the topic at hand! In here, we say Kinds label and categorize definitions. Do they categorize anything other than assets?

Here's my suggested rewrite. I try to break up each major point into a single paragraph. Here I've only used the word asset, but if we mean that kinds apply to other types of definitions, we should emphasize that. I found the adjective language a little confusing so I removed it.


In Dagster, Kinds are a way to label and categorize assets within your data pipeline. These descriptions of your assets can appear within the UI as icons or be used to search and filter within the data cataloging features of Dagster.

Kinds serve as descriptive labels that help you and your team quickly identify the nature or purpose of an asset. They're particularly useful in visual representations of your pipeline, where they correspond to prominently displayed icons. Dagster uses a mapping you can find at XXX to map the name of the kind to an icon in the UI.

We recommend using kinds to label assets in ways that help you answer questions you may ask, such as what external system is involved with materializing a particular asset. For example, you may label assets with the dbt, Databricks, or s3 kind, as it can be helpful both to visually identify assets by these labels and to filter for them within the Dagster Catalog.

<AdmonitionInfoThing> 
💡 The Dagster Catalog is a Dagster+ feature. 
</AdmonitionInfoThing>

Within Dagster, Kinds are implemented as system-defined tags with a special prefix: dagster/kind/. In Dagster, compute_kind is a specific instance of the generic kind that is available as a parameter when defining an asset using the @asset decorator. (Is it? I am making this up. I think we should say something about it.)

This implementation allows Dagster's user interfaces to treat these tags in unique ways, such as displaying or hiding them as appropriate to enhance the user experience.

You can view how Dagster uses kinds internally by searching dagster-open-platform for the kind string.

For instructions on how to implement kinds, see the 'Guide on implementing kinds'
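
To make the "kinds are system tags with a `dagster/kind/` prefix" framing concrete, here is a minimal sketch in plain Python. It does not call Dagster's API: `kinds_to_tags` and `merge_tags` are hypothetical helpers, and storing an empty string as the tag value is an assumption made only for illustration. The prefix itself is the one named in this thread.

```python
# Illustrative only: plain Python, not Dagster's actual implementation.
KIND_PREFIX = "dagster/kind/"  # prefix named in the discussion


def kinds_to_tags(kinds):
    """Encode kind names as prefixed tag keys.

    Hypothetical helper; the empty-string value is an assumption.
    """
    return {f"{KIND_PREFIX}{kind}": "" for kind in kinds}


def merge_tags(user_tags, kinds):
    """Combine user tags with kind tags, rejecting user keys in the reserved namespace."""
    clashes = [key for key in user_tags if key.startswith(KIND_PREFIX)]
    if clashes:
        raise ValueError(f"user tags may not use the reserved prefix: {clashes}")
    return {**user_tags, **kinds_to_tags(kinds)}


if __name__ == "__main__":
    print(merge_tags({"team": "analytics"}, ["dbt", "s3"]))
```

Reserving the prefix for the system is what would let a UI treat these keys specially (hide them from a tag list, render them as icons) without ambiguity about user-authored tags.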

@schrockn
Member Author

> In Dagster, compute_kind is a specific instance of the generic kind that is available as a parameter when defining an asset using the @asset decorator. (Is it? I am making this up. I think we should say something about it.)

We will deprecate compute_kind and eventually delete it in favor of just kinds.

Contributor

@yuhan yuhan left a comment


I'm supportive of the overall direction here: introducing kinds, with kinds implemented as system-defined tags prefixed with "dagster/kind/"!

However, I think this particular doc page would need more polish or IA rewrite to become "jobs to be done" and more complete -- we need to map the use cases to different concepts and their usage; for example, "categorize assets visually" -> "kinds", "annotate assets with arbitrary info for better filtering and searching" -> "tags", etc.


Re: @PedramNavid's comment

> Back to the topic at hand! In here, we say Kinds label and categorize definitions. Do they categorize anything other than assets?

If the main goal of kinds is to visually categorize and enrich your definitions, I think for now, it would apply to assets and ops (the ones that show up in visual graphs). While this is out of scope for this discussion, it'd be interesting to explore whether we can expand "kinds" to other definitions and how we'll explain that, as a follow up.


Kinds label and categorize definitions in Dagster. Notably, kinds correspond to prominently displayed icons in our visual tools (see XXX for supported visual kinds). An effective proxy for whether something should be a kind is whether or not it is an adjective in your day-to-day language and meaningfully identifies that definition. E.g. If "Is this a dbt asset or a databricks asset?" is a question in your team that would indicate that "dbt" and "spark" are good kinds for your team.

In its implementation, kinds are system-defined tags, prefixed with the "dagster/kind/" prefix. As is true with all of our system tags, our UIs reserve the right to treat these tags specially, hiding them or promoting them as appropriate.
Contributor


> In its implementation, kinds are system-defined tags, prefixed with the "dagster/kind/" prefix.

👍 this is great! I liked the explicitness here and it's also good that we can gradually disclose the relation between kinds and tags.

@schrockn
Member Author

> However, I think this particular doc page would need more polish or IA rewrite to become "jobs to be done" and more complete -- we need to map the use cases to different concepts and their usage; for example, "categorize assets visually" -> "kinds", "annotate assets with arbitrary info for better filtering and searching" -> "tags", etc.

My explicit goal was to minimally add kinds to this page rather than to completely revamp.

@sryza
Contributor

sryza commented Aug 12, 2024

My high-level take here:

  • This framing leads me to believe that "kinds" and "tags" are too similar to justify two different concepts, and that this will add unnecessary confusion to organizing, labeling, categorizing, and annotating assets.
  • I think it’s worth digging in on the particulars of the relationship between kinds and tags.
  • I'm going to expand on these a bit below, but am ultimately ok with disagreeing and committing on this one. The fact that others aren't reacting to this as strongly as I am makes me wonder if I'm overly fixated on certain aspects of it?

Overlapping concepts

To decide whether to use tags or kinds, this proposal asks users to determine whether they're "labeling and categorizing" or "annotating and organizing". I think this is a pretty difficult question to answer, given how similar these activities are.

As some evidence of the similarity between “tags” and “labeling”, Dagster tags are modeled off of Kubernetes “labels”, and Datahub describes their tags as a kind of label. OpenMetadata has TagLabel.

I don’t think the fact that kinds are implemented on top of tags addresses this issue, because users still need to make a decision about which parameter to use and which concepts to interact with in the UI.

Particulars of the relationship between kinds and tags

Do we have a definitive answer to the question “Is a kind a tag?”

Mechanically:

  • Do kinds show up on the list of tags on the asset details page? If so, do we show the “dagster/kind/” prefix?
  • Do kinds show up in the tags filter in the catalog? If so, do we show the “dagster/kind/” prefix?
  • If someone hits the GraphQL API to get the set of tags for an asset, are the kinds included?
  • Will we add a “kind” filter to the catalog?
  • Will we add a “kind:” parameter to our asset selection syntax?
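
One way to reason about these mechanical questions is to sketch the decoding side in plain Python. Everything here is hypothetical and is not Dagster's API or UI behavior: `split_kinds` shows what "hide the prefix and list kinds separately" could look like, and `select_by_kind` imitates what a `kind:` filter might do over a dictionary of asset names to tags. The empty-string tag values are an assumption.

```python
# Illustrative only: a plain-Python model of the questions above.
KIND_PREFIX = "dagster/kind/"  # prefix named in the discussion


def split_kinds(tags):
    """Separate kind tags (prefix stripped) from the remaining tags."""
    kinds = sorted(
        key[len(KIND_PREFIX):] for key in tags if key.startswith(KIND_PREFIX)
    )
    other_tags = {
        key: value for key, value in tags.items() if not key.startswith(KIND_PREFIX)
    }
    return kinds, other_tags


def select_by_kind(assets, kind):
    """Return names of assets carrying the given kind (a hypothetical 'kind:' filter)."""
    key = KIND_PREFIX + kind
    return sorted(name for name, tags in assets.items() if key in tags)


if __name__ == "__main__":
    assets = {
        "orders": {"team": "analytics", "dagster/kind/dbt": ""},
        "raw_events": {"dagster/kind/s3": ""},
    }
    print(split_kinds(assets["orders"]))
    print(select_by_kind(assets, "s3"))
```

Whether a raw tags query (for example, over GraphQL) should return the unsplit dictionary or the split view is exactly the open question; the sketch only shows that both views are derivable from the same stored tags.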

A couple final thoughts

  • I bet people would like the ability to associate icons with tags
  • If we had enough screen real estate on the asset nodes, I wonder if we would want to show all the asset tags.

@@ -40,6 +41,14 @@ How metadata is defined depends on whether you're using assets or ops and jobs:

---

## How kinds work

Kinds label and categorize definitions in Dagster. Notably, kinds correspond to prominently displayed icons in our visual tools (see XXX for supported visual kinds). An effective proxy for whether something should be a kind is whether or not it is an adjective in your day-to-day language and meaningfully identifies that definition. E.g. If "Is this a dbt asset or a databricks asset?" is a question in your team that would indicate that "dbt" and "spark" are good kinds for your team.
Contributor

@OwenKephart OwenKephart Aug 12, 2024


Regarding this heuristic, I can imagine people using the following sentences:

  • "Is this a silver-quality asset or a gold-quality asset?"
  • "Is this a raw asset or a staging asset?"
  • "Is this a Team A asset or a Team B asset?"

I worry that encouraging users to construct a flat list of:

@asset(kinds=["silver", "raw", "team_a", "s3", "snowflake"])
def foo(): ...

is error-prone (what if you use a tool called "silver", but also want to distinguish between "silver quality" and "gold quality", for example), and leaves out potentially-useful information for someone coming across this definition in the future.
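
The ambiguity can be illustrated with a small plain-Python sketch. It does not use Dagster; the asset names, the tool called "silver", and the category names (`quality`, `tool`, `stage`, `storage`) are all hypothetical and chosen only to mirror the example above.

```python
# Illustrative only: flat labels versus categorized labels.

# Flat lists: "silver" means a quality tier on one asset and a tool on the other.
flat = {
    "orders": ["silver", "s3"],
    "ingest_job": ["silver", "raw"],
}

# Categorized labels: the same information with the category made explicit.
categorized = {
    "orders": {"quality": "silver", "storage": "s3"},
    "ingest_job": {"tool": "silver", "stage": "raw"},
}


def find_flat(assets, value):
    """Match any asset whose flat list contains the value."""
    return sorted(name for name, labels in assets.items() if value in labels)


def find_categorized(assets, category, value):
    """Match only assets whose label in the given category equals the value."""
    return sorted(
        name for name, labels in assets.items() if labels.get(category) == value
    )


if __name__ == "__main__":
    print(find_flat(flat, "silver"))
    print(find_categorized(categorized, "quality", "silver"))
```

Searching the flat lists for "silver" returns both assets, while the categorized lookup can distinguish "silver quality" from "the silver tool".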

To reiterate and simplify some comments I had on the original RFC, my overall opinion is that:

  • People are going to want / need to categorize their information in a more sophisticated way than a flat list, and we should encourage that (i.e. if you put in the work to map your kinds to categorical information, your searching experience will improve, as will your UI). If we use kinds, then there is no convenient way for you to add that information.
  • We should not be overly opinionated about what sort of categorical groupings we allow, and in the long run users should have complete control over the mapping from tags to visual treatment in the UI (i.e. we should allow users to make any grouping they want, not just compute_kind / storage_kind). I consider "uncategorized" to be a valid category (sometimes it's just not worth the effort to think of some ontology that a particular tag might fit into -- the flat list really is just easier), but again we shouldn't prevent people from being more explicit when they want to be.
  • Elevating a flat list of kinds to a top-level concept and directly correlating it with UI treatment will make it significantly more difficult to introduce a more flexible concept in the future

@benpankow
Member

I don't have strong opposition to generalizing kinds, but I do still worry that introducing a third top-level descriptive concept for assets is going to cause confusion. The current binary between tags (used for categorization and filtering in the UI) and metadata (used for freeform information about an asset's properties) is fairly easy to navigate.

The nouns tags and metadata are commonly used in other tools and fairly easy for users to intuit. kind does not have the same level of recognition, and I wonder if a noun like label or visual label would more clearly indicate its more superficial, UI-forward role relative to tags.

It is going to be difficult to provide meaningful recommendations about when to use a kind and when to use a tag. When evaluating the heuristic of

> whether or not it is an adjective in your day-to-day language and meaningfully identifies that definition

different practitioners and teams coexisting in the same catalog will make this determination differently. I worry that the lack of guidance and lack of guardrails will result in kinds being underutilized or inconsistently utilized. Some teams may opt to embed quality information in tags and others in kinds, and some teams might attach their team name as a kind.

@schrockn
Member Author

FYI I'm posting a final follow up here:

https://github.com/dagster-io/internal/discussions/10821

@schrockn schrockn closed this Sep 27, 2024