Proposed language for supporting multiple kinds in concept docs #23566
Deploy preview for dagster-docs ready! Preview available at https://dagster-docs-eurcl3asm-elementl.vercel.app Direct link to changed pages: |
If you'll allow me a side note (one that we are looking to address, by the way): it is notable that this page is buried under Concepts > Advanced > Metadata & Tags. I am leaning toward naming these pages 'About Metadata, Kinds, and Tags' to emphasize that this is a conceptual explanation of how metadata works and not how to use metadata, kinds, or tags (which is where most users would want to start). To that end, the whole page does feel somewhat incomplete. It lacks the expected depth of a conceptual page, and it doesn't provide easy access on how to accomplish the specific task of applying metadata, kinds, and tags to your workflows.
Back to the topic at hand! In here, we say Kinds label and categorize definitions. Do they categorize anything other than assets?
Here's my suggested rewrite. I try to break up each major point into a single paragraph. Here I've only used the word asset, but if we mean that kinds apply to other types of definitions, we should emphasize that. I found the adjective language a little confusing so I removed it.
In Dagster, Kinds are a way to label and categorize assets within your data pipeline. These descriptions of your assets can appear within the UI as icons or be used to search and filter within the data cataloging features of Dagster.
Kinds serve as descriptive labels that help you and your team quickly identify the nature or purpose of an asset. They're particularly useful in visual representations of your pipeline, where they correspond to prominently displayed icons. Dagster uses a mapping you can find at XXX to map the name of the kind to an icon in the UI.
We recommend using kinds as a means to label assets that can help you answer questions you may ask, such as what external system is involved with materializing a particular asset. For example, you may label assets with the
Within Dagster, Kinds are implemented as system-defined tags with a special prefix:
This implementation allows Dagster's user interfaces to treat these tags in unique ways, such as displaying or hiding them as appropriate to enhance the user experience. You can view how Dagster uses kinds internally by searching
For instructions on how to implement kinds, see the 'Guide on implementing kinds'
We will deprecate |
I'm supportive of the kinds and 'kinds are system-defined tags, prefixed with the "dagster/kind/" prefix' direction!
However, I think this particular doc page would need more polish or IA rewrite to become "jobs to be done" and more complete -- we need to map the use cases to different concepts and their usage; for example, "categorize assets visually" -> "kinds", "annotate assets with arbitrary info for better filtering and searching" -> "tags", etc.
Re: @PedramNavid 's comment
Back to the topic at hand! In here, we say Kinds label and categorize definitions. Do they categorize anything other than assets?
If the main goal of kinds is to visually categorize and enrich your definitions, I think for now, it would apply to assets and ops (the ones that show up in visual graphs). While this is out of scope for this discussion, it'd be interesting to explore whether we can expand "kinds" to other definitions and how we'll explain that, as a follow up.
Kinds label and categorize definitions in Dagster. Notably, kinds correspond to prominently displayed icons in our visual tools (see XXX for supported visual kinds). An effective proxy for whether something should be a kind is whether or not it is an adjective in your day-to-day language and meaningfully identifies that definition. E.g. If "Is this a dbt asset or a databricks asset?" is a question in your team that would indicate that "dbt" and "spark" are good kinds for your team.
In its implementation, kinds are system-defined tags, prefixed with the "dagster/kind/" prefix. As is true with all of our system tags, our UIs reserve the right to treat these tags specially, hiding them or promoting them as appropriate.
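To make the "kinds are prefixed system tags" idea concrete, here is a minimal sketch in plain Python. It does not use or reproduce the Dagster API: the helper names `kinds_to_tags` and `split_tags` are hypothetical, and storing an empty string as the tag value is an assumption made only for illustration. The only detail taken from the proposed doc text is the "dagster/kind/" prefix.

```python
# Illustrative sketch only; these helpers are hypothetical and are not Dagster APIs.
KIND_PREFIX = "dagster/kind/"  # prefix quoted in the proposed doc text


def kinds_to_tags(kinds):
    """Encode each kind as a tag key under the reserved prefix.

    The empty-string value is an assumption for this sketch.
    """
    return {f"{KIND_PREFIX}{kind}": "" for kind in kinds}


def split_tags(tags):
    """Separate prefixed kind tags from ordinary user tags."""
    kinds = sorted(
        key[len(KIND_PREFIX):] for key in tags if key.startswith(KIND_PREFIX)
    )
    user_tags = {
        key: value for key, value in tags.items() if not key.startswith(KIND_PREFIX)
    }
    return kinds, user_tags


# One tag dictionary carries both kinds and ordinary tags.
tags = {**kinds_to_tags(["dbt", "snowflake"]), "team": "analytics"}
kinds, user_tags = split_tags(tags)
```

A UI built on this kind of encoding could render `kinds` as icons and list `user_tags` separately, which is one way to read "hiding them or promoting them as appropriate".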
In its implementation, kinds are system-defined tags, prefixed with the "dagster/kind/" prefix.
👍 this is great! I liked the explicitness here and it's also good that we can gradually disclose the relation between kinds and tags.
My explicit goal was to minimally add kinds to this page rather than to completely revamp. |
My high-level take here:
Overlapping concepts
In order to decide whether to use tags or kinds, this proposal asks users to determine whether they're "labeling and categorizing" vs "annotating and organizing". I think this is a pretty difficult question to answer, given how similar these activities are. As some evidence of the similarity between “tags” and “labeling”, Dagster tags are modeled off of Kubernetes “labels”, and Datahub describes their tags as a kind of label. OpenMetadata has TagLabel. I don’t think the fact that kinds are implemented on top of tags addresses this issue, because users still need to make a decision about which parameter to use and which concepts to interact with in the UI.
Relationship between kinds and tags particular
Do we have a definitive answer to the question “Is a kind a tag?” Mechanically:
A couple final thoughts
@@ -40,6 +41,14 @@ How metadata is defined depends on whether you're using assets or ops and jobs:
---
## How kinds work
Kinds label and categorize definitions in Dagster. Notably, kinds correspond to prominently displayed icons in our visual tools (see XXX for supported visual kinds). An effective proxy for whether something should be a kind is whether or not it is an adjective in your day-to-day language and meaningfully identifies that definition. E.g. If "Is this a dbt asset or a databricks asset?" is a question in your team that would indicate that "dbt" and "spark" are good kinds for your team.
Regarding this heuristic, I can imagine people using the following sentences:
- "Is this a silver-quality asset or a gold-quality asset?"
- "Is this a raw asset or a staging asset?"
- "Is this a Team A asset or a Team B asset?"
I worry that encouraging users to construct a flat list of:
@asset(kinds=["silver", "raw", "team_a", "s3", "snowflake"])
def foo(): ...
is error-prone (what if you use a tool called "silver", but also want to distinguish between "silver quality" and "gold quality", for example), and leaves out potentially-useful information for someone coming across this definition in the future.
To reiterate and simplify some comments I had on the original RFC, my overall opinion is that:
- People are going to want / need to categorize their information in a more sophisticated way than a flat list, and we should encourage that (i.e. if you put in the work to map your kinds to categorical information, your searching experience will improve, as will your UI). If we use kinds, then there is no convenient way for you to add that information.
- We should not be overly opinionated in what sort of categorical groupings we allow, and in the long run users should have complete control over the mapping from tags to visual treatment in the UI (i.e. we should allow users to make any grouping they want, not just compute_kind / storage_kind). I consider "uncategorized" to be a valid category (sometimes it's just not worth the effort to think of some ontology that a particular tag might fit into -- the flat list really is just easier), but again we shouldn't prevent people from being more explicit when they want to be.
- Elevating a flat list of kinds to a top-level concept and directly correlating it with UI treatment will make it significantly more difficult to introduce a more flexible concept in the future.
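The ambiguity concern above can be sketched in plain Python. This is only an illustration under stated assumptions: the asset names, the category keys (`quality`, `storage`, `tool`), and both lookup helpers are hypothetical and are not Dagster APIs. It compares searching a flat list of labels with searching key/value pairs, for the case where "silver" is both a quality tier and the name of a tool.

```python
# Illustrative sketch only; all names and data below are hypothetical.

# Flat labels: "silver" on "events" is meant as a tool name,
# while "silver" on "orders" is meant as a quality tier.
flat_assets = {
    "orders": ["silver", "s3"],
    "events": ["gold", "silver"],
}

# Categorized labels: the same information with an explicit category per value.
categorized_assets = {
    "orders": {"quality": "silver", "storage": "s3"},
    "events": {"quality": "gold", "tool": "silver"},
}


def find_by_flat_label(assets, label):
    """Return names of assets whose flat label list contains the label."""
    return sorted(name for name, labels in assets.items() if label in labels)


def find_by_category(assets, category, value):
    """Return names of assets whose given category has the given value."""
    return sorted(
        name for name, labels in assets.items() if labels.get(category) == value
    )


# The flat search cannot tell the quality tier from the tool name.
flat_matches = find_by_flat_label(flat_assets, "silver")

# The categorized search can ask specifically for silver-quality assets.
quality_matches = find_by_category(categorized_assets, "quality", "silver")
```

Here the flat search matches both assets, while the categorized search matches only `orders`, which is the distinction the comment argues a flat list cannot express.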
I don't have strong opposition to generalizing kinds, but I do still worry that introducing a third top-level descriptive concept for assets is going to cause confusion. The current binary between tags (used for categorization and filtering in the UI) and metadata (used for freeform information about an asset's properties) is fairly easy to navigate. The nouns It is going to be difficult to provide meaningful recommendations about which cases to use a kind or a tag. Evaluating the heuristic of
different practitioners and teams coexisting in the same catalog will make this determination differently. I worry that the lack of guidance and lack of guardrails will result in kinds being underutilized or inconsistently utilized. Some teams may opt to embed quality information in tags and others in kinds, some teams might attach their team name as a kind. |
FYI I'm posting a final follow up here: |
Summary & Motivation
Per request, as a prerequisite to generalizing and pluralizing kinds in our APIs, I've written language to extend our "Metadata and tags" page to be "metadata, kinds, and tags". The relevant explanation:
"Kinds label and categorize definitions in Dagster."
"Tags to annotate and organize definitions in your Dagster project."
The general idea is that "label" and "categorize" are more specific and stronger forms of "annotate" and "organize".
It is also transparent that kinds are implemented as system tags.
How I Tested These Changes
Read https://kinds-docs.dagster.dagster-docs.io/concepts/metadata-tags