Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-3446] [Feature] support comment in Column class #9198

Closed
3 tasks done
yassun7010 opened this issue Dec 4, 2023 · 2 comments
Closed
3 tasks done

[CT-3446] [Feature] support comment in Column class #9198

yassun7010 opened this issue Dec 4, 2023 · 2 comments
Labels
enhancement New feature or request wontfix Not a bug or out of scope for dbt-core

Comments

@yassun7010
Copy link

yassun7010 commented Dec 4, 2023

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

I am in the process of migrating an existing project for my company to dbt, and am using dbt-osmosis to automatically generate souce settings in schema.yml.

However, the generated source column description is blank and does not reference the comments in the table definitions in my database.

After checking the source code, I found that this is due to the fact that the Column class does not hold comment information.

I think it would be useful to optionally add comment information to Column to make the transition to dbt easier.

I have investigated dbt-snowflake, dbt-postgres, and dbt-bigquery and believe that this addition will not break compatibility (those adapters do not use column comment information).

Please refer to the PR of the proposal I submitted to dbt-osmosis regarding what I would like to do.

I am willing to commit to this.

Please let me know if there are any misconceptions about dbt.

Describe alternatives you've considered

Manually update and maintain the sources of schema.yml.

Who will this benefit?

People considering transitioning to dbt.

Especially for those who initially define data in sources and gradually migrate to models.

Are you interested in contributing this feature?

Yes

Anything else?

No response

@yassun7010 yassun7010 added enhancement New feature or request triage labels Dec 4, 2023
@github-actions github-actions bot changed the title [Feature] support comment in Column class [CT-3446] [Feature] support comment in Column class Dec 4, 2023
@yassun7010
Copy link
Author

I initially think this is enough for an Issue, should we move on to a discussion?

@dbeatty10
Copy link
Contributor

dbeatty10 commented Jan 4, 2024

Thanks for opening this @yassun7010 !

Goal

It sounds like your goal is:

  • given a pre-existing database with column-level comments, create a sources.yaml file with the column comments in the description field

Potential solution

There are multiple ways for how to accomplish this goal, and here are a handful of them:

  1. dbt-osmosis using the --catalog-file CLI argument
  2. utilize programmatic invocations
  3. dbt-core #9198 (this one!) + dbt-osmosis
  4. dbt-codegen #119

Our guidance

After discussion with @graciegoheen and @jtcohen6, solving this isn't a priority for us at this time. We think there's enough capabilities in dbt-core for third-party tools like dbt-osmosis or others to accomplish this goal.

Namely, the Catalog artifact contains column comment information that can be utilized by third-party tools. So we're choosing not to add a comment or description property to the Column class.

As a result, we're going to close this issue and the associated pull requests as "not planned". See below for details on using dbt-osmosis or programmatic invocations.

Using dbt-osmosis with the --catalog-file CLI argument

Your changes in z3z1ma/dbt-osmosis#120 (this line specifically) will enable dbt-osmosis to utilize the column comment information.

So upon the next release of dbt-osmosis, you should be able to do something like this:

dbt docs generate
dbt-osmosis yaml document --catalog-file target/catalog.json

Note

The above command assumes you've added something like this to your dbt_project.yml:

models:
  my_project:
    +dbt-osmosis: "_{model}.yml"

And you've added all the desired table names to a models/_sources.yml file already (without the column names):

version: 2

sources:
  - name: my_schema
    database: my_database
    schema: my_schema
    tables:
      - name: my_table_1
      - name: my_table_2
      # ...

Using programmatic invocations to inspect the Catalog artifact

Getting columns descriptions from the database for sources is possible via the Catalog artifact.

You can inspect the Catalog artifact and use its output to build the desired sources.yaml file (see more detail below).

One way to access it is via programmatic invocations with a Python script like the following:

from dbt.cli.main import dbtRunner, dbtRunnerResult, CatalogArtifact
from pprint import pprint


def inspect_catalog_nodes(nodes):

    # inspect the catalog nodes
    for node_key, node in nodes.items():
        print(node.metadata.database)
        print(node.metadata.schema)
        print(node.metadata.name)
        print(node.metadata.type)
        print(node.metadata.comment)

        # inspect the columns
        for column_key, column in node.columns.items():
            print(column.index)
            print(column.name)
            print(column.type)
            print(column.comment)


# initialize
dbt = dbtRunner()

# create CLI args as a list of strings
cli_args = ["docs", "generate"]

# run the command
res: dbtRunnerResult = dbt.invoke(cli_args)
catalog: CatalogArtifact = res.result

# inspect both types of nodes: sources and everything else
inspect_catalog_nodes(catalog.sources)
inspect_catalog_nodes(catalog.nodes)

@dbeatty10 dbeatty10 closed this as not planned Won't fix, can't repro, duplicate, stale Jan 4, 2024
@dbeatty10 dbeatty10 added wontfix Not a bug or out of scope for dbt-core and removed triage labels Jan 4, 2024
@dbeatty10 dbeatty10 removed their assignment Jan 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request wontfix Not a bug or out of scope for dbt-core
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants