Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throw an error in writer if property isn't defined #154

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

amc-corey-cox
Copy link
Contributor

This adds an error for Koza if the writer is given properties to write that aren't defined. Previously, we silently ignored those columns and just didn't write them.

@amc-corey-cox amc-corey-cox marked this pull request as draft October 29, 2024 13:38
@amc-corey-cox
Copy link
Contributor Author

I have implemented the main check. We should consider if we want to allow this feature to be turned off. I also need to write tests.

@amc-corey-cox
Copy link
Contributor Author

I took a step back and decided to abstract the initialization and writer functionality back up into the KozaWriter parent class. I think this works pretty well. Please take a look and see what you think!

@amc-corey-cox amc-corey-cox marked this pull request as ready for review November 7, 2024 21:11
@amc-corey-cox
Copy link
Contributor Author

The key files to look at are writer.py tsv_writer.py and jsonl_writer.py. Everything else is just to fix tests.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like mostly formatting changes? (eesh i never set up ruff or pre-commit on this repo did i)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think i really like this approach! splitting write_node and write_edge out into their own functions is a good call too.

(i know i can't officially approve any PRs but i'm bored and wanted to share my thoughts :) )

@amc-corey-cox
Copy link
Contributor Author

It just occurred to me that this has the potential to break some of our ingests. Now that we are requiring all columns sent to writer to be written, if an ingest relies on the previous behavior of ignoring extra columns it will break. I have not implemented making the error optional so if we need that I can; it would certainly be easy to do.

@glass-ships
Copy link
Collaborator

my initial thought was, "well it's probably worth fixing those ingests if they do have that dependency", as it seems a bit weird to want to write columns that don't exist in the data.
but an ignore flag is probably harmless enough?

@amc-corey-cox
Copy link
Contributor Author

I just pushed another commit that better splits up and abstracts the init. I also noticed some unnecessary code I had forgotten to delete. I think this improves the previous refactor a fair amount making things even simpler and better partitioned.

I have put in hooks for ignoring the parameter checks for testing purposes but I haven't implemented them at a config level. I think we should just fix any ingests that don't have things listed properly. We can finish the implementation of ignoring the checks if we find a good use-case. Otherwise, I think it is bad practice to pass data to writer that you don't want to write and so it should remain an error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Raise an error if an entity property isn't defined in node/edge properties
2 participants