split_file references output format that doesn't exist #156

Open
ptgolden opened this issue Oct 30, 2024 · 2 comments

@ptgolden
Member

In this method:

koza/src/koza/cli_utils.py

Lines 130 to 143 in fceafe5

def split_file(file: str,
               fields: str,
               format: OutputFormat = OutputFormat.tsv,
               remove_prefixes: bool = False,
               output_dir: str = "./output"):
    db = duckdb.connect(":memory:")
    #todo: validate that each of the fields is actually a column in the file
    if format == OutputFormat.tsv:
        read_file = f"read_csv('{file}')"
    elif format == OutputFormat.json:
        read_file = f"read_json('{file}')"
    else:
        raise ValueError(f"Format {format} not supported")

split_file checks whether the format is OutputFormat.json, but no such output format exists:

class OutputFormat(str, Enum):
    """
    Output formats
    """
    tsv = "tsv"
    jsonl = "jsonl"
    kgx = "kgx"
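
To make the failure mode concrete, here is a minimal sketch that reproduces the branch logic in isolation. `pick_reader` is a hypothetical stand-in for the relevant part of split_file (it is not a function in koza), and it only builds the reader expression string, so DuckDB is not needed to run it. Because `OutputFormat.json` is only evaluated when the first comparison fails, the default tsv path works, while any other format should hit an AttributeError before the intended ValueError is ever reached.

```python
from enum import Enum


class OutputFormat(str, Enum):
    """Copy of the enum quoted above."""
    tsv = "tsv"
    jsonl = "jsonl"
    kgx = "kgx"


def pick_reader(file: str, format: OutputFormat = OutputFormat.tsv) -> str:
    # Hypothetical helper mirroring the if/elif/else quoted from split_file.
    if format == OutputFormat.tsv:
        return f"read_csv('{file}')"
    elif format == OutputFormat.json:  # attribute lookup fails: no such member
        return f"read_json('{file}')"
    else:
        raise ValueError(f"Format {format} not supported")


# The tsv branch returns before the missing member is ever looked up.
print(pick_reader("edges.tsv"))

# Any non-tsv format reaches the elif and fails on the attribute lookup.
try:
    pick_reader("edges.jsonl", OutputFormat.jsonl)
except AttributeError as err:
    print(f"AttributeError: {err}")
```

This would also explain why the problem can go unnoticed: tests that only exercise the default tsv format never evaluate the broken comparison.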

Does DuckDB have a JSON Lines reader?

@ptgolden
Member Author

It has since 0.7.0. I'll change to OutputFormat.jsonl and make sure tests pass.
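
A rough sketch of what that change could look like, not the actual patch: `reader_expression` is a hypothetical helper name, and passing `format='newline_delimited'` to DuckDB's `read_json` is an assumption about how to make the JSON Lines handling explicit (plain `read_json('{file}')`, as in the original code, may be enough if auto-detection is relied on). The function only builds the SQL fragment, so it runs without DuckDB installed.

```python
from enum import Enum


class OutputFormat(str, Enum):
    """Copy of the enum from the issue."""
    tsv = "tsv"
    jsonl = "jsonl"
    kgx = "kgx"


def reader_expression(file: str, format: OutputFormat = OutputFormat.tsv) -> str:
    # Hypothetical helper: returns the DuckDB table function call as a string.
    if format == OutputFormat.tsv:
        return f"read_csv('{file}')"
    elif format == OutputFormat.jsonl:
        # Assumption: request newline-delimited parsing explicitly.
        return f"read_json('{file}', format='newline_delimited')"
    else:
        # Reached for formats with no reader here, such as kgx.
        raise ValueError(f"Format {format.value} not supported")


print(reader_expression("nodes.tsv"))
print(reader_expression("nodes.jsonl", OutputFormat.jsonl))
```

With the comparison pointing at an existing member, unsupported formats fall through to the ValueError as originally intended.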

@ptgolden ptgolden self-assigned this Nov 5, 2024
@kevinschaper
Member

thanks for catching this and cleaning it up!
