-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable evals on multiple databases for OpenAI and Anthropic generators #200
Conversation
postgres_ddl=pruned_metadata_str, | ||
to_dialect=self.db_type, | ||
db_name=self.db_name, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meaning we only do translation if there's no pruning? in gen_prompt.py the translation occurs only after the pruning/non-pruning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix, thanks for spotting!
query_generators/anthropic.py
Outdated
new_ddl, _ = ddl_to_tsql(postgres_ddl, "postgres", db_name, 42) | ||
else: | ||
raise ValueError(f"Unsupported dialect {to_dialect}") | ||
return new_ddl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you wanna add this to utils.dialect so it doesn't have to be repeated across the files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do!
postgres_ddl=table_metadata_ddl, | ||
to_dialect=self.db_type, | ||
db_name=self.db_name, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same issue raised above of adding translation after pruning/non-pruning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix!
Thanks for incorporating translations in the openai/anthropic runners! Just a few comments from me. |
Fixed, thanks Wendy! |
@@ -110,16 +114,43 @@ def generate_query( | |||
columns_to_keep, | |||
shuffle, | |||
) | |||
pruned_metadata_ddl = convert_postgres_ddl_to_dialect( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry forgot to highlight earlier. But i rmb adding this extra line before the translation in gen_prompt.py:
# remove triple backticks
pruned_metadata_ddl = pruned_metadata_ddl.replace("```", "").strip()
because sqlglot can't translate with those. would it also be a problem here and in the openai script?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this! Hasn't been a problem - sqlite evals for OpenAI and Anthropic both worked super smoothly :D
In addition:
c=0
. These do not make any significant difference to performance (the difference in +- 0.5% with and without forgpt-4o
,gpt-4
, andclaude-3.5-sonnet
), but we are passing them through for the sake of apples to apples comparisons with other models.glossary
from all prompts, as it is not needed