Add methods to log the results of the api runner to a server #177
Conversation
```diff
@@ -33,6 +33,7 @@
 parser.add_argument("-v", "--verbose", action="store_true")
 parser.add_argument("-l", "--logprobs", action="store_true")
 parser.add_argument("--upload_url", type=str)
+parser.add_argument("--run_name", type=str, required=False)
```
Added a param to optionally add a run name when running an eval. This run name is then stored as a static JSON object and can be used by the eval visualizer.
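For instance, an eval invocation could then look like this (the entry-point script and the elided flags are assumptions; only `--upload_url` and `--run_name` come from this diff):

```bash
# Hypothetical invocation; "..." stands for the usual eval flags.
python main.py ... --upload_url "$SQL_EVAL_UPLOAD_URL" --run_name my_eval_run
```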
Thanks! nit: shall we update the README with the new option as well? 😄
Done, thank you for pointing this out!
Thanks for cleaning up and reviving the logic to log to a GCS bucket! One small suggestion for making the various option paths more robust 👌🏼
Just checking: where do we read the `SQL_EVAL_UPLOAD_URL` environment variable (which you mentioned in the description)? Or are we just using `args.upload_url` now?
Separately, no action needed, but it would also be nice to have the option to call `blob.upload_from_string(json.dumps(results))` within the sql-eval process / local environment directly, since it's more secure to authenticate locally than to have a public HTTP endpoint that can write to one's GCS bucket.
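A minimal sketch of that direct-upload option, assuming the `google-cloud-storage` client and local application-default credentials (bucket and object names are hypothetical):

```python
import json

from google.cloud import storage


def upload_results_locally(results: list, run_name: str) -> None:
    # Uses local application-default credentials, so no public HTTP
    # endpoint is involved in the write path.
    bucket = storage.Client().bucket("sql-eval-results")  # hypothetical bucket name
    blob = bucket.blob(f"{run_name}.json")
    blob.upload_from_string(json.dumps(results), content_type="application/json")
```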
```diff
@@ -0,0 +1,19 @@
+# this is a Google cloud function for receiving the data from the web app and storing it in the database
```
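The diff only surfaces the file's opening comment; as a rough illustration of the shape such an HTTP Cloud Function might take (the function name, bucket, and storage layer here are assumptions, not the PR's actual 19 lines):

```python
import json

import functions_framework
from google.cloud import storage


@functions_framework.http
def record_eval(request):
    # Illustrative only: accept a JSON payload and persist it to a bucket.
    payload = request.get_json(silent=True)
    if payload is None:
        return ("Expected a JSON body", 400)
    run_name = payload.get("run_name", "unnamed_run")
    blob = storage.Client().bucket("sql-eval-results").blob(f"{run_name}.json")
    blob.upload_from_string(json.dumps(payload), content_type="application/json")
    return ("OK", 200)
```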
nit: shall we add the command for launching this cloud function (in case we need to modify it and relaunch in the future)?
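For what it's worth, a deploy command along these lines would do it (function name, entry point, and region are guesses, not taken from the PR):

```bash
# Hypothetical deploy command; adjust the name, entry point, and region to match the repo.
gcloud functions deploy record-eval \
  --runtime=python311 \
  --trigger-http \
  --allow-unauthenticated \
  --entry-point=record_eval \
  --region=us-central1
```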
```python
if args.upload_url is not None:
    upload_results(
        results=results,
```
It seems like `results` is only created on L289 when `logprobs` is true:

```python
results = output_df.to_dict("records")
```

And I think this might fail if `logprobs` is False and `upload_url` is provided (e.g. if the user is not exporting logprobs, say when using TGI or some other inference API), since `results` hasn't yet been defined in that code path. Shall we update the code before the check on L285 to output `results` regardless of whether `logprobs` is provided?
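Something like the following reordering would cover it (only the `results` line and the upload call are from the PR; the surrounding structure is assumed):

```python
# Build `results` unconditionally so the upload path works even when
# logprobs are not being exported.
results = output_df.to_dict("records")

if args.upload_url is not None:
    upload_results(
        results=results,
        # ...remaining upload_results arguments as in the PR
    )
```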
Great point, done!
Thanks for the updates!
Thanks for the detailed comments!
Good point. This is a bit of a tradeoff. I wouldn't feel comfortable uploading secure credentials to some GPU providers (like, say, community instances). In the future, we could optionally add some kind of token authentication to the public logging URL if needed! That should solve for both security and the ability to not upload credentials!
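That token check could be as small as this (the header scheme and env-var name are illustrative, not part of the PR):

```python
import hmac
import os

EXPECTED_TOKEN = os.environ["SQL_EVAL_UPLOAD_TOKEN"]  # hypothetical secret


def is_authorized(request) -> bool:
    # Expect "Authorization: Bearer <token>" and compare in constant time.
    supplied = request.headers.get("Authorization", "").removeprefix("Bearer ")
    return hmac.compare_digest(supplied, EXPECTED_TOKEN)
```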
The main idea behind this is to help us use the eval visualizer with just a run name, instead of having the JSON files locally. One more PR in the sql-eval repo coming up!
This will just work™ with all the `run_checkpoints*.sh` scripts without any code changes. All you have to do is add the environment variable `SQL_EVAL_UPLOAD_URL` that corresponds to our record-eval function (not including the full URL here since this is a public repo), and you'll be good to go! I have already added this as an env variable on the `a10`, `gpu-inference`, and `h100` production instances.
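For example, the wiring could look like this (the URL value is a placeholder, deliberately not the real endpoint, and the script name stands in for any of the matching `run_checkpoints*.sh` scripts):

```bash
# Placeholder endpoint; the real record-eval URL is intentionally not in the public repo.
export SQL_EVAL_UPLOAD_URL="https://<region>-<project>.cloudfunctions.net/record-eval"
./run_checkpoints.sh
```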