
Add methods to log the results of the api runner to a server #177

Merged

merged 10 commits into main from rishabh/log-to-server on Jun 24, 2024

Conversation

rishsriv
Member

@rishsriv rishsriv commented Jun 22, 2024

The main idea behind this is to help us use the eval visualizer with just a run name, instead of having the JSON files locally. One more PR in the sql-eval repo coming up!

This will just work™ with all the run_checkpoints*.sh scripts without any code changes. All you have to do is add the environment variable SQL_EVAL_UPLOAD_URL, which corresponds to our record-eval function (not including the full URL here since this is a public repo), and you'll be good to go!

I have already added this as an env variable on the a10, gpu-inference, and h100 production instances.
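For reference, here is a minimal sketch of what the upload step could look like; the helper name upload_results matches the call shown later in the diff, but the payload shape, the use of requests, and the environment-variable fallback are assumptions, not the exact code in this PR.

```python
# Hedged sketch of the upload step; the payload shape, the use of `requests`,
# and the SQL_EVAL_UPLOAD_URL fallback are assumptions, not the PR's exact code.
import os

import requests


def upload_results(results: list[dict], url: str, run_name: str) -> None:
    """POST eval results as JSON to the logging endpoint (the record-eval function)."""
    resp = requests.post(url, json={"run_name": run_name, "results": results})
    resp.raise_for_status()


# The runner could fall back to the environment variable when --upload_url is unset.
upload_url = os.environ.get("SQL_EVAL_UPLOAD_URL")
```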

@@ -33,6 +33,7 @@
parser.add_argument("-v", "--verbose", action="store_true")
parser.add_argument("-l", "--logprobs", action="store_true")
parser.add_argument("--upload_url", type=str)
parser.add_argument("--run_name", type=str, required=False)
Member Author
Added a param to optionally specify a run name when running an eval.

The run name is then stored as a static JSON object, which can be used by the eval visualizer.

Collaborator

Thanks! nit: shall we update the README with the new option as well? 😄

Member Author

Done, thank you for pointing this out!

Collaborator

@wongjingping wongjingping left a comment

Thanks for cleaning up and reviving the logic to log to a GCS bucket! One small suggestion for making the various option paths more robust 👌🏼

Just checking, where do we read the SQL_EVAL_UPLOAD_URL environment variable (which you mentioned in the description)? Or are we just using the args.upload_url now?

Separately, no action needed, but I thought it would also be nice to have the option to do blob.upload_from_string(json.dumps(results)) within the sql-eval process / local environment directly, since it's more secure to authenticate locally than to have a public HTTP endpoint that can write to one's GCS bucket.
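For illustration, a minimal sketch of that local upload path, assuming the google-cloud-storage client is installed and authenticated locally; the bucket and object names are placeholders:

```python
# Minimal sketch of uploading results to GCS directly from the sql-eval process;
# bucket and object names are placeholders, not the repo's actual configuration.
import json

from google.cloud import storage  # relies on local application-default credentials


def upload_results_to_gcs(results: list[dict], run_name: str) -> None:
    client = storage.Client()
    bucket = client.bucket("my-eval-results-bucket")  # placeholder bucket name
    blob = bucket.blob(f"{run_name}.json")
    blob.upload_from_string(json.dumps(results), content_type="application/json")
```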


@@ -0,0 +1,19 @@
# this is a Google Cloud Function for receiving eval results from the runner and storing them in a GCS bucket
Collaborator

nit: shall we add the command for launching this cloud function (in case we need to modify it and relaunch in the future)?


if args.upload_url is not None:
    upload_results(
        results=results,
Collaborator

@wongjingping wongjingping Jun 24, 2024

It seems like results is only created in L289 when logprobs is true:

results = output_df.to_dict("records")

And I think this might fail if logprobs is not set but upload_url is provided (e.g. if the user is not exporting logprobs, say when using TGI or some other inference API), since results hasn't yet been defined in that code path.

Shall we update the code before the check in L285 to output results regardless of whether logprobs is provided?
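For illustration, a self-contained sketch of the ordering being suggested; the DataFrame contents and flag handling are placeholders, not the repo's actual code:

```python
# Self-contained illustration of the suggested ordering; the DataFrame contents
# and flag handling are placeholders, not the repo's actual code.
import argparse

import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument("-l", "--logprobs", action="store_true")
parser.add_argument("--upload_url", type=str)
args = parser.parse_args([])  # empty args, just for this example

output_df = pd.DataFrame([{"question": "q1", "correct": 1}])

# Build `results` unconditionally, so the upload path below never references
# an undefined variable when --logprobs is not passed.
results = output_df.to_dict("records")

if args.logprobs:
    pass  # the existing logprobs export would go here

if args.upload_url is not None:
    pass  # the existing upload_results(...) call would go here
```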

Member Author

Great point, done!

Collaborator

@wongjingping wongjingping left a comment

Thanks for the updates!

@rishsriv
Member Author

Thanks for the detailed comments!

Separately, no action needed, but I thought it would also be nice to have the option to do blob.upload_from_string(json.dumps(results)) within the sql-eval process / local environment directly, since it's more secure to authenticate locally than to have a public HTTP endpoint that can write to one's GCS bucket.

Good point. This is a bit of a tradeoff: I wouldn't feel comfortable uploading sensitive credentials to some GPU providers (like, say, community instances). In the future, we could optionally add some kind of token authentication to the public logging URL if needed. That would give us both security and the ability to avoid uploading credentials to those machines!
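For illustration, a hedged sketch of what such a token check could look like in the cloud function; the handler name, header name, and secret variable are assumptions, not existing code:

```python
# Hypothetical token check for the logging endpoint; the handler name, header
# name, and secret variable are assumptions, not code that exists in the repo.
import os

import functions_framework


@functions_framework.http
def record_eval(request):
    expected = os.environ.get("SQL_EVAL_UPLOAD_TOKEN")  # assumed secret name
    provided = request.headers.get("X-Upload-Token")  # assumed header name
    if not expected or provided != expected:
        return ("Unauthorized", 401)
    results = request.get_json(silent=True) or []
    # ... the existing logic would write `results` to the GCS bucket here ...
    return ("OK", 200)
```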

@rishsriv rishsriv merged commit 49249fd into main Jun 24, 2024
2 checks passed
@rishsriv rishsriv deleted the rishabh/log-to-server branch June 24, 2024 02:59