-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
End-to-end evaluation demo #32
Conversation
addecee
to
47baf30
Compare
9b28687
to
bfb67da
Compare
This ended up being an embarrassing quantity of iteration and written-then-deleted code for what was ultimately a pretty small PR. On the upside, I feel like I gained a better understanding of a lot of the codebase in the process. |
data/README.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, now that the static dir PR is in
data/model_group_stats.pkl
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be removed and added to static
data/token_map.pkl
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be removed and added to static
5a2c6af
to
ff64510
Compare
|
||
def load_logprob_datasets(split: str = "validation") -> dict[str, list[list[float]]]: | ||
return { | ||
model: cast(dict, load_logprob_dataset(model)[split])["logprobs"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried how much we have to use casting with Dataset objects :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the dataset library clearly wasn't designed with type-guarantees in mind. Building a library of functions that take care of the casting and other stuff we don't want to worry about in our day-to-day might be a good idea. Ticket would be something like "identify repetitive casts in the codebase and replace them with library functions".
protobuf==4.25.2 | ||
plotly==5.18.0 | ||
spacy-transformers==1.3.4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, meant to reply to this but the vscode inline thingy is kind of buggy and it ended up as a separate comment. Anyway:
spacy-transformers is a dependency I needed to get the token labeling code to work that I think just got missed by accident earlier (it's a hidden dependency that only shows up at runtime).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with this. Left a few comments, use your best judgement to decide what to address and what to ignore for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this?
spacy-transformers is a dependency I needed to get the token labeling code to work that I think just got missed by accident earlier (it's a hidden dependency that only shows up at runtime).
WIP