Re-train paper ranking model with data from curated_papers.tsv
#1248
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1248      +/-   ##
==========================================
+ Coverage   42.51%   46.62%   +4.10%
==========================================
  Files         117      118       +1
  Lines        8327     8286      -41
  Branches     1963     1362     -601
==========================================
+ Hits         3540     3863     +323
+ Misses       4582     4238     -344
+ Partials      205      185      -20
pmids_to_fetch = curated_df["pubmed"].tolist()
fetched_metadata = {}
for chunk in [pmids_to_fetch[i : i + 200] for i in range(0, len(pmids_to_fetch), 200)]:
There's an itertools (or more-itertools?) function for chunking that would make this code more idiomatic.
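For reference, the suggestion could look roughly like the sketch below. The standard library offers itertools.batched (Python 3.12+, yields tuples) and the third-party more-itertools package offers more_itertools.chunked (yields lists). The chunked helper here is a hypothetical stand-in with the same behaviour built on itertools.islice, and the PMID list is made-up sample data rather than the contents of curated_papers.tsv.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items.

    Hypothetical helper, not part of the PR; it mimics what
    more_itertools.chunked does for any iterable.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    # islice pulls up to `size` items per pass; an empty list ends the loop.
    while chunk := list(islice(iterator, size)):
        yield chunk


# Made-up stand-in for curated_df["pubmed"].tolist()
pmids_to_fetch = [str(pmid) for pmid in range(1, 451)]

chunk_sizes = [len(chunk) for chunk in chunked(pmids_to_fetch, 200)]
print(chunk_sizes)
```

With such a helper the loop header becomes `for chunk in chunked(pmids_to_fetch, 200):`, which avoids the manual index arithmetic and also works for iterables that do not support slicing.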
The latest commit introduces a unit test for the It executes the
This pull request updates the paper ranking model to include data from the curated_papers.tsv file when re-training, and to exclude curated publications from re-appearing in the predictions.tsv file.
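As a rough illustration of the exclusion step, the sketch below drops every prediction whose PMID already appears among the curated papers. It is not the PR's implementation: the in-memory TSV contents are made up, the label and score columns are invented for the example, and the assumption that predictions.tsv shares the pubmed column name with curated_papers.tsv is only a guess.

```python
import csv
import io

# Made-up stand-ins for the two files; the real columns may differ.
curated_tsv = "pubmed\tlabel\n111\t1\n222\t0\n"
predictions_tsv = "pubmed\tscore\n111\t0.91\n333\t0.75\n444\t0.40\n"


def read_tsv(text):
    """Parse tab-separated text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Collect curated PMIDs in a set for fast membership checks.
curated_pmids = {row["pubmed"] for row in read_tsv(curated_tsv)}

# Keep only predictions for papers that have not been curated yet.
filtered_predictions = [
    row for row in read_tsv(predictions_tsv) if row["pubmed"] not in curated_pmids
]
remaining_pmids = [row["pubmed"] for row in filtered_predictions]
print(remaining_pmids)
```

PMIDs are compared as strings here, so a real implementation would need to make sure both files store them in the same type before filtering.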