Get doi by scraping actual biorxiv page? #6
Ahh! This is because bioRxiv changed its URL scheme 2-3 years ago to include the DOI. Before that it was just a preprint identifier and date. I think you can convert the old scheme to the new one by using the
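One possible conversion, sketched here under the assumption that bioRxiv DOIs always take the form `10.1101/<preprint id>` (the function name and regex are illustrative, not from the bot's code):

```python
import re

def old_url_to_doi(url):
    """Convert an old-style bioRxiv URL (/content/early/YYYY/MM/DD/<id>)
    to a DOI, assuming bioRxiv DOIs are always 10.1101/<preprint id>."""
    match = re.search(r"/content/early/\d{4}/\d{2}/\d{2}/(\d+)", url)
    if match:
        return "10.1101/" + match.group(1)
    # Not an old-style URL; caller should fall back to other handling.
    return None

# old_url_to_doi("https://www.biorxiv.org/content/early/2018/11/09/459529")
# → "10.1101/459529"
```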
If you're sure about this, this would be easy to add, though I could see it causing other errors if there's ever some other URL format. How worth it do you think it is? If those are old URLs, people will probably barely ever comment on them. Also, looking at the runs, it seems to only happen about 1 out of 100 runs (note that less than half of the errors you see on that list are actually related to this bug).
I am sure about the past. I am not sure about the future URLs. I agree that since this only occurs on old preprints this will almost never occur, so I am fine with ignoring it for now 👍
- Remove verbose logging (may fix random errors like [this one](https://github.com/greenelab/preprint-bot/runs/2755283036?check_suite_focus=true) where the process exits for no apparent reason)
- Fix bug with comments that return bioRxiv links with no DOIs in them; the bot no longer throws an error, it just skips the comment (see #6 for a more robust solution)
- Clean up key logging
- Rename GitHub Actions job names
Sometimes Disqus randomly returns a bioRxiv link that doesn't have the DOI in it. For example, in this run,
https://www.biorxiv.org/content/early/2018/11/09/459529
is returned, but that redirects to the correct/expected link https://www.biorxiv.org/content/10.1101/459529v1
which contains the complete DOI. To simplify the bot code, I made it read the DOI from the URL, assuming and hoping it would always contain it. If we ever want this to be more robust, we could have the bot actually fetch the HTML contents at the link and find the DOI in the document:
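A minimal sketch of that approach, assuming bioRxiv pages expose the DOI in a `citation_doi` meta tag (the Highwire-style tags used by Google Scholar); the function names and regex are illustrative, not from the bot:

```python
import re
from urllib.request import urlopen

# Assumes the tag is written name-first, e.g.
# <meta name="citation_doi" content="10.1101/459529">
DOI_META = re.compile(r'<meta\s+name="citation_doi"\s+content="([^"]+)"', re.IGNORECASE)

def doi_from_html(html):
    """Find the DOI in a page's citation_doi meta tag, or None."""
    match = DOI_META.search(html)
    return match.group(1) if match else None

def doi_from_url(url):
    """Fetch the page (urlopen follows redirects) and extract its DOI."""
    with urlopen(url) as response:
        return doi_from_html(response.read().decode("utf-8", errors="replace"))
```

This would also handle the old-style URLs above, since the redirect target's HTML carries the DOI either way.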
In the upcoming PR, this at least won't crash the bot; it will just skip any comment with a non-DOI link.