GSoC 2021 Improve pdf support in JabRef
Student | Benedikt Tutzer |
---|---|
Organization | JabRef |
Primary repository | JabRef/jabref |
Project name | Improve pdf support in JabRef |
Project mentors | Oliver Kopp and Carl Christian Snethlage |
Project page | Google Summer of Code 2021 Project Page |
Status | Complete |
Before this project, JabRef had only limited support for interacting with PDFs: it could read XMP metadata and open linked PDFs, but nothing more. Since PDF is a common format for sharing scientific papers, this needed to be improved. Thanks to the features implemented by Benedikt Tutzer during Google Summer of Code 2021, JabRef users can now:
- write XMP metadata to PDFs from the command line
- extract PDF metadata
  - by sending the PDF to JabRef's Grobid server
  - by importing embedded BibTeX files
  - by importing a verbatim BibTeX entry given on the first page of the PDF
  - by merging the metadata obtained from the methods above, either automatically or using a merge dialog
- search the contents of all linked PDF documents
7814 CLI option to write XMP metadata to pdfs
This expands JabRef's CLI to allow users to write XMP metadata of selected entries in their database to linked PDFs.
2838 Search in PDF Files
Started in May 2017 by Linus Dietz, this PR implements a fulltext-search feature based on Apache Lucene. The PR was taken over by Benedikt Tutzer as part of this GSoC project. Tasks done by Benedikt:
- Fix and update dependencies
- Redefine what fields are indexed
- Synchronization of the index with the Bib database
  - At startup:
    - Add all PDFs to the index that were not indexed before
    - Update all index entries for PDFs that changed since they were indexed
    - Remove all index entries for PDFs that were removed
  - During use:
    - Add PDFs that are linked by the user
    - Remove PDFs that are unlinked by the user
- Interface to search in the index
- Presentation of search results
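
The startup synchronization in the task list above boils down to comparing two snapshots: what the index already knows and what is currently linked. The following is a minimal sketch of that comparison, not JabRef's actual implementation. The class `IndexSyncPlanner`, the record `SyncPlan`, and the use of modification timestamps to detect changes are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: plans which PDFs to add, update, or remove at startup. */
class IndexSyncPlanner {

    /** The three work lists produced by comparing the index with the linked files. */
    record SyncPlan(Set<String> toAdd, Set<String> toUpdate, Set<String> toRemove) { }

    /**
     * @param indexed path -> modification time recorded when the file was indexed
     * @param linked  path -> current modification time of each linked PDF
     */
    static SyncPlan plan(Map<String, Long> indexed, Map<String, Long> linked) {
        Set<String> toAdd = new TreeSet<>();
        Set<String> toUpdate = new TreeSet<>();
        Set<String> toRemove = new TreeSet<>();

        for (Map.Entry<String, Long> file : linked.entrySet()) {
            Long indexedAt = indexed.get(file.getKey());
            if (indexedAt == null) {
                toAdd.add(file.getKey());      // never indexed before
            } else if (file.getValue() > indexedAt) {
                toUpdate.add(file.getKey());   // changed since it was indexed
            }
        }
        for (String path : indexed.keySet()) {
            if (!linked.containsKey(path)) {
                toRemove.add(path);            // no longer linked
            }
        }
        return new SyncPlan(toAdd, toUpdate, toRemove);
    }
}
```

The "during use" case needs no comparison: linking a PDF maps directly to an add, and unlinking maps directly to a remove.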
7931 Fix broken GroupDialog
This PR fixes an issue introduced with the fulltext-search feature.
7980 Fulltext Index: Only index local pdf files
This PR makes sure only local PDF files are added to the index.
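
One way to picture such a filter is shown below. It is a rough sketch only: the class `LocalPdfFilter`, the helper `isOnlineLink`, and the rule "a link with an http, https, or ftp scheme is remote, everything else is local" are assumptions for illustration and may differ from what the PR actually checks.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Hypothetical sketch: keeps only links that look like local PDF files. */
class LocalPdfFilter {

    /** Assumption: a link with a remote scheme is online; anything else is a local path. */
    static boolean isOnlineLink(String link) {
        String lower = link.trim().toLowerCase(Locale.ROOT);
        return lower.startsWith("http://")
                || lower.startsWith("https://")
                || lower.startsWith("ftp://");
    }

    /** Returns the links that are local and end in ".pdf" (case-insensitive). */
    static List<String> localPdfs(List<String> links) {
        return links.stream()
                    .filter(link -> !isOnlineLink(link))
                    .filter(link -> link.toLowerCase(Locale.ROOT).endsWith(".pdf"))
                    .collect(Collectors.toList());
    }
}
```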
7981 Improved progress indication for fulltext-index operations
This PR improves the presentation of the indexing progress.
7989 Improve presentation of fulltext search results
This PR improves how results are presented to the users.
7947 Implement more pdf importers
This PR adds multiple importers that can be used to determine metadata from PDF files:
- PdfVerbatimBibTextImporter looks for a verbatim BibTeX entry on the first page of the PDF
- PdfEmbeddedBibFileImporter looks for an embedded BibTeX file in the PDF
- PdfGrobidMetadataImporter sends the PDF to the Web API at http://grobid.jabref.org to determine the metadata using the deep-learning library Grobid
- PdfMergeMetadataImporter merges the metadata found by the other importers. If identifiers were found (DOI or ISBN), metadata is fetched for the identifier as well.
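
To make the merging step concrete, here is a minimal sketch of one possible strategy: candidates are ordered by importer priority, and for each field the first non-blank value wins. This is an assumption for illustration, not a description of how PdfMergeMetadataImporter resolves conflicts. The class `MetadataMerger` and the representation of an entry as a plain field-to-value map are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: field-wise merge where earlier (higher-priority) sources win. */
class MetadataMerger {

    /**
     * @param candidates field -> value maps, ordered from highest to lowest priority
     * @return one merged map; blank values never override or fill a field
     */
    static Map<String, String> merge(List<Map<String, String>> candidates) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> candidate : candidates) {
            for (Map.Entry<String, String> field : candidate.entrySet()) {
                String value = field.getValue();
                if (value != null && !value.isBlank()) {
                    // Keep the value from the first source that provided this field.
                    merged.putIfAbsent(field.getKey(), value);
                }
            }
        }
        return merged;
    }
}
```

With this strategy, the order of the candidate list is the only tuning knob, so changing importer priorities changes which value survives a conflict.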
7963 Remove DOI lookup from PdfContentImporter
As the PdfMergeMetadataImporter now looks up DOI and ISBN anyway, there is no need to do that in the individual importers any more. This PR removes the DOI lookup from the previously existing PdfContentImporter.
7929 Implement an interface to import PDF metadata from multiple sources (XMP, Grobid, ...)
This implements an n-way merge dialog to allow the user to extract metadata from multiple sources and then select what metadata to store in the database.
8001 Reordered Pdf-Importer priorities
This PR reorders the priorities of the PDF importers.
8002 Preferences for Grobid
This PR makes all interaction with the Grobid server opt-in. This ensures that JabRef does not send PDFs to the web service without the user's clear intent to do so.
8003 Refactor processCitation in GrobidService to match processPdf
Follow-up that improves the unit tests.
7797 Added auto-key-generation task to task-progress
7804 JournalAbbreviation search feature
7907 Removed references to apache commons logging
8006 [PoC] Introduced read/write interface for preferences
This is a proof of concept to change how passing preferences objects is handled in JabRef.
The Grobid API returns TEI for most requests. We added BibTeX support for the request we use for the metadata extraction.
800 Accept application/x-bibtex for processHeaderDocument
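
From the client side, asking for BibTeX instead of TEI comes down to setting the `Accept` header on the request. The sketch below builds such a request with Java's built-in HTTP client types but does not send it. It assumes the endpoint path `/api/processHeaderDocument` and the multipart form field `input`, as described in the Grobid REST documentation. The class `GrobidHeaderRequest`, the boundary string, and the base URL passed in are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: builds a header-extraction request that asks for BibTeX. */
class GrobidHeaderRequest {

    // Arbitrary boundary chosen for this sketch; it must not occur in the PDF bytes.
    static final String BOUNDARY = "----jabref-sketch-boundary";

    /** Builds (but does not send) a multipart POST with "Accept: application/x-bibtex". */
    static HttpRequest build(String baseUrl, String fileName, byte[] pdfBytes) {
        String head = "--" + BOUNDARY + "\r\n"
                + "Content-Disposition: form-data; name=\"input\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/pdf\r\n\r\n";
        String tail = "\r\n--" + BOUNDARY + "--\r\n";

        ByteArrayOutputStream body = new ByteArrayOutputStream();
        body.writeBytes(head.getBytes(StandardCharsets.UTF_8));
        body.writeBytes(pdfBytes);
        body.writeBytes(tail.getBytes(StandardCharsets.UTF_8));

        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/processHeaderDocument"))
                .header("Accept", "application/x-bibtex")   // request BibTeX rather than TEI
                .header("Content-Type", "multipart/form-data; boundary=" + BOUNDARY)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body.toByteArray()))
                .build();
    }
}
```

A request built this way could then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, with the response body expected to contain a BibTeX entry when the server supports the header.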
6469 Fix bracket collisions
6443 Implement task progress indicator (and dialog) in the toolbar
6437 Fixed entry duplication on file download
6436 Cleanup dead code
6381 Added a download checkbox to the import dialog
Total commits | 14 |
---|---|
Lines added | 3273 |
Lines removed | 505 |
(For commits made by Benedikt Tutzer during GSoC 2021 to JabRef's main branch only. Commits were squashed before counting.)
Project blogpost: July 04, 2021 – JabRef GSoC’21 Projects