The primary objective of this internship is to develop a parser for scientific reference strings (citations) that splits each string into its metadata fields, such as authors, title, journal and year. This parser will replace the discontinued Freecite tool and will be used by the Flanders Marine Institute (VLIZ) to manage and search references in the WoRMS (World Register of Marine Species) database. The specific goals are:
- Create a Local Tool: Develop a parser that can operate locally on the VLIZ servers.
- Match or Exceed Freecite Performance: Ensure the new tool performs as well as or better than Freecite.
- Handle Difficult References: Develop capabilities to handle challenging or incorrectly split references.
- Use Machine Learning (to be decided): Integrate machine learning to improve parsing accuracy over time.
- Docker Containerization: Ensure the tool can be containerized with Docker for easy deployment.
Repository contents:
- Scripts: Python scripts that convert GROBID XML output into CSV for analysis in Excel.
- Server scripts: Bash scripts that automate API requests on a VLIZ server.
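The XML-to-CSV conversion described for the Scripts folder can be sketched in Python as follows. This is an illustrative sketch, not the repository's actual script: it assumes GROBID returns parsed references as TEI `<biblStruct>` elements, and the column set (`FIELDS`), the helper names and the sample record are hypothetical.

```python
"""Sketch: flatten GROBID TEI <biblStruct> records into CSV rows (assumed layout)."""
import csv
import io
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}
# Hypothetical column selection; adjust to the fields needed in Excel.
FIELDS = ["authors", "title", "journal", "volume", "pages", "year"]


def _text(elem):
    """Return the stripped text of an element, or '' if it is missing."""
    return "".join(elem.itertext()).strip() if elem is not None else ""


def biblstruct_to_row(bibl):
    """Extract a flat dict of metadata from one TEI <biblStruct> element."""
    authors = []
    for pers in bibl.findall(".//tei:author/tei:persName", TEI):
        forenames = " ".join(_text(f) for f in pers.findall("tei:forename", TEI))
        surname = _text(pers.find("tei:surname", TEI))
        # "Surname, Forenames"; drop the separator if one part is empty
        authors.append(f"{surname}, {forenames}".strip(", "))

    # Prefer the article-level title; fall back to the monograph title
    title = _text(bibl.find("tei:analytic/tei:title", TEI)) or _text(
        bibl.find("tei:monogr/tei:title", TEI)
    )
    journal = _text(bibl.find("tei:monogr/tei:title[@level='j']", TEI))
    imprint = "tei:monogr/tei:imprint/"
    volume = _text(bibl.find(imprint + "tei:biblScope[@unit='volume']", TEI))

    # Page ranges may be stored as from/to attributes or as plain text
    page = bibl.find(imprint + "tei:biblScope[@unit='page']", TEI)
    if page is not None and page.get("from"):
        pages = f"{page.get('from')}-{page.get('to', '')}".rstrip("-")
    else:
        pages = _text(page)

    date = bibl.find(imprint + "tei:date", TEI)
    year = (date.get("when") or "")[:4] if date is not None else ""

    return {"authors": "; ".join(authors), "title": title, "journal": journal,
            "volume": volume, "pages": pages, "year": year}


def tei_to_csv(xml_text):
    """Convert TEI XML (one or more <biblStruct>) into a CSV string."""
    root = ET.fromstring(xml_text)
    # Accept either a single <biblStruct> root or a document containing several
    if root.tag == f"{{{TEI['tei']}}}biblStruct":
        records = [root]
    else:
        records = root.findall(".//tei:biblStruct", TEI)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for bibl in records:
        writer.writerow(biblstruct_to_row(bibl))
    return out.getvalue()


def write_csv(xml_text, path):
    """Write the CSV to disk; utf-8-sig adds a BOM so Excel detects UTF-8."""
    with open(path, "w", newline="", encoding="utf-8-sig") as fh:
        fh.write(tei_to_csv(xml_text))


# Made-up record for illustration only
SAMPLE = """<biblStruct xmlns="http://www.tei-c.org/ns/1.0">
  <analytic>
    <title level="a" type="main">Example article title</title>
    <author><persName><forename type="first">J</forename><surname>Smith</surname></persName></author>
  </analytic>
  <monogr>
    <title level="j">Example Journal</title>
    <imprint>
      <biblScope unit="volume">12</biblScope>
      <biblScope unit="page" from="1" to="10"/>
      <date type="published" when="2020"/>
    </imprint>
  </monogr>
</biblStruct>"""

if __name__ == "__main__":
    print(tei_to_csv(SAMPLE))
```

Writing the file with the `utf-8-sig` encoding is a deliberate choice: without the byte-order mark, Excel may misinterpret accented author names when it opens a UTF-8 CSV.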