MatExtract is a guide to structured (materials) data extraction using LLMs. Read it on matextract.pub.
For more details, see our review article.
To install the package and its dependencies in editable mode, run the following from the repository root:
pip install -e .
For development, we recommend creating and activating a virtual environment before installing:
python -m venv .venv
source .venv/bin/activate # On Windows use: .venv\Scripts\activate
pip install -e .
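Before running the editable install, it can help to confirm that the virtual environment is actually active. The following is a minimal sketch, assuming an environment created with the standard `venv` module as above; the helper name `in_virtualenv` is hypothetical and not part of MatExtract.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv-style environment.

    Hypothetical helper for illustration; not part of the MatExtract package.
    """
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

If this reports `False` after activation, the shell is probably still resolving `python` to the system interpreter, and `pip install -e .` would install outside the environment.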
If you find our work useful, please cite our review article:
@article{Schilling_Wilhelmi_2025,
title={From text to insight: large language models for chemical data extraction},
ISSN={1460-4744},
url={http://dx.doi.org/10.1039/D4CS00913D},
DOI={10.1039/d4cs00913d},
journal={Chemical Society Reviews},
publisher={Royal Society of Chemistry (RSC)},
author={Schilling-Wilhelmi, Mara and Ríos-García, Martiño and Shabih, Sherjeel and Gil, María Victoria and Miret, Santiago and Koch, Christoph T. and Márquez, José A. and Jablonka, Kevin Maik},
year={2025}
}