lamalab-org/matextract-book

From Text to Insight: Large Language Models for Materials Science Data Extraction

MatExtract

MatExtract is a guide to structured (materials) data extraction using LLMs. Read it on matextract.pub.

For more details, see our review article.

Installation

To install the package and its dependencies in editable mode, run the following from the root of your local clone of the repository:

pip install -e .

For development, we recommend using a virtual environment:

python -m venv .venv
source .venv/bin/activate  # On Windows use: .venv\Scripts\activate
pip install -e .
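
Before running the install inside a virtual environment, it can help to confirm that the environment is actually active. The following is a minimal sketch, assuming a POSIX shell and that `python` resolves to the interpreter you intend to use. The helper name `in_venv` is hypothetical and not part of this repository.

```shell
# Hypothetical helper: succeeds only when Python runs inside a virtual environment.
# In a venv, sys.prefix points at the environment while sys.base_prefix
# points at the base installation, so the two differ.
in_venv() {
  python -c 'import sys; raise SystemExit(0 if sys.prefix != sys.base_prefix else 1)' 2>/dev/null
}

if in_venv; then
  echo "virtual environment active"
else
  echo "no virtual environment active"
fi
```

If the check reports that no environment is active, re-run the activation command above before installing.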

Citation

If you find our work useful, please cite our review article:

@article{Schilling_Wilhelmi_2025,
  title={From text to insight: large language models for chemical data extraction},
  ISSN={1460-4744},
  url={http://dx.doi.org/10.1039/D4CS00913D},
  DOI={10.1039/d4cs00913d},
  journal={Chemical Society Reviews},
  publisher={Royal Society of Chemistry (RSC)},
  author={Schilling-Wilhelmi, Mara and Ríos-García, Martiño and Shabih, Sherjeel and Gil, María Victoria and Miret, Santiago and Koch, Christoph T. and Márquez, José A. and Jablonka, Kevin Maik},
  year={2025}
}