Stars
Analyses of software mentions and dependencies
Source of the article "Mining experimental data from Materials Science literature with Large Language Models: an evaluation study"
🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
Indexes metadata from Crossref into Elasticsearch. Primarily to be used with Biblio-Glutton
PhD Dissertation "Automated Extraction and Curation of Materials Information from Scientific Literature"
Open-source IDE for exploring and testing APIs (lightweight alternative to Postman/Insomnia)
Convert PDF to markdown + JSON quickly with high accuracy
Viewer for the structure extracted by Grobid on PDF documents
Library supporting NLP and CV research on scientific papers
A collection of datasets for language model pretraining, including scripts for downloading, preprocessing, and sampling.
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Implementation of Nougat Neural Optical Understanding for Academic Documents
Python tools for processing the stackexchange data dumps into a text dataset for Language Models
One downloader for many scientific data and code repositories! DOI 👐 Data
The official code for "PMC-LLaMA: Towards Building Open-source Language Models for Medicine"
Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings (EMNLP 2022 paper)
This repository collects 100 papers related to negative sampling methods.
A JavaScript library for text annotation
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports comp…
Get answers to research questions from 200M+ papers.
Easily compute clip embeddings and build a clip retrieval system with them
Unstructured data extract platform based on LlamaIndex, Pgvector, React and Django.