Patrice Lopez kermitt2

440 followers · 16 following

Stars

howisonlab / software-mentions-dataset-analysis

Analyses of software mentions and dependencies

Go 5 Updated Jan 8, 2025

howisonlab / grobid_text_reference_conversion

Python 4 Updated Oct 16, 2024

andythean / tympi_news

Tympi News web app

JavaScript 2 Updated Nov 6, 2024

lfoppiano / mining-llm-evaluation-paper

Source of the article "Mining experimental data from Materials Science literature with Large Language Models: an evaluation study"

TeX 5 Updated Aug 15, 2024

advimman / lama

🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022

Jupyter Notebook 8,300 881 Updated Jul 26, 2024

karatekaneen / crossrefindexer

Indexes metadata from Crossref into Elasticsearch. Primarily to be used with Biblio-Glutton

Go 3 Updated May 30, 2023

lfoppiano / PhD-Thesis

PhD Dissertation "Automated Extraction and Curation of Materials Information from Scientific Literature"

TeX 8 Updated Feb 20, 2024

usebruno / bruno

Opensource IDE For Exploring and Testing APIs (lightweight alternative to postman/insomnia)

JavaScript 29,581 1,387 Updated Jan 11, 2025

VikParuchuri / marker

Convert PDF to markdown + JSON quickly with high accuracy

Python 19,170 1,137 Updated Jan 10, 2025

lfoppiano / structure-vision

Viewer for the structure extracted by Grobid on PDF documents

Python 43 8 Updated Jan 11, 2025

lfoppiano / streamlit-pdf-viewer

Streamlit PDF viewer

Python 117 9 Updated Jan 11, 2025

allenai / papermage

library supporting NLP and CV research on scientific papers

Python 724 57 Updated Nov 8, 2024

lfoppiano / document-qa

Scientific Document Insight Q/A

Python 26 5 Updated Nov 21, 2024

malteos / llm-datasets

A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling.

Python 56 5 Updated Jul 29, 2024

clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

Python 5,955 480 Updated Jul 11, 2024

facebookresearch / nougat

Implementation of Nougat Neural Optical Understanding for Academic Documents

Python 9,140 585 Updated Apr 16, 2024

EleutherAI / stackexchange-dataset

Python tools for processing the stackexchange data dumps into a text dataset for Language Models

Python 80 15 Updated Dec 6, 2023

howisonlab / screenit-softcite

Python 2 2 Updated Aug 4, 2023

asreview / asreview

Active learning for systematic reviews

Python 662 123 Updated Jan 11, 2025

J535D165 / datahugger

One downloader for many scientific data and code repositories! DOI 👐 Data

Python 66 10 Updated Jan 6, 2025

chaoyi-wu / PMC-LLaMA

The official codes for "PMC-LLaMA: Towards Building Open-source Language Models for Medicine"

Python 621 54 Updated Jul 8, 2024

mgieseki / dvisvgm

A fast DVI, EPS, and PDF to SVG converter

C++ 316 34 Updated Jan 11, 2025

karthik / csvconf2023

Slides and resources from my CSV Conf 2023 keynote

17 Updated May 1, 2023

malteos / scincl

Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings (EMNLP 2022 paper)

Python 66 1 Updated Nov 11, 2022

RUCAIBox / Negative-Sampling-Paper

This repository collects 100 papers related to negative sampling methods.

190 19 Updated Jun 25, 2023

recogito / recogito-js

A JavaScript library for text annotation

JavaScript 373 42 Updated Mar 28, 2024

catboost / catboost

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports comp…

Python 8,188 1,199 Updated Jan 12, 2025

shauryr / S2QA

Get answers to research questions from 200M+ papers. Link to demo -

Jupyter Notebook 204 22 Updated Dec 28, 2023

rom1504 / clip-retrieval

Easily compute clip embeddings and build a clip retrieval system with them

Jupyter Notebook 2,469 216 Updated Apr 15, 2024

JSv4 / OpenContracts

Unstructured data extract platform based on LlamaIndex, Pgvector, React and Django.

Python 751 67 Updated Jan 8, 2025