Merge pull request #133 from ufal/release-1.0.0
Release 1.0.1
kasnerz authored Nov 13, 2024
2 parents c522955 + 19c1bd9 commit 3fe1a36
Showing 146 changed files with 6,702 additions and 5,244 deletions.
26 changes: 26 additions & 0 deletions .github/workflows/black_check.yml
@@ -0,0 +1,26 @@
name: Black

on:
  push:
    branches: [ main, release-1.0.0]
  pull_request:
    branches: [ main, release-1.0.0]

jobs:

  check-black:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: 3.11
      - name: Install Black 24.10.0 - check setup.py if version matches
        run: |
          python -m pip install --upgrade pip
          pip install black==24.10.0
      - name: Run Black
        run: |
          black --check .
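The step name above asks the reader to check that the Black version pinned in the workflow matches the one in `setup.py`. A minimal sketch of how that comparison could be automated follows. The helper `pinned_version` and the `setup_snippet` string are hypothetical illustrations, not part of the repository; the real `setup.py` content is not shown in this diff.

```python
import re
from typing import Optional


def pinned_version(text: str, package: str) -> Optional[str]:
    """Return the version pinned as `package==X.Y.Z` in text, or None if absent."""
    match = re.search(rf"\b{re.escape(package)}==([0-9][0-9A-Za-z.]*)", text)
    return match.group(1) if match else None


# The install command from the workflow step above.
workflow_step = "pip install black==24.10.0"
# Hypothetical excerpt; the actual setup.py may declare the pin differently.
setup_snippet = 'extras_require={"dev": ["black==24.10.0"]}'

workflow_version = pinned_version(workflow_step, "black")
setup_version = pinned_version(setup_snippet, "black")
pins_match = workflow_version is not None and workflow_version == setup_version
print(workflow_version, setup_version, pins_match)
```

In a real check, the two strings would be read from the workflow file and from `setup.py` instead of being hard-coded.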
26 changes: 26 additions & 0 deletions .github/workflows/py311_tests.yml
@@ -0,0 +1,26 @@
name: Pytest

on:
  push:
    branches: [ main, release-1.0.0]
  pull_request:
    branches: [ main, release-1.0.0]

jobs:

  build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: 3.11
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e .[test]
      - name: Run tests
        run: |
          pytest
11 changes: 5 additions & 6 deletions .gitignore
@@ -1,6 +1,5 @@
__pycache__/
*.py[cod]
data
*.egg-info
export/datasets
tmp
@@ -12,15 +11,15 @@ env/
.venv
.vscode
build
dist
venv/
wiki
factgenie/annotations
factgenie/generations
factgenie/outputs
factgenie/campaigns
factgenie/templates/campaigns/*
!factgenie/templates/campaigns/*.*
factgenie/data/datasets.yml
factgenie/data/inputs
factgenie/data/outputs
factgenie/config/config.yml
factgenie/config/datasets.yml
factgenie/config/llm-eval
factgenie/config/llm-gen
factgenie/config/crowdsourcing
2 changes: 1 addition & 1 deletion CONTRIBUTING
@@ -2,4 +2,4 @@

Thank you for considering contributing to **factgenie**!

Please, see the 🌱 [Contributing](../../wiki/07-Contributing) page on our wiki for details.
Please, see the 🌱 [Contributing](../../wiki/Contributing) page on our wiki for details.
3 changes: 2 additions & 1 deletion Dockerfile
@@ -4,8 +4,9 @@ RUN mkdir -p /usr/src/factgenie
WORKDIR /usr/src/factgenie

COPY . /usr/src/factgenie
RUN cp /usr/src/factgenie/factgenie/config/config_TEMPLATE.yml /usr/src/factgenie/factgenie/config/config.yml

RUN pip install -e .[deploy]

EXPOSE 80
ENTRYPOINT ["gunicorn", "--env", "SCRIPT_NAME=", "-b", ":80", "-w", "1", "--threads", "2", "factgenie.cli:create_app()"]
ENTRYPOINT ["gunicorn", "--env", "SCRIPT_NAME=", "-b", ":80", "-w", "1", "--threads", "8", "factgenie.bin.run:create_app()"]
70 changes: 39 additions & 31 deletions README.md
@@ -3,12 +3,12 @@

<h1> factgenie </h1>

![GitHub](https://img.shields.io/github/license/kasnerz/factgenie)
![GitHub issues](https://img.shields.io/github/issues/kasnerz/factgenie)
[![arXiv](https://img.shields.io/badge/arXiv-2407.17863-0175ac.svg)](https://arxiv.org/abs/2407.17863)
![Github downloads](https://img.shields.io/github/downloads/kasnerz/factgenie/total)
![PyPI](https://img.shields.io/pypi/v/factgenie)
[![slack](https://img.shields.io/badge/slack-factgenie-0476ad.svg?logo=slack)](https://join.slack.com/t/factgenie/shared_invite/zt-2u180yy81-3zCR7mt8EOy55cxA5zhKyQ)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
![Github stars](https://img.shields.io/github/stars/kasnerz/factgenie?style=social)
<!-- ![PyPI](https://img.shields.io/pypi/v/factgenie) -->
<!-- [![arXiv](https://img.shields.io/badge/arXiv-2407.17863-b31b1b.svg)](https://arxiv.org/abs/2407.17863) -->
<!-- ![PyPI downloads](https://img.shields.io/pypi/dm/factgenie) -->

Annotate LLM outputs with a lightweight, self-hosted web application 🌈
@@ -17,14 +17,8 @@ Annotate LLM outputs with a lightweight, self-hosted web application 🌈

</div>

## 📢 News
- **25/10/2024** — We are preparing the first official release. Stay tuned!
- **08/10/2024** — We added [step-by-step walkthrougs](../../wiki/00-Tutorials) on using factgenie for generating and annotating outputs for a dataset of basketball reports 🏀
- **07/10/2024** — We removed the example datasets from the repository. Instead, you can find them in the _External Resources_ section in the _Manage data_ interface.
- **24/09/2024** — We introduced a brand new factgenie logo!
- **19/09/2024** — On the Analytics page, you can now see detailed statistics about annotations and compute inter-annotator agreement 📈
- **16/09/2024** — You can now collect extra inputs from the annotators for each example using sliders and select boxes.
- **16/09/2024** — We added an option to generate outputs for the inputs with LLMs directly within factgenie! 🦾
## 📢 Changelog
- **[1.0.0] - 2024-11-13**: The first official release 🎉

## 👉️ How can factgenie help you?
Outputs from large language models (LLMs) may contain errors: semantic, factual, and lexical.
@@ -42,39 +36,53 @@ Factgenie can provide you:
*What factgenie does **not help with** is collecting the data (we assume that you already have it), starting the crowdsourcing campaign (for that, you need to use a service such as [Prolific.com](https://prolific.com)), or running the LLM evaluators (for that, you need a local framework such as [Ollama](https://ollama.com) or a proprietary API).*

## 🏃 Quickstart
Make sure you have Python 3 installed (the project is tested with Python 3.10).
Make sure you have Python >=3.9 installed.

After cloning the repository, the following commands install the package and start the web server:
If you want to quickly try out factgenie, you can install the package from PyPI:
```bash
pip install factgenie
```

However, the recommended approach for using factgenie is using an editable package:
```bash
git clone https://github.com/ufal/factgenie.git
cd factgenie
pip install -e .[dev,deploy]
factgenie run --host=127.0.0.1 --port 5000
```
This approach will allow you to manually modify configuration files, write your own data classes and access generated files.

After installing factgenie, use the following command to run the server on your local computer:
```bash
factgenie run --host=127.0.0.1 --port 8890
```
More information on how to set up factgenie is on [Github wiki](../../wiki/Setup).
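After starting the server with the command above, it can be handy to confirm that it is reachable. The following is a minimal sketch using only the Python standard library. The helpers `base_url` and `server_is_up` are hypothetical names introduced for illustration, and the host and port simply mirror the `factgenie run` example.

```python
import urllib.error
import urllib.request


def base_url(host: str, port: int) -> str:
    """Build the root URL of a locally running server."""
    return f"http://{host}:{port}/"


def server_is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if any HTTP response comes back from url."""
    # Ignore proxy settings: the target is a server on the local machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except OSError:
        # Connection refused, timeout, and similar failures (URLError is an OSError).
        return False


url = base_url("127.0.0.1", 8890)
print(url, "reachable:", server_is_up(url, timeout=0.5))
```

The check treats any HTTP response as a sign that the server is running, since a login redirect or an error page still means the process is listening.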

## 💡 Usage guide


See the following **wiki pages** that will guide you through various use cases of factgenie:

| Topic | Description |
| ---------------------------------------------------------------------- | -------------------------------------------------- |
| 🔧 [Setup](../../wiki/01-Setup) | How to install factgenie. |
| 🗂️ [Data Management](../../wiki/02-Data-Management) | How to manage datasets and model outputs. |
| 🤖 [LLM Annotations](../../wiki/03-LLM-Annotations) | How to annotate outputs using LLMs. |
| 👥 [Crowdsourcing Annotations](../../wiki/04-Crowdsourcing-Annotations) | How to annotate outputs using human crowdworkers. |
| ✍️ [Generating Outputs](../../wiki/05-Generating-Outputs) | How to generate outputs using LLMs. |
| 📊 [Analyzing Annotations](../../wiki/06-Analyzing-Annotations) | How to obtain statistics on collected annotations. |
| 🌱 [Contributing](../../wiki/07-Contributing) | How to contribute to factgenie. |
| Topic | Description |
| ------------------------------------------------------------------- | -------------------------------------------------- |
| 🔧 [Setup](../../wiki/Setup) | How to install factgenie. |
| 🗂️ [Data Management](../../wiki/Data-Management) | How to manage datasets and model outputs. |
| 🤖 [LLM Annotations](../../wiki/LLM-Annotations) | How to annotate outputs using LLMs. |
| 👥 [Crowdsourcing Annotations](../../wiki/Crowdsourcing-Annotations) | How to annotate outputs using human crowdworkers. |
| ✍️ [Generating Outputs](../../wiki/Generating-Outputs) | How to generate outputs using LLMs. |
| 📊 [Analyzing Annotations](../../wiki/Analyzing-Annotations) | How to obtain statistics on collected annotations. |
| 💻 [Command Line Interface](../../wiki/CLI) | How to use factgenie command line interface. |
| 🌱 [Contributing](../../wiki/Contributing) | How to contribute to factgenie. |

## 🔥 Tutorials
We also provide step-by-step walkthroughs showing how to employ factgenie on [the dataset from the Shared Task in Evaluating Semantic Accuracy](https://github.com/ehudreiter/accuracySharedTask):

| Tutorial | Description |
| ------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------ |
| [🏀 #1: Importing a custom dataset](../../wiki/00-Tutorials#-tutorial-1-importing-a-custom-dataset) | Loading the basketball statistics and model-generated basketball reports into the web interface. |
| [💬 #2: Generating outputs](../../wiki/00-Tutorials#-tutorial-2-generating-outputs) | Using Llama 3.1 with Ollama for generating basketball reports. |
| [📊 #3: Customizing data visualization](../../wiki/00-Tutorials#-tutorial-3-customizing-data-visualization) | Manually creating a custom dataset class for better data visualization. |
| [🤖 #4: Annotating outputs with an LLM](../../wiki/00-Tutorials#-tutorial-4-annotating-outputs-with-an-llm) | Using GPT-4o for annotating errors in the basketball reports. |
| [👨‍💼 #5: Annotating outputs with human annotators](../../wiki/00-Tutorials#-tutorial-5-annotating-outputs-with-human-annotators) | Using human annotators for annotating errors in the basketball reports. |
| Tutorial | Description |
| --------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------ |
| [🏀 #1: Importing a custom dataset](../../wiki/Tutorials#-tutorial-1-importing-a-custom-dataset) | Loading the basketball statistics and model-generated basketball reports into the web interface. |
| [💬 #2: Generating outputs](../../wiki/Tutorials#-tutorial-2-generating-outputs) | Using Llama 3.1 with Ollama for generating basketball reports. |
| [📊 #3: Customizing data visualization](../../wiki/Tutorials#-tutorial-3-customizing-data-visualization) | Manually creating a custom dataset class for better data visualization. |
| [🤖 #4: Annotating outputs with an LLM](../../wiki/Tutorials#-tutorial-4-annotating-outputs-with-an-llm) | Using GPT-4o for annotating errors in the basketball reports. |
| [👨‍💼 #5: Annotating outputs with human annotators](../../wiki/Tutorials#-tutorial-5-annotating-outputs-with-human-annotators) | Using human annotators for annotating errors in the basketball reports. |


## 💬 Cite us
25 changes: 23 additions & 2 deletions docker-compose.yml
@@ -1,8 +1,29 @@
# You need to run `curl http://localhost:11434/api/pull -d '{"name": "llama3.1:8b"}'` once
# after running `docker-compose up -d` from the repo root directory
# in order to download the llama3.1:8b model, which is the default model
# we use in the example configurations for factgenie.
services:
factgenie:
container_name: factgenie
image: factgenie
restart: on-failure
ports:
- 8080:80
build: ./factgenie
- 8890:80
build: ./

# Factgenie connects to LLM inference servers through either the OpenAI client or Ollama.
# This service demonstrates running Ollama on CPU.
# For GPU, run Ollama without Docker
# or look at https://hub.docker.com/r/ollama/ollama and follow the GPU instructions.
ollama:
container_name: ollama
image: ollama/ollama
restart: on-failure
# We need to expose the port to your machine because you need to pull models for ollama
# before factgenie queries the ollama server to run inference for the model.
# E.g. curl http://localhost:11434/api/pull -d '{"name": "llama3.1:8b"}' to download the factgenie default LLM.
ports:
- 11434:11434
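The comments above describe pulling the default model with `curl` through the exposed port. As a rough Python equivalent of that `curl` command, the sketch below only builds the request, so it can be inspected without a running Ollama container. The function name `build_pull_request` is a hypothetical helper; the URL and JSON payload are taken from the comment above.

```python
import json
import urllib.request


def build_pull_request(
    model: str, base_url: str = "http://localhost:11434"
) -> urllib.request.Request:
    """Build a POST request asking the Ollama server to download a model."""
    payload = json.dumps({"name": model}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/pull",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_pull_request("llama3.1:8b")
print(request.get_method(), request.full_url, request.data.decode("utf-8"))
```

Once the `ollama` service is up, passing `request` to `urllib.request.urlopen` should trigger the download. The response format is not covered here and should be checked against the Ollama API documentation.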



22 changes: 7 additions & 15 deletions factgenie/__init__.py
@@ -2,29 +2,21 @@

PACKAGE_DIR = Path(__file__).parent
ROOT_DIR = PACKAGE_DIR.parent

TEMPLATES_DIR = PACKAGE_DIR / "templates"
STATIC_DIR = PACKAGE_DIR / "static"
ANNOTATIONS_DIR = PACKAGE_DIR / "annotations"
GENERATIONS_DIR = PACKAGE_DIR / "generations"
CAMPAIGN_DIR = PACKAGE_DIR / "campaigns"
LLM_EVAL_CONFIG_DIR = PACKAGE_DIR / "config" / "llm-eval"
LLM_GEN_CONFIG_DIR = PACKAGE_DIR / "config" / "llm-gen"
CROWDSOURCING_CONFIG_DIR = PACKAGE_DIR / "config" / "crowdsourcing"

DATA_DIR = PACKAGE_DIR / "data"
OUTPUT_DIR = PACKAGE_DIR / "outputs"
INPUT_DIR = PACKAGE_DIR / "data" / "inputs"
OUTPUT_DIR = PACKAGE_DIR / "data" / "outputs"

DATASET_CONFIG_PATH = PACKAGE_DIR / "data" / "datasets.yml"
RESOURCES_CONFIG_PATH = PACKAGE_DIR / "config" / "resources.yml"
DATASET_CONFIG_PATH = PACKAGE_DIR / "config" / "datasets.yml"

OLD_DATASET_CONFIG_PATH = PACKAGE_DIR / "loaders" / "datasets.yml"
OLD_MAIN_CONFIG_PATH = PACKAGE_DIR / "config.yml"

MAIN_CONFIG_PATH = PACKAGE_DIR / "config" / "config.yml"
if not MAIN_CONFIG_PATH.exists() and not OLD_MAIN_CONFIG_PATH.exists():
    raise ValueError(
        f"Invalid path to config.yml {MAIN_CONFIG_PATH=}. "
        "Please copy config_TEMPLATE.yml to config.yml "
        "and change the password, update the host prefix, etc."
    )

MAIN_CONFIG_TEMPLATE_PATH = PACKAGE_DIR / "config" / "config_TEMPLATE.yml"
DEFAULT_PROMPTS_CONFIG_PATH = PACKAGE_DIR / "config" / "default_prompts.yml"
PREVIEW_STUDY_ID = "factgenie_preview"
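This hunk lists both an import-time check that raised when `config.yml` was missing and a `MAIN_CONFIG_TEMPLATE_PATH` constant, and the Dockerfile copies the template into place. One way such a template could be used at startup is sketched below. This is an assumption about intent, not the project's actual code: `ensure_config` is a hypothetical helper, and the demo runs in a temporary directory with made-up file contents.

```python
import shutil
import tempfile
from pathlib import Path


def ensure_config(config_path: Path, template_path: Path) -> bool:
    """Copy the template to config_path if no config exists. Return True if copied."""
    if config_path.exists():
        return False
    config_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(template_path, config_path)
    return True


# Demo in a throwaway directory that mimics the config/ layout.
with tempfile.TemporaryDirectory() as tmp:
    config_dir = Path(tmp) / "config"
    config_dir.mkdir()
    template = config_dir / "config_TEMPLATE.yml"
    template.write_text("host_prefix: ''\n")  # made-up template content
    config = config_dir / "config.yml"

    first_call = ensure_config(config, template)   # copies the template
    second_call = ensure_config(config, template)  # leaves the existing file alone
    copied_text = config.read_text()

print(first_call, second_call, repr(copied_text))
```

Copying only when the file is missing keeps any local edits to `config.yml` intact across restarts.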