Merge pull request #133 from ufal/release-1.0.0

Release 1.0.1
ufal · Nov 13, 2024 · 3fe1a36
2 parents c522955 + 19c1bd9

Showing 146 changed files with 6,702 additions and 5,244 deletions.
diff --git a/.github/workflows/black_check.yml b/.github/workflows/black_check.yml
@@ -0,0 +1,26 @@
+name: Black
+
+on:
+  push:
+    branches: [ main, release-1.0.0]
+  pull_request:
+    branches: [ main, release-1.0.0]
+
+jobs:
+
+  check-black:
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v2
+    - name: Set up Python
+      uses: actions/setup-python@v2
+      with:
+        python-version: 3.11
+    - name: Install Black 24.10.0 - check setup.py if version matches
+      run: |
+        python -m pip install --upgrade pip
+        pip install black==24.10.0
+    - name: Run Black
+      run: |
+        black --check .
diff --git a/.github/workflows/py311_tests.yml b/.github/workflows/py311_tests.yml
@@ -0,0 +1,26 @@
+name: Pytest
+
+on:
+  push:
+    branches: [ main, release-1.0.0]
+  pull_request:
+    branches: [ main, release-1.0.0]
+
+jobs:
+
+  build:
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v2
+    - name: Set up Python
+      uses: actions/setup-python@v2
+      with:
+        python-version: 3.11
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install -e .[test]
+    - name: Run tests
+      run: |
+        pytest
diff --git a/.gitignore b/.gitignore
@@ -1,6 +1,5 @@
 __pycache__/
 *.py[cod]
-data
 *.egg-info
 export/datasets
 tmp
@@ -12,15 +11,15 @@ env/
 .venv
 .vscode
 build
+dist
 venv/
 wiki
-factgenie/annotations
-factgenie/generations
-factgenie/outputs
+factgenie/campaigns
 factgenie/templates/campaigns/*
-!factgenie/templates/campaigns/*.*
+factgenie/data/datasets.yml
+factgenie/data/inputs
+factgenie/data/outputs
 factgenie/config/config.yml
-factgenie/config/datasets.yml
 factgenie/config/llm-eval
 factgenie/config/llm-gen
 factgenie/config/crowdsourcing
diff --git a/CONTRIBUTING b/CONTRIBUTING
@@ -2,4 +2,4 @@
 
 Thank you for considering contributing to **factgenie**! 
 
-Please, see the 🌱 [Contributing](../../wiki/07-Contributing) page on our wiki for details.
+Please, see the 🌱 [Contributing](../../wiki/Contributing) page on our wiki for details.
diff --git a/Dockerfile b/Dockerfile
@@ -4,8 +4,9 @@ RUN mkdir -p /usr/src/factgenie
 WORKDIR /usr/src/factgenie
 
 COPY . /usr/src/factgenie
+RUN cp /usr/src/factgenie/factgenie/config/config_TEMPLATE.yml /usr/src/factgenie/factgenie/config/config.yml
 
 RUN pip install -e .[deploy]
 
 EXPOSE 80
-ENTRYPOINT ["gunicorn", "--env", "SCRIPT_NAME=", "-b", ":80", "-w", "1", "--threads", "2", "factgenie.cli:create_app()"]
+ENTRYPOINT ["gunicorn", "--env", "SCRIPT_NAME=", "-b", ":80", "-w", "1", "--threads", "8", "factgenie.bin.run:create_app()"]
diff --git a/README.md b/README.md
@@ -3,12 +3,12 @@
 
 <h1> factgenie </h1>
 
-![GitHub](https://img.shields.io/github/license/kasnerz/factgenie)
-![GitHub issues](https://img.shields.io/github/issues/kasnerz/factgenie)
-[![arXiv](https://img.shields.io/badge/arXiv-2407.17863-0175ac.svg)](https://arxiv.org/abs/2407.17863)
+![Github downloads](https://img.shields.io/github/downloads/kasnerz/factgenie/total)
+![PyPI](https://img.shields.io/pypi/v/factgenie)
+[![slack](https://img.shields.io/badge/slack-factgenie-0476ad.svg?logo=slack)](https://join.slack.com/t/factgenie/shared_invite/zt-2u180yy81-3zCR7mt8EOy55cxA5zhKyQ)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 ![Github stars](https://img.shields.io/github/stars/kasnerz/factgenie?style=social)
-<!-- ![PyPI](https://img.shields.io/pypi/v/factgenie) -->
+<!-- [![arXiv](https://img.shields.io/badge/arXiv-2407.17863-b31b1b.svg)](https://arxiv.org/abs/2407.17863) -->
 <!-- ![PyPI downloads](https://img.shields.io/pypi/dm/factgenie) -->
 
 Annotate LLM outputs with a lightweight, self-hosted web application 🌈
@@ -17,14 +17,8 @@ Annotate LLM outputs with a lightweight, self-hosted web application 🌈
 
 </div>
 
-## 📢  News
-- **25/10/2024** — We are preparing the first official release. Stay tuned!
-- **08/10/2024** — We added  [step-by-step walkthrougs](../../wiki/00-Tutorials) on using factgenie for generating and annotating outputs for a dataset of basketball reports 🏀
-- **07/10/2024** — We removed the example datasets from the repository. Instead, you can find them in the _External Resources_ section in the _Manage data_ interface.
-- **24/09/2024** — We introduced a brand new factgenie logo!
-- **19/09/2024** — On the Analytics page, you can now see detailed statistics about annotations and compute inter-annotator agreement 📈
-- **16/09/2024** — You can now collect extra inputs from the annotators for each example using sliders and select boxes. 
-- **16/09/2024** — We added an option to generate outputs for the inputs with LLMs directly within factgenie! 🦾
+## 📢  Changelog
+- **[1.0.0] - 2024-11-13**: The first official release 🎉
 
 ## 👉️ How can factgenie help you?
 Outputs from large language models (LLMs) may contain errors: semantic, factual, and lexical. 
@@ -42,39 +36,53 @@ Factgenie can provide you:
 *What does factgenie **not help with** is collecting the data (we assume that you already have these), starting the crowdsourcing campaign (for that, you need to use a service such as [Prolific.com](https://prolific.com)) or running the LLM evaluators (for that, you need a local framework such as [Ollama](https://ollama.com) or a proprietary API).*
 
 ## 🏃 Quickstart
-Make sure you have Python 3 installed (the project is tested with Python 3.10).
+Make sure you have Python >=3.9 installed.
 
-After cloning the repository, the following commands install the package and start the web server:
+If you want to quickly try out factgenie, you can install the package from PyPI:
+```bash
+pip install factgenie
 ```
+
+However, the recommended approach for using factgenie is using an editable package:
+```bash
+git clone https://github.com/ufal/factgenie.git
+cd factgenie
 pip install -e .[dev,deploy]
-factgenie run --host=127.0.0.1 --port 5000
 ```
+This approach will allow you to manually modify configuration files, write your own data classes and access generated files.
+
+After installing factgenie, use the following command to run the server on your local computer:
+```bash
+factgenie run --host=127.0.0.1 --port 8890
+```
+More information on how to set up factgenie is on [Github wiki](../../wiki/Setup).
 
 ## 💡 Usage guide
 
 
 See the following **wiki pages** that that will guide you through various use-cases of factgenie:
 
-| Topic                                                                  | Description                                        |
-| ---------------------------------------------------------------------- | -------------------------------------------------- |
-| 🔧 [Setup](../../wiki/01-Setup)                                         | How to install factgenie.                          |
-| 🗂️ [Data Management](../../wiki/02-Data-Management)                     | How to manage datasets and model outputs.          |
-| 🤖 [LLM Annotations](../../wiki/03-LLM-Annotations)                     | How to annotate outputs using LLMs.                |
-| 👥 [Crowdsourcing Annotations](../../wiki/04-Crowdsourcing-Annotations) | How to annotate outputs using human crowdworkers.  |
-| ✍️  [Generating Outputs](../../wiki/05-Generating-Outputs)              | How to generate outputs using LLMs.                |
-| 📊 [Analyzing Annotations](../../wiki/06-Analyzing-Annotations)         | How to obtain statistics on collected annotations. |
-| 🌱 [Contributing](../../wiki/07-Contributing)                           | How to contribute to factgenie.                    |
+| Topic                                                               | Description                                        |
+| ------------------------------------------------------------------- | -------------------------------------------------- |
+| 🔧 [Setup](../../wiki/Setup)                                         | How to install factgenie.                          |
+| 🗂️ [Data Management](../../wiki/Data-Management)                     | How to manage datasets and model outputs.          |
+| 🤖 [LLM Annotations](../../wiki/LLM-Annotations)                     | How to annotate outputs using LLMs.                |
+| 👥 [Crowdsourcing Annotations](../../wiki/Crowdsourcing-Annotations) | How to annotate outputs using human crowdworkers.  |
+| ✍️  [Generating Outputs](../../wiki/Generating-Outputs)              | How to generate outputs using LLMs.                |
+| 📊 [Analyzing Annotations](../../wiki/Analyzing-Annotations)         | How to obtain statistics on collected annotations. |
+| 💻 [Command Line Interface](../../wiki/CLI)                          | How to use factgenie command line interface.       |
+| 🌱 [Contributing](../../wiki/Contributing)                           | How to contribute to factgenie.                    |
 
 ## 🔥 Tutorials
 We also provide step-by-step walkthroughs showing how to employ factgenie on the [the dataset from the Shared Task in Evaluating Semantic Accuracy](https://github.com/ehudreiter/accuracySharedTask):
 
-| Tutorial                                                                                                                       | Description                                                                                      |
-| ------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------ |
-| [🏀 #1: Importing a custom dataset](../../wiki/00-Tutorials#-tutorial-1-importing-a-custom-dataset)                             | Loading the basketball statistics and model-generated basketball reports into the web interface. |
-| [💬 #2: Generating outputs](../../wiki/00-Tutorials#-tutorial-2-generating-outputs)                                             | Using Llama 3.1 with Ollama for generating basketball reports.                                   |
-| [📊 #3: Customizing data visualization](../../wiki/00-Tutorials#-tutorial-3-customizing-data-visualization)                     | Manually creating a custom dataset class for better data visualization.                          |
-| [🤖 #4: Annotating outputs with an LLM](../../wiki/00-Tutorials#-tutorial-4-annotating-outputs-with-an-llm)                     | Using GPT-4o for annotating errors in the basketball reports.                                    |
-| [👨‍💼 #5: Annotating outputs with human annotators](../../wiki/00-Tutorials#-tutorial-5-annotating-outputs-with-human-annotators) | Using human annotators for annotating errors in the basketball reports.                          |
+| Tutorial                                                                                                                    | Description                                                                                      |
+| --------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------ |
+| [🏀 #1: Importing a custom dataset](../../wiki/Tutorials#-tutorial-1-importing-a-custom-dataset)                             | Loading the basketball statistics and model-generated basketball reports into the web interface. |
+| [💬 #2: Generating outputs](../../wiki/Tutorials#-tutorial-2-generating-outputs)                                             | Using Llama 3.1 with Ollama for generating basketball reports.                                   |
+| [📊 #3: Customizing data visualization](../../wiki/Tutorials#-tutorial-3-customizing-data-visualization)                     | Manually creating a custom dataset class for better data visualization.                          |
+| [🤖 #4: Annotating outputs with an LLM](../../wiki/Tutorials#-tutorial-4-annotating-outputs-with-an-llm)                     | Using GPT-4o for annotating errors in the basketball reports.                                    |
+| [👨‍💼 #5: Annotating outputs with human annotators](../../wiki/Tutorials#-tutorial-5-annotating-outputs-with-human-annotators) | Using human annotators for annotating errors in the basketball reports.                          |
 
 
 ## 💬 Cite us

diff --git a/docker-compose.yml b/docker-compose.yml
@@ -1,8 +1,29 @@
+# YOU NEED run once `curl http://localhost:11434/api/pull -d '{"name": "llama3.1:8b"}'`
+# after running `docker-compose up -d` from the repo root directory
+# in order to download the llama3.1:8b model which is the default model 
+# we use in the example configurations for factgenie
 services:
   factgenie:
     container_name: factgenie
     image: factgenie
     restart: on-failure
     ports:
-      - 8080:80
-    build: ./factgenie
+      - 8890:80
+    build: ./
+
+  # Factgenie connects to LLM inference servers either OpenAI client or Ollama
+  # Demonstrates running ollama on CPU 
+  #   For GPU run ollama without Docker
+  # or look at https://hub.docker.com/r/ollama/ollama and follow the GPU instructions
+  ollama:
+    container_name: ollama
+    image: ollama/ollama
+    restart: on-failure
+    # We need to expose the port to your machine because you need to pull models for ollama
+    # before factgenie queries the ollama server to run inference for the model.
+    # E.g. curl http://localhost:11434/api/pull -d '{"name": "llama3.1:8b"}' to download the factgenie default LLM.
+    ports:
+      - 11434:11434
+
+
+
diff --git a/factgenie/__init__.py b/factgenie/__init__.py
@@ -2,29 +2,21 @@
 
 PACKAGE_DIR = Path(__file__).parent
 ROOT_DIR = PACKAGE_DIR.parent
+
 TEMPLATES_DIR = PACKAGE_DIR / "templates"
 STATIC_DIR = PACKAGE_DIR / "static"
-ANNOTATIONS_DIR = PACKAGE_DIR / "annotations"
-GENERATIONS_DIR = PACKAGE_DIR / "generations"
+CAMPAIGN_DIR = PACKAGE_DIR / "campaigns"
 LLM_EVAL_CONFIG_DIR = PACKAGE_DIR / "config" / "llm-eval"
 LLM_GEN_CONFIG_DIR = PACKAGE_DIR / "config" / "llm-gen"
 CROWDSOURCING_CONFIG_DIR = PACKAGE_DIR / "config" / "crowdsourcing"
 
-DATA_DIR = PACKAGE_DIR / "data"
-OUTPUT_DIR = PACKAGE_DIR / "outputs"
+INPUT_DIR = PACKAGE_DIR / "data" / "inputs"
+OUTPUT_DIR = PACKAGE_DIR / "data" / "outputs"
 
+DATASET_CONFIG_PATH = PACKAGE_DIR / "data" / "datasets.yml"
 RESOURCES_CONFIG_PATH = PACKAGE_DIR / "config" / "resources.yml"
-DATASET_CONFIG_PATH = PACKAGE_DIR / "config" / "datasets.yml"
-
-OLD_DATASET_CONFIG_PATH = PACKAGE_DIR / "loaders" / "datasets.yml"
-OLD_MAIN_CONFIG_PATH = PACKAGE_DIR / "config.yml"
 
 MAIN_CONFIG_PATH = PACKAGE_DIR / "config" / "config.yml"
-if not MAIN_CONFIG_PATH.exists() and not OLD_MAIN_CONFIG_PATH.exists():
-    raise ValueError(
-        f"Invalid path to config.yml {MAIN_CONFIG_PATH=}. "
-        "Please copy config_TEMPLATE.yml to config.yml "
-        "and change the password, update the host prefix, etc."
-    )
-
+MAIN_CONFIG_TEMPLATE_PATH = PACKAGE_DIR / "config" / "config_TEMPLATE.yml"
+DEFAULT_PROMPTS_CONFIG_PATH = PACKAGE_DIR / "config" / "default_prompts.yml"
 PREVIEW_STUDY_ID = "factgenie_preview"
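
Note on the `factgenie/__init__.py` hunk above: the import-time `ValueError` for a missing `config.yml` is removed and a `MAIN_CONFIG_TEMPLATE_PATH` constant is added, while the Dockerfile hunk now copies `config_TEMPLATE.yml` to `config.yml` during the build. The diff shown here does not include the code that consumes the new constant, so the following is only a minimal sketch of one way such a fallback could work. The helper name `ensure_main_config` and the placeholder template content are hypothetical and are not taken from the factgenie code base.

```python
import shutil
import tempfile
from pathlib import Path


def ensure_main_config(config_path: Path, template_path: Path) -> bool:
    """Hypothetical helper: create config.yml from the template if it is missing.

    Returns True when a copy was made, False when config.yml already existed.
    """
    if config_path.exists():
        return False
    config_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(template_path, config_path)
    return True


# Demo in a throwaway directory that mimics the config layout from the diff.
with tempfile.TemporaryDirectory() as tmp:
    package_dir = Path(tmp) / "factgenie"
    main_config_path = package_dir / "config" / "config.yml"
    main_config_template_path = package_dir / "config" / "config_TEMPLATE.yml"

    # Placeholder content only; the real template is not part of this diff.
    main_config_template_path.parent.mkdir(parents=True)
    main_config_template_path.write_text("placeholder: true\n")

    first_call = ensure_main_config(main_config_path, main_config_template_path)
    second_call = ensure_main_config(main_config_path, main_config_template_path)
    copied_text = main_config_path.read_text()
```

The helper never overwrites an existing `config.yml`, so a user-edited configuration would survive repeated calls. This mirrors the intent of the Dockerfile's `cp` step without asserting that the package itself does the same.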