diff --git a/.gitignore b/.gitignore
index 1b9a7dc..3db429b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -166,4 +166,5 @@ notebooks/
.vscode/
src/scraping/data/*
-#src/genai/parenting_chatbot/prodigy_eval/data/*.jsonl
+src/evals/parenting_chatbot/*
+src/genai/parenting_chatbot/prodigy_eval/_scrap/*
diff --git a/src/genai/parenting_chatbot/prodigy_eval/README.md b/src/genai/parenting_chatbot/prodigy_eval/README.md
index ef6e36b..3f0eeea 100644
--- a/src/genai/parenting_chatbot/prodigy_eval/README.md
+++ b/src/genai/parenting_chatbot/prodigy_eval/README.md
@@ -1,4 +1,25 @@
-# Creating a Prodigy app to evaluate parenting chatbot answers
+# Using the Prodigy platform to evaluate parenting chatbot answers
+
+- [Overview](#overview)
+- [Setup instructions](#setup-instructions)
+  - [Evaluation data](#evaluation-data)
+- [Rating answers in Prodigy](#rating-answers-in-prodigy)
+  - [Prodigy recipe](#prodigy-recipe)
+  - [Running prodigy locally](#running-prodigy-locally)
+  - [Running prodigy on an ec2 instance](#running-prodigy-on-an-ec2-instance)
+- [Troubleshooting](#troubleshooting)
+
+## Overview
+
+This is an experiment to test [Prodigy](https://prodi.gy/) as a platform for evaluating the quality of answers from our early-years parenting chatbot prototype. The prototype uses GPT-4 (OpenAI API) as its large language model, and leverages retrieval-augmented generation (RAG) to ground the answers in a trusted knowledge base: in this case, text on the NHS Start for Life website.
+
+In this experiment, we are making pairwise comparisons between answers from the chatbot and answers from humans. We are asking annotators to select the answer that is most appropriate for the question. Note that annotators do not know which answer is from the chatbot and which is from a human.
+
+We are also adding a third option, which is an answer from the GPT-4 model without any external knowledge base. This is to see whether using RAG to ground the answers in a trusted knowledge base improves the perceived quality of the answers.
+
+
+
+
## Setup instructions
@@ -6,56 +27,185 @@ To install prodigy, you will need to run
```
python -m pip install --upgrade prodigy -f https://XXXX-XXXX-XXXX-XXXX@download.prodi.gy
```
-Replace 'XXXX' with your key (see the docs [here](https://prodi.gy/docs/install)).
+Replace 'XXXX' with your key (see the official prodigy docs [here](https://prodi.gy/docs/install)).
+
+### Evaluation data
+
+**Human answers**
+
+We used the "top ten most asked parenting questions" from an article on [The Kid Collective](https://www.thekidcollective.co.uk/blog/most-asked-parenting-questions.html) website. According to the article, these questions have been sourced from Google Trends and pertain specifically to the UK.
+
+The article also provides short, 1-2 paragraph answers to each of the questions. We used these answers as the reference "human" answers in our evaluation.
+
+While this source is not necessarily an authoritative one (The Kid Collective appears to be primarily a baby retail website), we deemed it a good starting point for our evaluation, as the questions are presumably among the most popular ones that caregivers have, and the answers are written in a conversational style that is easy to understand. Moreover, this was more of a proof of concept, to test out the Prodigy platform and the general evaluation approach.
+
+Other freely-available options that we considered were answers to parenting questions by the US-based [Public Broadcasting Service](https://www.pbs.org/wholechild/parents/faqs.html) (also not necessarily an authoritative source) and [Action for Children](https://parents.actionforchildren.org.uk/mental-health-wellbeing/?age=toddler) (they are, however, more specialised in mental health and the answers appeared slightly more difficult to extract from the website). In the future, we could expand this data with more questions, with answers provided by caregiving experts.
+
+**LLM answers**
+
+We then simply put the same 10 questions to our chatbot and to GPT-4 (without RAG), and recorded the answers. The script for generating the GPT-4 answers can be found in `generate_gpt4_answers.py`, whereas the chatbot answers were generated by manually using the chatbot and saving the answers in a log file.
+
+All the answers from the different sources are stored in the `data` folder:
+- `answers_human.jsonl` contains the human answers
+- `answers_rag.jsonl` contains the answers from the chatbot
+- `answers_gpt4.jsonl` contains the answers from GPT-4
+
+## Rating answers in Prodigy
+
+We combined the answers from the different sources and created 30 pairs of answers, using the script `create_eval_data.py`. The final set of answer pairs can be found in `data/answers.jsonl`, in a format that's suitable for Prodigy.
+
+The human annotators are presented with a question and two answers at a time, and asked to choose the better answer from the pair. We can then analyse the pairwise comparisons and, for example, determine which source was preferred most often.
+
+![prodigy-screenshot](figures/prodigy_screenshot.png)
+
+### Prodigy recipe
-Create a directory `src/genai/parenting_chatbot/prodigy_eval/data/`.
+Prodigy uses a "recipe" to define the annotation task. The recipe for this evaluation can be found in `best_answer_recipe.py`.
-### Create the data
+We begin by importing various utilities; note that we'll use the `random` module to shuffle the data.
+```python
+import random
+from typing import Dict, Generator, List
+import prodigy
+from prodigy.components.loaders import JSONL
+```
+
+Next, we define `GLOBAL_CSS` which is a string of CSS styles. This lets us tailor the appearance of the Prodigy interface. Through this, we can modify font sizes, layout of answer option boxes, container width, and more.
-To make some artificial data, run:
+```python
+GLOBAL_CSS = (
+    ".prodigy-content{font-size: 15px}"
+    " .prodigy-option{width: 49%}"
+    " .prodigy-option{align-items:flex-start}"
+    " .prodigy-option{margin-right: 3px}"
+    " .prodigy-container{max-width: 1200px}"
+)
```
-python src/genai/parenting_chatbot/prodigy_eval/create_data.py
+
+The `best_answer` function defines a custom Prodigy recipe. With the `@prodigy.recipe` decorator, we define the expected arguments and set the stage for creating a Prodigy task. This function's role is to process the input data and return it in a format suitable for Prodigy to render.
+
+```python
+@prodigy.recipe(
+    "best_answer",
+    dataset=("The dataset to save to", "positional", None, str),
+    file_path=("Path to the questions and answers file", "positional", None, str),
+)
+def best_answer(dataset: str, file_path: str) -> Dict:
+    """
+    Choose the best answer out of the given options.
+
+    Arguments:
+        dataset: The dataset to save to.
+        file_path: Path to the questions and answers file.
+
+    Returns:
+        A dictionary containing the recipe configuration.
+
+    """
```
-This will save data to `src/genai/parenting_chatbot/prodigy_eval/data/training_data.jsonl` and this will be the training data that your annotators annotate via the Prodigy app.
-This also saves to `prototypes/parenting-chatbot/prodigy_evaluation/` in the S3 bucket.
+We use Prodigy's JSONL loader to fetch and store our answers dataset into the stream variable, making it ready for subsequent processing. Once data is loaded, the next step is to make sure it's presented in a random sequence and in a format Prodigy understands. By shuffling our data, we prevent potential biases in the order of presentation. After shuffling, the `format_stream` function transforms the questions and answers into a digestible structure, ready for the Prodigy user interface.
+
+```python
+    # Load the data
+    stream = list(JSONL(file_path))
-### Fetch training data
+    def get_shuffled_stream(stream: List) -> Generator:
+        random.shuffle(stream)
+        for eg in stream:
+            yield eg
-Prodigy cannot stream data directly from s3, so before spinning up the Prodigy app, we need to download data from s3 and store it locally. To do this, run the script `src/genai/parenting_chatbot/prodigy_eval/fetch_from_s3.py` from the command line, using the arguments `--s3_path` to specify where the file is stored in the s3 bucket, and `--out_path` to specify where you want the file to be stored locally:
+    # Process the stream to format for Prodigy
+    def format_stream(stream: Generator) -> Generator:
+        for item in stream:
+            question = item["question"]
+            options = [{"id": key, "html": value} for key, value in item["answers"].items()]
+            yield {"html": question, "options": options}
+    stream = format_stream(get_shuffled_stream(stream))
```
-python src/genai/parenting_chatbot/prodigy_eval/fetch_from_s3.py --s3_path='prototypes/parenting-chatbot/prodigy_evaluation/prodigy_training_data.jsonl' --out_path='src/genai/parenting_chatbot/prodigy_eval/data/prodigy_training_data.jsonl'
+
+Finally, we lay down the rules for Prodigy by setting up the recipe configuration. This dictionary defines how our task should appear and function within the Prodigy interface. From specifying the 'choice' interface, naming the dataset, feeding the processed data stream, to determining interaction buttons and setting up other interface-related parameters – this configuration is our guide to how the task will run.
+
+```python
+    return {
+        # Use the choice interface
+        "view_id": "choice",
+        # Name of the dataset
+        "dataset": dataset,
+        # The data stream
+        "stream": stream,
+        "config": {
+            # Only allow one choice
+            "choice_style": "single",
+            "task_description": "Choose the best answer",
+            "choice_auto_accept": False,
+            # Define which buttons to show
+            "buttons": ["accept", "ignore"],
+            # Add custom css
+            "global_css": GLOBAL_CSS,
+            # If feed_overlap is True, the same example can be sent out to multiple users at the same time
+            "feed_overlap": True,
+            # Port to run the server on
+            "port": 8080,
+            # Important to set host to 0.0.0.0 when running on ec2
+            "host": "0.0.0.0",
+            # Setting instant_submit as True means that the user doesn't have to click the "save" button
+            "instant_submit": True,
+        },
+    }
```
-### Rate answers in Prodigy
+### Running prodigy locally
-Once you have the data stored locally, run the following to spin up a Prodigy instance:
+To test the platform locally, run the following command to spin up a Prodigy instance:
```
-prodigy best_answer answer_data src/genai/parenting_chatbot/prodigy_eval/data/prodigy_training_data.jsonl -F src/genai/parenting_chatbot/prodigy_eval/best_answer_recipe.py
+python -m prodigy best_answer answer_data src/genai/parenting_chatbot/prodigy_eval/data/answers.jsonl -F src/genai/parenting_chatbot/prodigy_eval/best_answer_recipe.py
```
-Once you run this line, a URL should be given to you in the command line. Visit this URL to access the Prodigy app.
+Once you run this line, a URL (http://0.0.0.0:8080) should be printed in the command line. Visit this URL to access the Prodigy app. Note that you will need to specify the user session ID by adding `?session=your_session_id` to the URL. This is so that Prodigy can keep track of your annotations.
Select an answer to each question, click the green tick button at the bottom, and when you're done, click the "save" icon at the top left.
-Make the output from your annotations available by running `prodigy db-out answer_data > output.jsonl`.
+Fetch the output from your annotations by running `prodigy db-out answer_data > output.jsonl`.
+
+### Running prodigy on an ec2 instance
+
+To make the platform available to multiple annotators, we can run Prodigy on an ec2 instance. We can then share the URL with the annotators, and they can access the platform from their own computers.
+
+We first spun up an ec2 instance (t2.micro, which is quite cheap at around [$0.0116 per hour](https://aws.amazon.com/ec2/instance-types/t2/)) and used ssh to connect to it.
+
+Once connected to the instance, we cloned this repo (to fetch the answers data) and installed prodigy on the instance using the instructions above.
+
+To keep the Prodigy instance running in the background, even after we have disconnected from the instance, we used `screen`, first by simply running `screen` in the terminal:
+```shell
+screen
+```
+
+We then ran the same command to spin up a Prodigy instance:
+
+```shell
+python3 -m prodigy best_answer answer_data src/genai/parenting_chatbot/prodigy_eval/data/answers.jsonl -F src/genai/parenting_chatbot/prodigy_eval/best_answer_recipe.py
+```
+
+To detach from the screen, press `ctrl+a` and then `d`. The prodigy instance will continue running in the background. To reattach to the screen, run `screen -r`. You can also run `screen -ls` to see the list of screens that are running. To stop a screen and terminate the prodigy instance, run `screen -X -S prodigy_screen_session_id "quit"`.
+
+Finally, you will need to open port 8080 to allow other users to join from their computers. This can be done by going to the ec2 instance settings on the AWS website, and adding a new inbound rule to the security group. The rule should be of type "Custom TCP", port range 8080. If you wish to control who can connect to your Prodigy instance (recommended), you can also specify the allowed IP addresses.
-If you would like a summary of the selection you made for each question, run `python analyse_annotations.py`. This will print each question + the selection (human/gpt4/RAG) to the command line.
## Troubleshooting
-**I just started a new session in Prodigy, but it says there are already some examples.**
+**I'm doing an annotation session in Prodigy, but it says there are already some examples.**
-Prodigy has its own mysterious SQLite database. Whenever you want to see the data, you run
+Prodigy has its own SQLite database. Whenever you want to fetch the annotation data, you run
```
prodigy db-out name_of_your_data > output.jsonl
```
-If you were running a session earlier and saved annotations to `name_of_your_data`, then you want to start a new session but use the name `name_of_your_data` again, Prodigy will tell you that you already have X annotations because you saved X annotations to this dataset earlier.
+If you ran a session earlier and saved annotations to `name_of_your_data`, and you then start a new session using the same session id, Prodigy will tell you that you already have X annotations because you saved X annotations to this dataset earlier.
-The solution is to run:
+To clean the database and delete all annotations, you can run:
```
prodigy drop name_of_your_data
```
-OR just pick a new dataset name, and then Prodigy will create a new dataset.
+Alternatively, you can just pick a new session id.
diff --git a/src/genai/parenting_chatbot/prodigy_eval/_to_delete/create_data.py b/src/genai/parenting_chatbot/prodigy_eval/_to_delete/create_data.py
deleted file mode 100644
index de200a0..0000000
--- a/src/genai/parenting_chatbot/prodigy_eval/_to_delete/create_data.py
+++ /dev/null
@@ -1,140 +0,0 @@
-import argparse
-import json
-import os
-
-from dotenv import load_dotenv
-
-from genai.streamlit_pages import parenting_page
-
-
-parser = argparse.ArgumentParser(description="Name and location of output file")
-parser.add_argument(
- "--out_dir",
- type=str,
- dest="out_dir",
- help="What value house are you trying to buy?",
- default="src/genai/parenting_chatbot/prodigy_eval/data/",
-)
-
-args, unknown = parser.parse_known_args()
-
-FILEPATH = os.path.join(args.out_dir, "training_data.jsonl")
-
-load_dotenv()
-
-aws_key = os.environ["AWS_ACCESS_KEY_ID"]
-aws_secret = os.environ["AWS_SECRET_ACCESS_KEY"]
-s3_path = os.environ["S3_BUCKET"]
-
-# Sample data
-data = [
- {
- "question": "How to baby-proof?",
- "answers": {
- "human": """
- Baby-proofing your house is recommended for when your baby begins to explore, often when they’re beginning to crawl. Some baby-proofing precautions you can take include:
-
- - Secure Furniture: Anchor heavy furniture, like bookshelves and dressers, to the wall to prevent them from tipping over. Babies often try to pull themselves up using these items.
- - Outlet Covers: Use plastic covers for electrical outlets to prevent curious fingers from poking into them.
- - Cabinet and Drawer Latches: Install latches on cabinets and drawers, especially those containing cleaning products, chemicals, or small objects that can be swallowed.
-
- """,
- "gpt4": "If a newborn has a fever, it's crucial to keep them hydrated and monitor their temperature. Seek medical attention promptly.",
- "rag": "Fever in a newborn can be concerning. It's recommended to consult with a healthcare professional right away.",
- },
- },
- {
- "question": "What should you do if a newborn has a fever?",
- "answers": {
- "human": "You should contact your pediatrician immediately as fever in a newborn can be a sign of a serious infection.",
- "gpt4": "If a newborn has a fever, it's crucial to keep them hydrated and monitor their temperature. Seek medical attention promptly.",
- "rag": "Fever in a newborn can be concerning. It's recommended to consult with a healthcare professional right away.",
- },
- },
- {
- "question": "How do you change a baby's nappy?",
- "answers": {
- "human": "The best place to change a nappy is on a changing mat or towel on the floor, particularly if you have more than one baby.",
- "gpt4": "Babies need frequent nappy changes.",
- "rag": "If your baby's nappy is dirty, use the nappy to clean off most of the poo from their bottom.",
- },
- },
- {
- "question": "How often should a newborn be bathed?",
- "answers": {
- "human": "Newborns don't need daily baths. Instead, you can bathe them every 2 to 3 days or as needed.",
- "gpt4": "It's usually recommended to bathe a newborn 2 to 3 times a week, but you can clean their face, neck, and hands daily.",
- "rag": "You don't need to give your newborn a bath every day. Two to three times a week is sufficient.",
- },
- },
- {
- "question": "What's the best position for a baby to sleep?",
- "answers": {
- "human": "The safest position for a baby is on their back. This reduces the risk of Sudden Infant Death Syndrome (SIDS).",
- "gpt4": "Babies should always be placed on their backs to sleep to minimize the risk of SIDS.",
- "rag": "Put your baby to sleep on their back to ensure safety and decrease the risk of SIDS.",
- },
- },
- {
- "question": "How can I soothe a crying newborn?",
- "answers": {
- "human": "Try swaddling, rocking, or offering a pacifier. Sometimes, the baby might just be hungry or need a diaper change.",
- "gpt4": "Newborns can be comforted by swaddling, gentle rocking, white noise, or even just holding them close.",
- "rag": "Consider swaddling, gentle motions, singing, or ensuring their basic needs are met to soothe a crying newborn.",
- },
- },
- {
- "question": "Is it normal for newborns to hiccup often?",
- "answers": {
- "human": "Yes, it's common for newborns to hiccup. It's usually not a cause for concern unless accompanied by other worrying symptoms.",
- "gpt4": "Hiccups are a normal part of a newborn's development and usually aren't a sign of any underlying issue.",
- "rag": "Newborns often hiccup, and it's generally considered normal. If you're concerned, consult your pediatrician.",
- },
- },
- {
- "question": "When should I introduce solid foods to my baby?",
- "answers": {
- "human": "Most experts recommend introducing solid foods around 6 months of age, but always consult with your pediatrician.",
- "gpt4": "Solid foods are typically introduced to babies at about 6 months old, but it's essential to look for signs of readiness and speak with a healthcare professional.",
- "rag": "Introduce solid foods to your baby around the 6-month mark, but ensure they show signs of readiness and get advice from a pediatrician.",
- },
- },
- {
- "question": "How can I help my newborn establish a sleep routine?",
- "answers": {
- "human": "Stick to a consistent bedtime routine, keep the room dark and quiet, and try to feed and change the baby before putting them down.",
- "gpt4": "Consistency is key. Develop a bedtime routine, reduce stimulation before sleep, and ensure they're well-fed and dry.",
- "rag": "Establishing a regular bedtime routine, reducing pre-sleep stimulation, and creating a calm environment can help.",
- },
- },
- {
- "question": "Why does my newborn sneeze so much?",
- "answers": {
- "human": "Newborns sneeze to clear their nasal and respiratory passages. It's normal and not necessarily a sign of illness.",
- "gpt4": "Sneezing is common in newborns as they clear out lint, dust, and mucus from their noses.",
- "rag": "Newborns often sneeze to clear tiny particles from their noses. It's a natural reflex and not a cause for concern.",
- },
- },
- {
- "question": "Can I take my newborn outside?",
- "answers": {
- "human": "Yes, but make sure to dress them appropriately for the weather and avoid direct sunlight. Avoid crowded places in the early weeks.",
- "gpt4": "Taking your newborn outside is fine, but ensure they're protected from the elements and avoid high traffic areas.",
- "rag": "It's good for newborns to get fresh air. Just ensure they're dressed properly and shielded from direct sun or extreme temperatures.",
- },
- },
-]
-
-# Saving data to a .jsonl file
-with open(FILEPATH, "w") as file:
- for entry in data:
- file.write(json.dumps(entry) + "\n")
-
-parenting_page.write_to_s3(
- aws_key,
- aws_secret,
- f"{s3_path}/prototypes/parenting-chatbot/prodigy_evaluation",
- "prodigy_training_data",
- data,
- how="w",
-)
diff --git a/src/genai/parenting_chatbot/prodigy_eval/_to_delete/generate_gpt4_answers.ipynb b/src/genai/parenting_chatbot/prodigy_eval/_to_delete/generate_gpt4_answers.ipynb
deleted file mode 100644
index 6330b25..0000000
--- a/src/genai/parenting_chatbot/prodigy_eval/_to_delete/generate_gpt4_answers.ipynb
+++ /dev/null
@@ -1,165 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "code",
- "execution_count": 10,
- "metadata": {},
- "outputs": [],
- "source": [
- "import pandas as pd\n",
- "from itertools import combinations\n",
- "# DIR = \"src/genai/parenting_chatbot/prodigy_eval/data/\"\n",
- "DIR = \"data/\"\n",
- "ANSWER_FILE = DIR + \"answers_{}.jsonl\"\n",
- "\n",
- "questions = (\n",
- " pd.read_json(path_or_buf=\"data/questions.jsonl\", lines=True)\n",
- " .question.to_list()\n",
- ")\n",
- "\n",
- "answer_types = [\"human\", \"rag\", \"gpt4\"]\n",
- "answers = [\n",
- " pd.read_json(path_or_buf=ANSWER_FILE.format(answer_type), lines=True)[answer_type]\n",
- " for answer_type in answer_types\n",
- "]\n",
- "\n",
- "answer_type_pairs = list(combinations(answer_types, 2))\n",
- "answer_type_pairs\n",
- "\n",
- "question_prefix = \"Which one is a better answer to this question:\\n\"\n",
- "question_suffix = \"\"\n",
- "\n",
- "pd.set_option(\"max_colwidth\", 1000)\n",
- "# Create a dataframe with four columns: question, human, rag, gpt4\n",
- "df = pd.DataFrame(\n",
- " {\n",
- " \"question\": questions,\n",
- " \"human\": answers[0],\n",
- " \"rag\": answers[1],\n",
- " \"gpt4\": answers[2],\n",
- " }\n",
- ")\n",
- "# Create all possible pairwise combinations of the question + three answer types: human, rag, gpt4\n",
- "df = df.melt(id_vars=[\"question\"], value_vars=[\"human\", \"rag\", \"gpt4\"])\n",
- "# Rename the value column to answer\n",
- "df = df.rename(columns={\"value\": \"answer\"})\n",
- "df[\"question\"] = question_prefix + df[\"question\"] + question_suffix\n",
- "# Create a new dictionary column with the format {\"\": }\n",
- "df[\"answer\"] = df.apply(lambda x: {x[\"variable\"]: x[\"answer\"]}, axis=1)\n",
- "dfs = []\n",
- "for answer_type_pair in answer_type_pairs:\n",
- " dfs.append(\n",
- " df[df.variable.isin(answer_type_pair).copy()]\n",
- " .groupby(\"question\").agg(lambda x: x.tolist())\n",
- " .reset_index()\n",
- " )\n",
- "dfs = pd.concat(dfs, ignore_index=True)\n",
- "# combine the answer types into a single dictionary\n",
- "dfs[\"answers\"] = dfs.apply(lambda x: {k: v for d in x[\"answer\"] for k, v in d.items()}, axis=1)\n",
- "dfs = dfs.drop(columns=[\"variable\", \"answer\"]).sort_values(by=\"question\").reset_index(drop=True)\n",
- "dfs.to_json(path_or_buf=DIR + \"answers.jsonl\", orient=\"records\", lines=True)\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 94,
- "metadata": {},
- "outputs": [],
- "source": [
- "import pandas as pd\n",
- "from itertools import combinations\n",
- "\n",
- "# Constants\n",
- "DATA_DIR = \"data/\"\n",
- "QUESTION_FILE = DATA_DIR + \"questions.jsonl\"\n",
- "ANSWER_FILE = DATA_DIR + \"answers_{}.jsonl\"\n",
- "OUTPUT_FILE = DATA_DIR + \"answers.jsonl\"\n",
- "# Define answer types and load corresponding answers\n",
- "ANSWER_TYPES = [\"human\", \"rag\", \"gpt4\"]\n",
- "# html formatting prefix and suffix for questions\n",
- "QUESTION_PREFIX = \"Which one is a better answer to this question:\"\n",
- "QUESTION_SUFFIX = \"\"\n",
- "\n",
- "# Set column width option for pandas DataFrame display\n",
- "pd.set_option(\"max_colwidth\", 1000)\n",
- "\n",
- "# Load questions\n",
- "questions = pd.read_json(QUESTION_FILE, lines=True)[\"question\"].to_list()\n",
- "# Load answers\n",
- "answers = [\n",
- " pd.read_json(path_or_buf=ANSWER_FILE.format(answer_type), lines=True)[answer_type]\n",
- " for answer_type in answer_types\n",
- "]\n",
- "\n",
- "# Construct a dataframe with columns: question, human, rag, and gpt4\n",
- "answers_df = (\n",
- " pd.DataFrame({\"question\": questions, \"human\": answers[0], \"rag\": answers[1], \"gpt4\": answers[2]})\n",
- " # Melt the dataframe for pairwise combinations and rename the resulting column\n",
- " .melt(id_vars=[\"question\"], value_vars=ANSWER_TYPES)\n",
- " .rename(columns={\"value\": \"answer\"})\n",
- " .assign(question=lambda df: QUESTION_PREFIX + df[\"question\"] + QUESTION_SUFFIX)\n",
- " .assign(answer=lambda df: df.apply(lambda x: {x[\"variable\"]: x[\"answer\"]}, axis=1))\n",
- ")\n",
- "\n",
- "# Generate pairwise combinations of answer types\n",
- "answer_type_pairs = list(combinations(ANSWER_TYPES, 2))\n",
- "\n",
- "# Aggregate answers based on the pairwise combinations of answer types\n",
- "dataframes = []\n",
- "for answer_type_pair in answer_type_pairs:\n",
- " subset_df = answers_df[answers_df[\"variable\"].isin(answer_type_pair)]\n",
- " aggregated_df = subset_df.groupby(\"question\").agg(lambda x: x.tolist()).reset_index()\n",
- " dataframes.append(aggregated_df)\n",
- "\n",
- "# Combine the results, merge dictionaries and save the output\n",
- "(\n",
- " pd.concat(dataframes, ignore_index=True)\n",
- " .assign(answers=lambda df: df.apply(lambda x: {k: v for d in x[\"answer\"] for k, v in d.items()}, axis=1))\n",
- " .drop(columns=[\"variable\", \"answer\"])\n",
- " .sort_values(by=\"question\")\n",
- " .reset_index(drop=True)\n",
- " .to_json(path_or_buf=OUTPUT_FILE, orient=\"records\", lines=True)\n",
- ")\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 93,
- "metadata": {},
- "outputs": [],
- "source": [
- "answers_df = (\n",
- " pd.DataFrame({\"question\": questions, \"human\": answers[0], \"rag\": answers[1], \"gpt4\": answers[2]})\n",
- " # Melt the dataframe for pairwise combinations and rename the resulting column\n",
- " .melt(id_vars=[\"question\"], value_vars=ANSWER_TYPES)\n",
- " .rename(columns={\"value\": \"answer\"}, inplace=True)\n",
- " # .assign(question=lambda df: QUESTION_PREFIX + df[\"question\"] + QUESTION_SUFFIX)\n",
- " # .assign(answer=lambda df: df.apply(lambda x: {x[\"variable\"]: x[\"answer\"]}, axis=1))\n",
- ")\n",
- "answers_df"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": ".venv",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.9.18"
- },
- "orig_nbformat": 4
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
diff --git a/src/genai/parenting_chatbot/prodigy_eval/_to_delete/training_data.jsonl b/src/genai/parenting_chatbot/prodigy_eval/_to_delete/training_data.jsonl
deleted file mode 100644
index 372549f..0000000
--- a/src/genai/parenting_chatbot/prodigy_eval/_to_delete/training_data.jsonl
+++ /dev/null
@@ -1,11 +0,0 @@
-{"question": "How to baby-proof?", "answers": {"human": "\n Baby-proofing your house is recommended for when your baby begins to explore, often when they\u2019re beginning to crawl. Some baby-proofing precautions you can take include:
\n \n - Secure Furniture: Anchor heavy furniture, like bookshelves and dressers, to the wall to prevent them from tipping over. Babies often try to pull themselves up using these items.
\n - Outlet Covers: Use plastic covers for electrical outlets to prevent curious fingers from poking into them.
\n - Cabinet and Drawer Latches: Install latches on cabinets and drawers, especially those containing cleaning products, chemicals, or small objects that can be swallowed.
\n
\n ", "gpt4": "If a newborn has a fever, it's crucial to keep them hydrated and monitor their temperature. Seek medical attention promptly.", "rag": "Fever in a newborn can be concerning. It's recommended to consult with a healthcare professional right away."}}
-{"question": "What should you do if a newborn has a fever?", "answers": {"human": "You should contact your pediatrician immediately as fever in a newborn can be a sign of a serious infection.", "gpt4": "If a newborn has a fever, it's crucial to keep them hydrated and monitor their temperature. Seek medical attention promptly.", "rag": "Fever in a newborn can be concerning. It's recommended to consult with a healthcare professional right away."}}
-{"question": "How do you change a baby's nappy?", "answers": {"human": "The best place to change a nappy is on a changing mat or towel on the floor, particularly if you have more than one baby.", "gpt4": "Babies need frequent nappy changes.", "rag": "If your baby's nappy is dirty, use the nappy to clean off most of the poo from their bottom."}}
-{"question": "How often should a newborn be bathed?", "answers": {"human": "Newborns don't need daily baths. Instead, you can bathe them every 2 to 3 days or as needed.", "gpt4": "It's usually recommended to bathe a newborn 2 to 3 times a week, but you can clean their face, neck, and hands daily.", "rag": "You don't need to give your newborn a bath every day. Two to three times a week is sufficient."}}
-{"question": "What's the best position for a baby to sleep?", "answers": {"human": "The safest position for a baby is on their back. This reduces the risk of Sudden Infant Death Syndrome (SIDS).", "gpt4": "Babies should always be placed on their backs to sleep to minimize the risk of SIDS.", "rag": "Put your baby to sleep on their back to ensure safety and decrease the risk of SIDS."}}
-{"question": "How can I soothe a crying newborn?", "answers": {"human": "Try swaddling, rocking, or offering a pacifier. Sometimes, the baby might just be hungry or need a diaper change.", "gpt4": "Newborns can be comforted by swaddling, gentle rocking, white noise, or even just holding them close.", "rag": "Consider swaddling, gentle motions, singing, or ensuring their basic needs are met to soothe a crying newborn."}}
-{"question": "Is it normal for newborns to hiccup often?", "answers": {"human": "Yes, it's common for newborns to hiccup. It's usually not a cause for concern unless accompanied by other worrying symptoms.", "gpt4": "Hiccups are a normal part of a newborn's development and usually aren't a sign of any underlying issue.", "rag": "Newborns often hiccup, and it's generally considered normal. If you're concerned, consult your pediatrician."}}
-{"question": "When should I introduce solid foods to my baby?", "answers": {"human": "Most experts recommend introducing solid foods around 6 months of age, but always consult with your pediatrician.", "gpt4": "Solid foods are typically introduced to babies at about 6 months old, but it's essential to look for signs of readiness and speak with a healthcare professional.", "rag": "Introduce solid foods to your baby around the 6-month mark, but ensure they show signs of readiness and get advice from a pediatrician."}}
-{"question": "How can I help my newborn establish a sleep routine?", "answers": {"human": "Stick to a consistent bedtime routine, keep the room dark and quiet, and try to feed and change the baby before putting them down.", "gpt4": "Consistency is key. Develop a bedtime routine, reduce stimulation before sleep, and ensure they're well-fed and dry.", "rag": "Establishing a regular bedtime routine, reducing pre-sleep stimulation, and creating a calm environment can help."}}
-{"question": "Why does my newborn sneeze so much?", "answers": {"human": "Newborns sneeze to clear their nasal and respiratory passages. It's normal and not necessarily a sign of illness.", "gpt4": "Sneezing is common in newborns as they clear out lint, dust, and mucus from their noses.", "rag": "Newborns often sneeze to clear tiny particles from their noses. It's a natural reflex and not a cause for concern."}}
-{"question": "Can I take my newborn outside?", "answers": {"human": "Yes, but make sure to dress them appropriately for the weather and avoid direct sunlight. Avoid crowded places in the early weeks.", "gpt4": "Taking your newborn outside is fine, but ensure they're protected from the elements and avoid high traffic areas.", "rag": "It's good for newborns to get fresh air. Just ensure they're dressed properly and shielded from direct sun or extreme temperatures."}}
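For orientation: each removed record above pairs one question with three answers keyed by source (`human`, `gpt4`, `rag`). Below is a minimal sketch of turning such a record into a multiple-choice task. It assumes the `text`/`options` layout used by Prodigy's choice interface; `to_choice_task` is a hypothetical helper, not code from this repository, and the answer strings are placeholders.

```python
import json
import random


def to_choice_task(record, rng):
    """Turn one question/answers record into a choice task (hypothetical helper)."""
    # One option per answer source; the id keeps track of where the answer came from
    options = [{"id": source, "text": text} for source, text in record["answers"].items()]
    # Shuffle so the display order does not reveal the source of each answer
    rng.shuffle(options)
    return {"text": record["question"], "options": options}


# Placeholder record in the same shape as the removed sample data
line = (
    '{"question": "Can I take my newborn outside?", '
    '"answers": {"human": "placeholder A", "gpt4": "placeholder B", "rag": "placeholder C"}}'
)
task = to_choice_task(json.loads(line), random.Random(0))
print(json.dumps(task, indent=2))
```

Passing in a seeded `random.Random` keeps the shuffled order reproducible between runs, which may help when comparing annotation sessions.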
diff --git a/src/genai/parenting_chatbot/prodigy_eval/analyse_annotations.py b/src/genai/parenting_chatbot/prodigy_eval/analyse_annotations.py
deleted file mode 100644
index 8439262..0000000
--- a/src/genai/parenting_chatbot/prodigy_eval/analyse_annotations.py
+++ /dev/null
@@ -1,18 +0,0 @@
-import json
-
-
-# Load data from the exported .jsonl file
-with open("output.jsonl", "r") as file:
-    annotations = [json.loads(line) for line in file]
-
-# Now you can process the annotations
-for entry in annotations:
-    # Example: print the chosen answer for each question
-    question = entry["text"]
-    chosen_answer_id = entry["accept"][
-        0
-    ]  # 'accept' contains the IDs of the selected answers. Assuming single choice here.
-
-    print(f"Question: {question}")
-    print(f"Chosen Answer (ID): {chosen_answer_id}")
-    print("-" * 40)
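The removed script printed one chosen answer per question. As a rough sketch of a possible next step, the snippet below tallies how often each answer ID was selected. It assumes each exported record carries an `accept` list of selected option IDs and an `answer` field set to `accept` or `ignore`, as in Prodigy's choice exports; `tally_choices` and the sample records are hypothetical.

```python
import json
from collections import Counter


def tally_choices(lines):
    """Count how often each answer ID was selected across annotation records."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # Skip records the annotator ignored or submitted without a selection
        if entry.get("answer") != "accept" or not entry.get("accept"):
            continue
        # Single-choice setup assumed: only the first selected ID is counted
        counts[entry["accept"][0]] += 1
    return counts


# Hypothetical exported records, one JSON object per line
sample = [
    '{"text": "Q1", "accept": ["rag"], "answer": "accept"}',
    '{"text": "Q2", "accept": ["human"], "answer": "accept"}',
    '{"text": "Q3", "accept": [], "answer": "ignore"}',
    '{"text": "Q4", "accept": ["rag"], "answer": "accept"}',
]
print(tally_choices(sample))
```

The same function accepts an open file handle in place of the list, since it only iterates over lines.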
diff --git a/src/genai/parenting_chatbot/prodigy_eval/best_answer_recipe.py b/src/genai/parenting_chatbot/prodigy_eval/best_answer_recipe.py
index d5adbc7..42c3f5e 100644
--- a/src/genai/parenting_chatbot/prodigy_eval/best_answer_recipe.py
+++ b/src/genai/parenting_chatbot/prodigy_eval/best_answer_recipe.py
@@ -65,12 +65,17 @@ def format_stream(stream: List) -> Dict:
"choice_style": "single",
"task_description": "Choose the best answer",
"choice_auto_accept": False,
+ # Define which buttons to show
"buttons": ["accept", "ignore"],
+ # Add custom CSS
"global_css": GLOBAL_CSS,
+ # If feed_overlap is True, the same example can be sent out to multiple users at the same time
"feed_overlap": True,
+ # Port to run the server on
"port": 8080,
- # imporant to set host to 0.0.0.0 for running on ec2
+ # Important to set host to 0.0.0.0 when running on ec2
"host": "0.0.0.0",
+ # Setting instant_submit to True means that the user doesn't have to click the "save" button
"instant_submit": True,
},
}
diff --git a/src/genai/parenting_chatbot/prodigy_eval/create_eval_data.py b/src/genai/parenting_chatbot/prodigy_eval/create_eval_data.py
index cd483f3..4709008 100644
--- a/src/genai/parenting_chatbot/prodigy_eval/create_eval_data.py
+++ b/src/genai/parenting_chatbot/prodigy_eval/create_eval_data.py
@@ -14,14 +14,14 @@
import pandas as pd
-# Constants
+# Path constants
DATA_DIR = "src/genai/parenting_chatbot/prodigy_eval/data/"
QUESTION_FILE = DATA_DIR + "questions.jsonl"
ANSWER_FILE = DATA_DIR + "answers_{}.jsonl"
OUTPUT_FILE = DATA_DIR + "answers.jsonl"
-# Define answer types and load corresponding answers
+# Answer types, used to load the corresponding answer files
ANSWER_TYPES = ["human", "rag", "gpt4"]
-# html formatting prefix and suffix for questions
+# Prefix and suffix for HTML formatting of questions
QUESTION_PREFIX = "Which one is a better answer to this question:\n\n"
QUESTION_SUFFIX = ""
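As a small illustration of how these constants fit together, the sketch below expands the `ANSWER_FILE` template once per answer type. It assumes the `{}` placeholder is filled with `str.format`, which the hunk itself does not show; `answer_paths` is a hypothetical name.

```python
# Constants copied from create_eval_data.py
DATA_DIR = "src/genai/parenting_chatbot/prodigy_eval/data/"
ANSWER_FILE = DATA_DIR + "answers_{}.jsonl"
ANSWER_TYPES = ["human", "rag", "gpt4"]

# Assumed usage: one input file per answer type
answer_paths = {answer_type: ANSWER_FILE.format(answer_type) for answer_type in ANSWER_TYPES}

for answer_type, path in answer_paths.items():
    print(f"{answer_type}: {path}")
```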
diff --git a/src/genai/parenting_chatbot/prodigy_eval/fetch_from_s3.py b/src/genai/parenting_chatbot/prodigy_eval/fetch_from_s3.py
deleted file mode 100644
index f1d9bb8..0000000
--- a/src/genai/parenting_chatbot/prodigy_eval/fetch_from_s3.py
+++ /dev/null
@@ -1,55 +0,0 @@
-import argparse
-import os
-
-import boto3
-
-from dotenv import load_dotenv
-
-
-parser = argparse.ArgumentParser(description="Names and locations of inputs and outputs")
-
-parser.add_argument(
- "--s3_path",
- type=str,
- dest="s3_path",
- help="Path to the file in the S3 bucket",
- default="prototypes/parenting-chatbot/prodigy_evaluation/prodigy_training_data.jsonl",
-)
-
-parser.add_argument(
- "--out_path",
- type=str,
- dest="out_path",
- help="Path to save the file locally",
- default="src/genai/parenting_chatbot/prodigy_eval/data",
-)
-
-parser.add_argument(
- "--out_name", type=str, dest="out_name", help="Name to save the file under", default="prodigy_training_data.jsonl"
-)
-
-args, unknown = parser.parse_known_args()
-
-load_dotenv()
-
-s3_path = os.environ["S3_BUCKET"]
-
-
-def fetch_from_s3(bucket_name: str, s3_path_to_file: str, local_path_to_file: str, filename: str) -> None:
-    """Fetch data from the s3 bucket and store it locally.
-
-    Args:
-        bucket_name (str): Name of the bucket
-        s3_path_to_file (str): Filepath including the name of the file in the bucket.
-        local_path_to_file (str): Filepath indicating where you want to save the file locally.
-        filename (str): Name of the file to save locally eg 'data.jsonl'.
-    """
-    s3 = boto3.client("s3")
-    s3.download_file(bucket_name, s3_path_to_file, os.path.join(local_path_to_file, filename))
-
-
-if __name__ == "__main__":
-    if not os.path.exists(args.out_path):
-        os.makedirs(args.out_path)
-
-    fetch_from_s3(s3_path, args.s3_path, args.out_path, args.out_name)
diff --git a/src/genai/parenting_chatbot/prodigy_eval/figures/eval_parenting_chatbot.png b/src/genai/parenting_chatbot/prodigy_eval/figures/eval_parenting_chatbot.png
new file mode 100644
index 0000000..baec4e6
Binary files /dev/null and b/src/genai/parenting_chatbot/prodigy_eval/figures/eval_parenting_chatbot.png differ
diff --git a/src/genai/parenting_chatbot/prodigy_eval/figures/prodigy_screenshot.png b/src/genai/parenting_chatbot/prodigy_eval/figures/prodigy_screenshot.png
new file mode 100644
index 0000000..32c260e
Binary files /dev/null and b/src/genai/parenting_chatbot/prodigy_eval/figures/prodigy_screenshot.png differ
diff --git a/src/genai/parenting_chatbot/prodigy_eval/test_recipe.py b/src/genai/parenting_chatbot/prodigy_eval/test_recipe.py
deleted file mode 100644
index 51bcdd1..0000000
--- a/src/genai/parenting_chatbot/prodigy_eval/test_recipe.py
+++ /dev/null
@@ -1,11 +0,0 @@
-import prodigy
-
-from prodigy.components.loaders import JSONL
-
-
-@prodigy.recipe("compare_strings")
-def compare_strings(dataset, input_file):
-    stream = JSONL(input_file)
-    html_template = "{{text1}}<br>{{text2}}"  # format this however you want
-
-    return {"dataset": dataset, "stream": stream, "view_id": "html", "config": {"html_template": html_template}}