Deploying to github_pages from @ 219bd64 🚀

VectorInstitute · Oct 30, 2024 · b9b7b7d
1 parent 5d90e17
commit b9b7b7d
Showing 11 changed files with 962 additions and 3 deletions.
diff --git a/_sources/index.md.txt b/_sources/index.md.txt
@@ -8,6 +8,7 @@
 
 self
 api
+safety_and_evaluation
 ```
 
 Welcome to the Health Recommendation System documentation! This system helps connect people with health and community services using AI-powered recommendations.

diff --git a/_sources/safety_and_evaluation.rst.txt b/_sources/safety_and_evaluation.rst.txt
@@ -0,0 +1,282 @@
+Safety and Evaluation
+=====================
+
+Overview
+--------
+
+The Health Recommendation System employs a rigorous evaluation framework to ensure safe, accurate, and relevant service recommendations. This evaluation is critical for:
+
+- Ensuring system safety when handling emergency situations
+- Maintaining accuracy in service recommendations
+- Identifying and filtering out-of-scope requests
+- Validating the system's ability to handle queries with varying levels of detail
+- Measuring retrieval accuracy of relevant services
+
+Installation for Evaluation
+----------------------------
+
+To run the evaluation pipeline, install the additional dependencies from the ``eval`` dependency group. From the project root:
+
+.. code-block:: bash
+
+    # Create virtual environment
+    python3 -m venv .venv
+    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
+
+    # Install dependencies including evaluation packages
+    poetry install --with test,docs,eval
+
+
+Evaluation Framework
+--------------------
+
+Synthetic Dataset Generation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The evaluation uses a synthetic dataset of 1,000 queries, structured to test different aspects of the system:
+
+- **Regular Situations (80%)**: Divided equally among three levels of detail:
+
+  - Low: 1-2 sentences, brief queries with minimal context
+  - Medium: 2-3 sentences with moderate background information
+  - High: Detailed paragraph with extensive context
+
+- **Emergency Situations (10%)**: Queries indicating urgent medical or mental health needs
+
+- **Out-of-Scope Situations (10%)**: Queries for services not covered by the system
+
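As a rough illustration of the distribution above, the following sketch derives per-category counts for a 1,000-query dataset. The helper name ``category_counts`` is hypothetical, and because 800 regular queries do not divide evenly by three, the way the remainder is assigned here is an assumption rather than the behavior of the project's generation scripts.

```python
# Hypothetical helper: derive query counts from the stated 80/10/10 split.
def category_counts(total=1000):
    regular = total * 80 // 100
    emergency = total * 10 // 100
    out_of_scope = total - regular - emergency

    # Split regular queries across the three detail levels.
    # Assumption: any remainder goes to the earlier levels, one query each.
    base, extra = divmod(regular, 3)
    levels = ["low", "medium", "high"]
    regular_by_level = {
        level: base + (1 if i < extra else 0) for i, level in enumerate(levels)
    }
    return {
        "regular": regular_by_level,
        "emergency": emergency,
        "out_of_scope": out_of_scope,
    }


counts = category_counts(1000)
print(counts)
```

With 1,000 queries this yields 100 emergency and 100 out-of-scope queries, and roughly 266 to 267 regular queries per detail level.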
+Each query in the dataset includes:
+
+.. code-block:: javascript
+
+    {
+        "query": "User's query text",
+        "context": ["service_id1", "service_id2"],
+        "answer": "Expected answer text",
+        "is_emergency": "<boolean>",  // true or false
+        "is_out_of_scope": "<boolean>",  // true or false
+        "demographics": {
+            "Age": "<string>",  // one of: child, teen, young adult, adult, senior
+            "Gender": "<string>",  // one of: male, female, non-binary, N/A
+            "Ethnicity": "<string>",  // one of: Caucasian, African, Asian, Hispanic, Indigenous, Other, N/A
+            "Employment status": "<string>",  // one of: employed, unemployed, student, retired, unable to work, N/A
+            "Housing situation": "<string>",  // one of: own home, renting, homeless, shelter, assisted living, N/A
+            "Disability status": "<string>",  // one of: no disability, physical disability, cognitive disability, N/A
+            "Immigration status": "<string>"  // one of: citizen, permanent resident, temporary resident, refugee, undocumented, N/A
+        },
+        "detail_level": "<string>"  // one of: low, medium, high
+    }
+
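A record with this shape can be sanity-checked before evaluation. The sketch below is one possible check, not part of the project's ``eval/`` scripts: the function name ``validate_record`` is hypothetical, and the key names and detail levels are copied from the schema above.

```python
# Key names and detail levels are taken from the schema above.
REQUIRED_KEYS = {
    "query", "context", "answer", "is_emergency",
    "is_out_of_scope", "demographics", "detail_level",
}
DEMOGRAPHIC_KEYS = {
    "Age", "Gender", "Ethnicity", "Employment status",
    "Housing situation", "Disability status", "Immigration status",
}
DETAIL_LEVELS = {"low", "medium", "high"}


def validate_record(record):
    """Return a list of problems found in one dataset record (empty if none)."""
    problems = []
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        # Stop early: the remaining checks need every top-level key.
        problems.append(f"missing keys: {sorted(missing)}")
        return problems
    for flag in ("is_emergency", "is_out_of_scope"):
        if not isinstance(record[flag], bool):
            problems.append(f"{flag} must be a boolean")
    if not isinstance(record["context"], list):
        problems.append("context must be a list of service IDs")
    if record["detail_level"] not in DETAIL_LEVELS:
        problems.append(f"unknown detail_level: {record['detail_level']!r}")
    missing_demo = DEMOGRAPHIC_KEYS - record["demographics"].keys()
    if missing_demo:
        problems.append(f"missing demographics: {sorted(missing_demo)}")
    return problems
```

Calling ``validate_record`` on each entry of the generated JSON file and reporting any non-empty result is one way to catch malformed records early.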
+Example Scenarios
+^^^^^^^^^^^^^^^^^
+
+Regular Situation (Low Detail)
+""""""""""""""""""""""""""""""
+
+.. code-block:: json
+
+    {
+       "query": "I'm looking for help with my child's eating disorder. What services are available?",
+       "context": ["69796102"],
+       "answer": "The North York General Hospital offers a Child and Adolescent Eating Disorders Program, which includes outpatient services for assessment, diagnosis, and treatment, as well as a day program with medical, nutritional, and mental health care support.",
+       "is_emergency": false,
+       "is_out_of_scope": false,
+       "demographics": {
+         "Age": "child",
+         "Gender": "N/A",
+         "Ethnicity": "N/A",
+         "Employment status": "N/A",
+         "Housing situation": "N/A",
+         "Disability status": "N/A",
+         "Immigration status": "N/A"
+       },
+       "detail_level": "low"
+    }
+
+Emergency Situation
+"""""""""""""""""""
+
+.. code-block:: json
+
+    {
+       "query": "My child is experiencing severe abdominal pain and I am very concerned. What should I do?",
+       "context": ["69796097", "69795331"],
+       "answer": "You should take your child to the nearest pediatric emergency department immediately. For urgent care, you can visit the Hospital for Sick Children, located at Elizabeth St. For further assistance, you can also go to North York General Hospital at 4001 Leslie St, first floor.",
+       "is_emergency": true,
+       "is_out_of_scope": false,
+       "demographics": {
+         "Age": "child",
+         "Gender": "N/A",
+         "Ethnicity": "N/A",
+         "Employment status": "N/A",
+         "Housing situation": "N/A",
+         "Disability status": "N/A",
+         "Immigration status": "N/A"
+       },
+       "detail_level": "medium"
+    }
+
+Evaluation Scripts
+------------------
+
+The ``eval/`` directory contains scripts for both dataset generation and evaluation.
+
+Dataset Generation
+^^^^^^^^^^^^^^^^^^
+
+.. code-block:: bash
+
+    # Generate synthetic dataset
+    # --situation_type: one of regular, emergency, out_of_scope
+    # --detail_level: one of low, medium, high
+    python eval/generate_dataset.py \
+      --input_file data/211_data.csv \
+      --output_dir ./eval \
+      --name synthetic_dataset \
+      --num_samples 1000 \
+      --situation_type regular \
+      --detail_level low
+
+    # Generate full dataset with distribution
+    ./eval/generate_large_dataset.sh
+
+System Output Collection
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: bash
+
+    # Collect RAG system outputs for evaluation
+    python eval/collect_rag_outputs.py \
+      --input path/to/synthetic_dataset.json \
+      --output path/to/processed_results.json \
+      --batch-size 5
+
+RAGAS Evaluation
+^^^^^^^^^^^^^^^^
+
+.. code-block:: bash
+
+    # Evaluate full RAG pipeline
+    python eval/evaluate.py \
+      --input path/to/processed_results.json \
+      --output-dir ./evaluation_results
+
+Performance Metrics
+-------------------
+
+RAGAS Metrics By Category
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Note: These metrics were obtained on a synthetic dataset generated from services in the Greater Toronto Area (GTA).
+The RAG system evaluated used specialized prompts that differ slightly from those in the current system.
+
+.. list-table::
+   :header-rows: 1
+
+   * - Subgroup
+     - Category
+     - Answer Relevancy
+     - Faithfulness
+     - Context Recall
+     - Context Precision
+   * - Detail Level
+     - Low
+     - 0.82
+     - 0.54
+     - 0.58
+     - 0.57
+   * - Detail Level
+     - Medium
+     - 0.72
+     - 0.47
+     - 0.49
+     - 0.31
+   * - Detail Level
+     - High
+     - 0.84
+     - 0.53
+     - 0.30
+     - 0.84
+   * - Is Emergency
+     - True
+     - 0.83
+     - 0.78
+     - 0.46
+     - 1.00
+   * - Is Out of Scope
+     - True
+     - 0.52
+     - -
+     - -
+     - -
+
+
+Retrieval Performance and Re-ranking Strategy
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The system's retrieval performance provides a compelling case for implementing a re-ranking stage:
+
+
+.. list-table::
+   :header-rows: 1
+
+   * - Metric
+     - acc@1
+     - acc@3
+     - acc@5
+     - acc@10
+     - acc@20
+   * - Overall
+     - 0.34
+     - 0.47
+     - 0.55
+     - 0.67
+     - 0.74
+   * - High Detail
+     - 0.31
+     - 0.44
+     - 0.53
+     - 0.63
+     - 0.74
+   * - Low Detail
+     - 0.35
+     - 0.54
+     - 0.64
+     - 0.82
+     - 0.88
+   * - Emergency
+     - 0.18
+     - 0.29
+     - 0.35
+     - 0.41
+     - 0.54
+   * - Out of Scope
+     - 0.20
+     - 0.20
+     - 0.20
+     - 0.40
+     - 0.60
+
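The table does not define acc@k. The sketch below assumes one common reading: a query counts as a hit at k if at least one of its ground-truth service IDs (the ``context`` field) appears among the top k retrieved IDs. The function name and the sample IDs are illustrative only.

```python
def accuracy_at_k(retrieved, relevant, k):
    """Fraction of queries with at least one relevant ID in the top k results.

    retrieved: list of ranked ID lists, one per query
    relevant:  list of ground-truth ID lists, one per query
    """
    if not retrieved:
        return 0.0
    hits = sum(
        1
        for ranked, gold in zip(retrieved, relevant)
        if set(ranked[:k]) & set(gold)
    )
    return hits / len(retrieved)


# Illustrative data: placeholder IDs, not real service IDs.
retrieved = [["a", "b", "c"], ["x", "y", "z"]]
relevant = [["c"], ["q"]]
acc_at_1 = accuracy_at_k(retrieved, relevant, 1)
acc_at_3 = accuracy_at_k(retrieved, relevant, 3)
```

Under this definition acc@k can never decrease as k grows, which is consistent with every row of the table.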
+The metrics reveal several key insights that motivate the use of a re-ranking stage:
+
+1. **Wider Pool Contains Relevant Services**: The significant increase in accuracy from acc@5 (0.55) to acc@20 (0.74) indicates that relevant services are often being retrieved but ranked lower than optimal. This suggests that a more sophisticated ranking mechanism could improve the final recommendations.
+
+2. **Query Type Variations**: Performance varies notably across query types:
+
+   - Low Detail queries achieve high acc@20 (0.88), suggesting simpler queries benefit from broader retrieval
+   - Emergency queries show lower initial accuracy but steady improvement up to acc@20 (0.54), indicating relevant services are present but need better ranking
+   - High Detail queries show consistent improvement up to acc@20 (0.74), suggesting additional context could help with ranking
+
+3. **Re-ranking Implementation**: Based on these metrics, the system implements an optional re-ranking stage (based on `RankGPT <https://arxiv.org/abs/2304.09542>`_) that can be enabled via the API's ``rerank`` parameter (see :http:post:`/recommend`). When enabled:
+
+   - First stage: Retrieves the top 20 candidates using efficient embedding-based similarity
+   - Second stage: Applies GPT-4-based semantic analysis to re-rank these candidates
+   - Returns the top 5 most relevant services after re-ranking
+
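The two-stage flow described above can be sketched as follows. This is a minimal illustration, not the system's implementation: the ``recommend`` function, the service dictionaries, and ``rerank_fn`` (a stand-in for the GPT-4-based re-ranker) are all hypothetical, and returning the first-stage top 5 when re-ranking is disabled is an assumption.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(query_vec, services, rerank_fn=None, pool_size=20, top_n=5):
    # Stage 1: rank all services by embedding similarity, keep a wide pool.
    ranked = sorted(
        services,
        key=lambda s: cosine(query_vec, s["embedding"]),
        reverse=True,
    )
    pool = ranked[:pool_size]
    # Stage 2 (optional): let a stronger but slower model reorder the pool.
    if rerank_fn is not None:
        pool = rerank_fn(pool)
    # Return only the best few candidates.
    return pool[:top_n]


# Toy data: similarity to the query decreases as the index grows.
services = [{"id": f"s{i}", "embedding": [1.0, 0.1 * i]} for i in range(30)]
query_vec = [1.0, 0.0]

baseline = [s["id"] for s in recommend(query_vec, services)]
# A deliberately artificial re-ranker that reverses the pool, to show its effect.
reranked = [
    s["id"]
    for s in recommend(query_vec, services, rerank_fn=lambda p: list(reversed(p)))
]
```

The design point is that the expensive second stage only ever sees the 20 pooled candidates, so its cost does not grow with the size of the service catalogue.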
+To enable re-ranking in your API calls, set the ``rerank`` parameter to ``true`` in your request to the :http:post:`/recommend` endpoint:
+
+.. code-block:: json
+
+    {
+        "query": "I need mental health support",
+        "latitude": 43.6532,
+        "longitude": -79.3832,
+        "radius": 5000,
+        "rerank": true
+    }
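For reference, the same request body can be built from Python with the standard library. This is a sketch under assumptions: ``BASE_URL`` is a placeholder for wherever the API is deployed, ``build_recommend_request`` is a hypothetical helper, and any authentication or extra fields the API may require are not shown.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: replace with the real host


def build_recommend_request(query, latitude, longitude, radius, rerank=False):
    """Build (but do not send) a POST request for the /recommend endpoint."""
    payload = {
        "query": query,
        "latitude": latitude,
        "longitude": longitude,
        "radius": radius,
        "rerank": rerank,
    }
    return urllib.request.Request(
        f"{BASE_URL}/recommend",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_recommend_request(
    "I need mental health support", 43.6532, -79.3832, 5000, rerank=True
)
# To send it against a running server:
#     with urllib.request.urlopen(request) as response:
#         result = json.load(response)
```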
diff --git a/api.html b/api.html
@@ -7,7 +7,7 @@
 
 
       <meta name="description" content="aieng-template Python API documentation">
-    <link rel="index" title="Index" href="genindex.html" /><link rel="search" title="Search" href="search.html" /><link rel="prev" title="Health Recommendation System" href="index.html" />
+    <link rel="index" title="Index" href="genindex.html" /><link rel="search" title="Search" href="search.html" /><link rel="next" title="Safety and Evaluation" href="safety_and_evaluation.html" /><link rel="prev" title="Health Recommendation System" href="index.html" />
 
     <meta name="generator" content="sphinx-7.4.7, furo 2024.08.06"/>
 
@@ -226,6 +226,7 @@
   <ul class="current">
 <li class="toctree-l1"><a class="reference internal" href="index.html">Health Recommendation System</a></li>
 <li class="toctree-l1 current current-page"><a class="current reference internal" href="#">API Reference</a></li>
+<li class="toctree-l1"><a class="reference internal" href="safety_and_evaluation.html">Safety and Evaluation</a></li>
 </ul>
 
 </div>
@@ -546,7 +547,15 @@ <h2>Common HTTP Status Codes<a class="headerlink" href="#common-http-status-code
       <footer>
 
         <div class="related-pages">
-
+          <a class="next-page" href="safety_and_evaluation.html">
+              <div class="page-info">
+                <div class="context">
+                  <span>Next</span>
+                </div>
+                <div class="title">Safety and Evaluation</div>
+              </div>
+              <svg class="furo-related-icon"><use href="#svg-arrow-right"></use></svg>
+            </a>
           <a class="prev-page" href="index.html">
               <svg class="furo-related-icon"><use href="#svg-arrow-right"></use></svg>
               <div class="page-info">

diff --git a/genindex.html b/genindex.html
@@ -223,6 +223,7 @@
   <ul>
 <li class="toctree-l1"><a class="reference internal" href="index.html">Health Recommendation System</a></li>
 <li class="toctree-l1"><a class="reference internal" href="api.html">API Reference</a></li>
+<li class="toctree-l1"><a class="reference internal" href="safety_and_evaluation.html">Safety and Evaluation</a></li>
 </ul>
 
 </div>

diff --git a/http-routingtable.html b/http-routingtable.html
@@ -223,6 +223,7 @@
   <ul>
 <li class="toctree-l1"><a class="reference internal" href="index.html">Health Recommendation System</a></li>
 <li class="toctree-l1"><a class="reference internal" href="api.html">API Reference</a></li>
+<li class="toctree-l1"><a class="reference internal" href="safety_and_evaluation.html">Safety and Evaluation</a></li>
 </ul>
 
 </div>

diff --git a/index.html b/index.html
@@ -226,6 +226,7 @@
   <ul class="current">
 <li class="toctree-l1 current current-page"><a class="current reference internal" href="#">Health Recommendation System</a></li>
 <li class="toctree-l1"><a class="reference internal" href="api.html">API Reference</a></li>
+<li class="toctree-l1"><a class="reference internal" href="safety_and_evaluation.html">Safety and Evaluation</a></li>
 </ul>
 
 </div>

diff --git a/objects.inv b/objects.inv
diff --git a/page.html b/page.html
@@ -225,6 +225,7 @@
   <ul>
 <li class="toctree-l1"><a class="reference internal" href="index.html">Health Recommendation System</a></li>
 <li class="toctree-l1"><a class="reference internal" href="api.html">API Reference</a></li>
+<li class="toctree-l1"><a class="reference internal" href="safety_and_evaluation.html">Safety and Evaluation</a></li>
 </ul>
 
 </div>