
Gradients Analysis

Overview

Explainable Diagnostic Assistant is an interactive tool for refining and analyzing model outputs in a transparent, interpretable way. Built on Goodfire's API for model evaluation and feature steering, the project aims to improve the trust and usability of AI systems in healthcare diagnostics.
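
As a rough orientation, the steering workflow could look like the sketch below. This is a minimal illustration based on Goodfire's public SDK: the model name, API key, prompt, and steering value are all illustrative assumptions, and exact method signatures may differ from the version this project pins.

```python
import goodfire

# Illustrative setup; the API key and model name are placeholders.
client = goodfire.Client(api_key="YOUR_GOODFIRE_API_KEY")
variant = goodfire.Variant("meta-llama/Meta-Llama-3.1-8B-Instruct")

# Search for interpretable features plausibly related to hallucination.
features = client.features.search("hallucination", model=variant, top_k=5)

# Nudge one candidate feature downward, then query the steered model.
# The -0.5 steering value is an assumption, not a tuned project setting.
variant.set(features[0], -0.5)
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "What are common side effects of metformin?"}],
    model=variant,
)
print(response.choices[0].message["content"])
```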

Objectives:

  • Find features that reduce hallucinations.
  • Build a medical hallucination classifier that also provides explainable results (see the sketch after this list).
  • Build a framework demonstrating that feature steering can reduce the hallucination rate while maintaining accuracy.
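
A hypothetical sketch of the explainable-classifier idea follows: score a Q&A pair by its feature activations and use the activating features themselves as the explanation. The Goodfire calls (`features.inspect`, `context.top`) are assumptions based on the SDK's public docs, and the feature-label matching and 0.6 threshold are purely illustrative.

```python
import goodfire

client = goodfire.Client(api_key="YOUR_GOODFIRE_API_KEY")
variant = goodfire.Variant("meta-llama/Meta-Llama-3.1-8B-Instruct")

def activation_vector(question: str, answer: str, k: int = 20) -> dict:
    """Return the top-k feature activations for a Q&A pair as {label: activation}."""
    context = client.features.inspect(
        [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ],
        model=variant,
    )
    return {fa.feature.label: fa.activation for fa in context.top(k=k)}

def classify(acts: dict, threshold: float = 0.6) -> tuple[bool, list]:
    """Flag an answer when a hallucination-related feature fires strongly.

    The matched features double as the human-readable explanation.
    """
    evidence = [
        (label, act)
        for label, act in acts.items()
        if "hallucin" in label.lower() and act > threshold
    ]
    return bool(evidence), evidence
```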

Technologies and Frameworks:

  • Goodfire API: model evaluation, feature inspection, and feature steering.
  • Python Libraries: NumPy, Pandas, Matplotlib.
  • Streamlit: interactive web interface (a minimal sketch of the app entry point follows this list).
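
For orientation, a minimal Streamlit entry point in the spirit of src/app.py might look like the sketch below. The widget layout and the `answer_and_classify` stub are hypothetical, standing in for the project's actual model and classifier calls.

```python
import streamlit as st

# Hypothetical stub standing in for the project's model + classifier pipeline.
def answer_and_classify(question: str):
    answer = "(model answer would appear here)"
    evidence = [{"feature": "example feature", "activation": 0.0}]
    return answer, evidence

st.title("Explainable Diagnostic Assistant")
question = st.text_input("Enter a medical question")

if st.button("Analyze") and question:
    answer, evidence = answer_and_classify(question)
    st.subheader("Model answer")
    st.write(answer)
    st.subheader("Hallucination evidence (feature activations)")
    st.table(evidence)
```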

Installation

Prerequisites

Ensure you have the following installed:

  • Python 3.8 or higher
  • pip (Python package installer)

Setting Up the Environment

  1. Clone the repository
    git clone https://github.com/Mechanistic-Interpretability-Hackathon/Mech-Interp.git
    cd Mech-Interp
    
  2. Create virtual environment
    python3 -m venv .venv
    
  3. Activate virtual environment
    Mac/Linux: source .venv/bin/activate
    Windows: .venv\Scripts\activate
    
  4. Install necessary packages
    pip install -r requirements.txt
    

Run the project

```bash
streamlit run src/app.py
```
