dsp_ai_eval

Setup

Meet the data science cookiecutter requirements, in brief:
- Install: direnv and conda
Run make install to configure the development environment:
- Setup the conda environment
- Configure pre-commit
Make sure you have a .env file with the following keys:

OPENAI_API_KEY = 'YOUR-KEY-HERE'

Key hyperparameters are stored in dsp_ai_eval/config/base.yaml. This file contains: the research question for the project; hyperparameters for the topic modelling and GPT prompting; plus paths to relevant files in the S3 bucket.
dsp_ai_eval/getters/ contains functions for obtaining different raw and processed datasets and artefacts. This is a Nesta DS convention.
There are three pipelines in dsp_ai_eval/pipeline/:
- generate_themes_with_gpt/: pipeline for obtaining repeated GPT answers to the research question.
- process_abstracts/: pipeline for performing text clustering on research abstracts
- process_gpt_summaries/: pipeline for performing text clustering on the summaries obtained with the generate_themes_with_gpt/ pipeline

At the moment the workflow is not fully reproducible - that work is forthcoming!

To update the pipeline to work with new research abstracts:

Download your own research abstracts and upload to the s3 bucket
Update relevant paths to your data in dsp_ai_eval/config/base.yaml
Update the getters in dsp_ai_eval/getters/scite.py (you may have more or fewer data files to be concatenated than in the previous iteration of this project)
Update the data deduplication steps in dsp_ai_eval/pipeline/process_abstracts/embed_scite_abstracts.py.

If you wish to repeat the experiment prompting GPT repeatedly for answers to a research question:

Update the RQ in dsp_ai_eval/config/base.yaml
Update the filepaths under gpt_themes_pipeline in dsp_ai_eval/config/base.yaml as desired
You should now be able to run dsp_ai_eval/pipeline/generate_themes_with_gpt/ask_gpt+for_themes.py

Name		Name	Last commit message	Last commit date
Latest commit History 42 Commits
.cookiecutter		.cookiecutter
.github		.github
docs		docs
dsp_ai_eval		dsp_ai_eval
outputs		outputs
.dockerignore		.dockerignore
.envrc		.envrc
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
docker-compose.yml		docker-compose.yml
environment.yaml		environment.yaml
jupytext.toml		jupytext.toml
requirements.txt		requirements.txt
requirements_dev.txt		requirements_dev.txt
setup.cfg		setup.cfg
setup.py		setup.py