A lightly modified version of facebookresearch/llama that allows for saving the intermediate activations in order to run CCS.
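The idea of capturing intermediate activations during a forward pass can be sketched with a toy model. Everything below (the model, layer names, and save format) is an illustrative assumption for exposition; it is not this fork's actual interface:

```python
import numpy as np

class TinyMLP:
    """Toy 2-layer MLP that records its hidden activations during the
    forward pass, mimicking in spirit how the fork saves intermediate
    activations for later probing. Shapes and names are illustrative,
    not LLaMA's."""

    def __init__(self, d_in=4, d_hidden=8, d_out=2, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(size=(d_in, d_hidden))
        self.w2 = rng.normal(size=(d_hidden, d_out))
        self.saved = {}  # layer name -> captured activation

    def forward(self, x):
        h = np.tanh(x @ self.w1)         # intermediate activation
        self.saved["hidden"] = h.copy()  # capture it for later probing
        return h @ self.w2

model = TinyMLP()
x = np.ones((3, 4))           # batch of 3 toy inputs
out = model.forward(x)
# Persist the captured activations so a probe (e.g. CCS) can train on them
np.save("hidden_activations.npy", model.saved["hidden"])
```

In the real fork the same pattern applies at the transformer-layer level: run the model on each prompt, stash the hidden states, and write them to `--save_activations_path`.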
Some experiments for CS 229br: Foundations of Deep Learning, taught in Spring 2023 at Harvard University by Boaz Barak (with teaching fellows Gustaf Ahdritz and Gal Kaplun).
Make sure you have CUDA available, then install the dependencies with `pip install -r requirements.txt`.
To generate the BoolQ prompts, run `python generate_dataset.py ./data/boolq/prompts.csv --tokenizer_path $TARGET_FOLDER/tokenizer.model`.
To evaluate a LLaMA model on the saved dataset, set the variables accordingly and run `torchrun --nproc_per_node $MP example.py --ckpt_dir $TARGET_FOLDER/$MODEL_SIZE --tokenizer_path $TARGET_FOLDER/tokenizer.model --save_activations_path ./data/boolq --prompt_csv ./data/boolq/prompts.csv`.
Different model sizes require different model-parallel (MP) values:

| Model | MP |
|---|---|
| 7B | 1 |
| 13B | 2 |
| 33B | 4 |
| 65B | 8 |
Discovering Latent Knowledge in Language Models Without Supervision: The original paper by Collin Burns, Haotian Ye, et al. that proposes "Contrast-Consistent Search" (CCS).
- collin-burns/discovering_latent_knowledge: The corresponding repository.
- The initial release is reported to be quite buggy; see "Bugs of the Initial Release of CCS" by Fabien Roger.
- How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme
What Discovering Latent Knowledge Did and Did Not Find: A writeup by Fabien Roger on takeaways from the original paper.
- safer-ai/Exhaustive-CCS: The corresponding repository. Similar to Collin Burns's but with fewer bugs.
- Several experiments with CCS.
EleutherAI/elk: Contains many further innovations on top of CCS.
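For reference, the core CCS objective from the Burns et al. paper: a probe maps the activations of a statement and its negation to probabilities, and is trained without labels so that the two outputs are consistent (they sum to one) and confident (not both near 0.5). A minimal numpy sketch of the loss (the probe itself and its training loop are omitted):

```python
import numpy as np

def ccs_loss(p_pos, p_neg):
    """CCS objective: probe outputs for a statement (p_pos) and its
    negation (p_neg) should satisfy p_pos ~ 1 - p_neg (consistency)
    and should not both sit near 0.5 (confidence)."""
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    confidence = np.minimum(p_pos, p_neg) ** 2
    return np.mean(consistency + confidence)

# A consistent, confident pair incurs near-zero loss:
low = ccs_loss(np.array([0.99]), np.array([0.01]))
# A degenerate "always 0.5" pair is penalized by the confidence term:
high = ccs_loss(np.array([0.5]), np.array([0.5]))
```

Minimizing this loss over contrast pairs of saved activations is what "running CCS" on the outputs of this fork amounts to.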