Merge branch 'main' into dev

OpenMOSS · Aug 24, 2024 · 6802152
2 parents 3f2d1b8 + 05e13f6

Showing 94 changed files with 10,741 additions and 1,990 deletions.
diff --git a/TransformerLens/.github/workflows/checks.yml b/TransformerLens/.github/workflows/checks.yml
@@ -44,7 +44,6 @@ jobs:
     strategy:
       matrix:
         python-version:
-          - "3.8"
           - "3.9"
           - "3.10"
     steps:
@@ -123,6 +122,7 @@ jobs:
         notebook:
           # - "Activation_Patching_in_TL_Demo"
           # - "Attribution_Patching_Demo"
+          - "ARENA_Content"
           - "BERT"
           - "Exploratory_Analysis_Demo"
           # - "Grokking_Demo"
@@ -133,6 +133,8 @@ jobs:
           - "Main_Demo"
           # - "No_Position_Experiment"
           - "Othello_GPT"
+          - "Patchscopes_Generation_Demo"
+          # - "T5"
     steps:
       - uses: actions/checkout@v3
       - name: Install Poetry

diff --git a/TransformerLens/README.md b/TransformerLens/README.md
@@ -10,20 +10,11 @@ CD](https://github.com/TransformerLensOrg/TransformerLens/actions/workflows/chec
 [![Docs
 CD](https://github.com/TransformerLensOrg/TransformerLens/actions/workflows/pages/pages-build-deployment/badge.svg)](https://github.com/TransformerLensOrg/TransformerLens/actions/workflows/pages/pages-build-deployment)
 
-A Library for Mechanistic Interpretability of Generative Language Models.
+A Library for Mechanistic Interpretability of Generative Language Models. Maintained by [Bryce Meyer](https://github.com/bryce13950) and created by [Neel Nanda](https://neelnanda.io/about)
 
 [![Read the Docs
 Here](https://img.shields.io/badge/-Read%20the%20Docs%20Here-blue?style=for-the-badge&logo=Read-the-Docs&logoColor=white&link=https://TransformerLensOrg.github.io/TransformerLens/)](https://TransformerLensOrg.github.io/TransformerLens/)
 
-| :exclamation:  HookedSAETransformer Removed   |
-|-----------------------------------------------|
-
-Hooked SAE has been removed from TransformerLens 2.0. The functionality is being moved to
-[SAELens](http://github.com/jbloomAus/SAELens). For more information on this release, please see the
-accompanying
-[announcement](https://transformerlensorg.github.io/TransformerLens/content/news/release-2.0.html)
-for details on what's new, and the future of TransformerLens.
-
 This is a library for doing [mechanistic
 interpretability](https://distill.pub/2020/circuits/zoom-in/) of GPT-2 Style language models. The
 goal of mechanistic interpretability is to take a trained model and reverse engineer the algorithms
@@ -56,7 +47,7 @@ logits, activations = model.run_with_cache("Hello World")
 ## Key Tutorials
 
 * [Introduction to the Library and Mech
-  Interp](https://arena-ch1-transformers.streamlit.app/[1.2]_Intro_to_Mech_Interp)
+  Interp](https://arena3-chapter1-transformer-interp.streamlit.app/[1.2]_Intro_to_Mech_Interp)
 * [Demo of Main TransformerLens Features](https://neelnanda.io/transformer-lens-demo)
 
 ## Gallery
@@ -111,20 +102,20 @@ you would like to help, please try working on one! The standard answer to "why h
 yet" is just that there aren't enough people! Key resources:
 
 * [A Guide to Getting Started in Mechanistic Interpretability](https://neelnanda.io/getting-started)
-* [ARENA Mechanistic Interpretability Tutorials](https://arena-ch1-transformers.streamlit.app/) from
+* [ARENA Mechanistic Interpretability Tutorials](https://arena3-chapter1-transformer-interp.streamlit.app/) from
   Callum McDougall. A comprehensive practical introduction to mech interp, written in
   TransformerLens - full of snippets to copy and they come with exercises and solutions! Notable
   tutorials:
   * [Coding GPT-2 from
-    scratch](https://arena-ch1-transformers.streamlit.app/[1.1]_Transformer_from_Scratch), with
+    scratch](https://arena3-chapter1-transformer-interp.streamlit.app/[1.1]_Transformer_from_Scratch), with
     accompanying video tutorial from me ([1](https://neelnanda.io/transformer-tutorial)
     [2](https://neelnanda.io/transformer-tutorial-2)) - a good introduction to transformers
   * [Introduction to Mech Interp and
-    TransformerLens](https://arena-ch1-transformers.streamlit.app/[1.2]_Intro_to_Mech_Interp): An
+    TransformerLens](https://arena3-chapter1-transformer-interp.streamlit.app/[1.2]_Intro_to_Mech_Interp): An
     introduction to TransformerLens and mech interp via studying induction heads. Covers the
     foundational concepts of the library
   * [Indirect Object
-    Identification](https://arena-ch1-transformers.streamlit.app/[1.3]_Indirect_Object_Identification):
+    Identification](https://arena3-chapter1-transformer-interp.streamlit.app/[1.3]_Indirect_Object_Identification):
     a replication of interpretability in the wild, that covers standard techniques in mech interp
     such as [direct logit
     attribution](https://dynalist.io/d/n2ZWtnoYHrU1s4vnFSAQ519J#z=disz2gTx-jooAcR0a5r8e7LZ),
@@ -156,10 +147,18 @@ discussions about eg supporting important new use cases, or if you want to make
 contributions to the library and want a maintainer's opinion. We'd also love for you to come and
 share your projects on the Slack!
 
+| :exclamation:  HookedSAETransformer Removed   |
+|-----------------------------------------------|
+
+Hooked SAE has been removed from TransformerLens in version 2.0. The functionality is being moved to
+[SAELens](http://github.com/jbloomAus/SAELens). For more information on this release, please see the
+accompanying
+[announcement](https://transformerlensorg.github.io/TransformerLens/content/news/release-2.0.html)
+for details on what's new, and the future of TransformerLens.
+
 ## Credits
 
-This library was created by **[Neel Nanda](https://neelnanda.io)** and is maintained by **Joseph
-Bloom**.
+This library was created by **[Neel Nanda](https://neelnanda.io)** and is maintained by **[Bryce Meyer](https://github.com/bryce13950)**.
 
 The core features of TransformerLens were heavily inspired by the interface to [Anthropic's
 excellent Garcon tool](https://transformer-circuits.pub/2021/garcon/index.html). Credit to Nelson