
Commit

Update index.html
schwettmann authored Apr 14, 2024
1 parent 5a7101c commit b4b611f
Showing 1 changed file with 29 additions and 2 deletions.
31 changes: 29 additions & 2 deletions index.html
@@ -104,7 +104,7 @@ <h1 class="title is-1 publication-title">A Multimodal Automated Interpretability
<section class="section" style="margin-top: -75px; margin-bottom:-10px">
<div class="container is-max-desktop">
<div style="text-align: justify;">
<p><h3 class="title is-4">How can AI systems help us understand other AI systems?</h3></p>
<p><h3 class="title is-4 has-text-centered">How can AI systems help us understand other AI systems?</h3></p>
<p>Understanding an AI system can take many forms. For instance, we might want to know when and how the system relies on sensitive or spurious features, identify systematic errors in its predictions, or learn how to modify the training data and model architecture to improve accuracy and robustness. Today, answering these types of questions often involves significant human effort—researchers must formalize their question, formulate hypotheses about a model’s decision-making process, design datasets on which to evaluate model behavior, then use these datasets to refine and validate hypotheses. As a result, this type of understanding is slow and expensive to obtain, even about the most widely used models.</p><br>
<p><em>Automated Interpretability</em> approaches have begun to address the scalability problem. Recently, such approaches have used pretrained language models like GPT-4 (in Bills et al. 2023) or Claude (in Bricken et al. 2023) to generate explanations of features. In earlier work, we introduced MILAN (Hernandez et al. 2022), a captioner-like model trained on human feature annotations that takes as input a feature visualization and outputs a description of the feature’s functionality based on the visualization. But automated approaches that use learned models to label features leave something to be desired: they are primarily tools for hypothesis generation (Huang et al. 2023), they characterize behavior on a limited set of inputs, and they are often low precision.</p><br>
<p> Our current line of research aims to build tools that help users understand models, combining the flexibility of human experimentation with the scalability of automated techniques. Our approach automates scientific experimentation on models: running experiments on them and describing the processes underlying the data they generate. In Schwettmann et al. 2023, we introduced the interactive <em>Automated Interpretability Agent</em> paradigm, in which an LM-based agent interactively probes systems to explain their behavior. MAIA instantiates this paradigm with a vision-language backbone and a set of tools for designing experiments on other systems (see many more examples in our <b>neuron viewer</b>).</p>
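<p>As a rough illustration of this interactive loop, the sketch below shows an agent that repeatedly proposes an experiment, runs it on the system under study, and records the observation until it has gathered enough evidence to explain the behavior. Every function and variable name here is a placeholder stub for illustration, not the actual MAIA API.</p>
<pre><code>
# Hypothetical sketch of the interactive agent loop: propose an experiment,
# run it on the target system, record the observation, and stop once there is
# enough evidence. All names below are illustrative stubs, not the MAIA API.

EXPERIMENTS = [
    "show the neuron's dataset exemplars",
    "generate new images of the hypothesized concept",
    "edit the images to remove the hypothesized concept",
]

def propose_experiment(history):
    # Stand-in for asking the LM backbone for the next experiment to run.
    return EXPERIMENTS[len(history)]

def run_experiment(experiment):
    # Stand-in for executing the experiment on the system under study.
    return {"experiment": experiment, "observation": "activation pattern recorded"}

history = []
while len(history) != len(EXPERIMENTS):  # stand-in for "explanation not yet ready"
    history.append(run_experiment(propose_experiment(history)))

print(f"explanation produced after {len(history)} experiments")
</code></pre>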
@@ -157,7 +157,7 @@ <h2 class="title is-3 has-text-centered">MAIA</h2>
<section class="hero teaser" style="margin-top: -5px;">
<div class="container is-max-desktop">
<div class="hero-body">
<h2 class="title is-3 has-text-centered">MAIA Tools</h2>
<h2 class="title is-3 has-text-centered">MAIA uses tools to design experiments on other systems</h2>
<div class="content" style="text-align: left;">
<p>MAIA composes interpretability subroutines into Python programs to answer user queries about a system. What kind of experiments does MAIA design? Below we highlight example usage of individual tools to run experiments on neurons inside common vision architectures (CLIP, ResNet, DINO). These are experimental excerpts intended to demonstrate tool use (often, MAIA runs many more experiments to reach its final conclusion!). For full experiment logs, check out our interactive <b>neuron viewer</b>.</p>
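<p>For intuition, here is a minimal, hypothetical sketch of the kind of program such an experiment might amount to: generate images with a text-to-image tool, edit one of them, and compare the target neuron's activations across the originals and the edit. The tool names, signatures, and activation values below are illustrative stubs, not the actual MAIA tool API.</p>
<pre><code>
# Hypothetical sketch: the style of Python program an interpretability agent
# might compose from tool calls. Tool names and signatures are illustrative stubs.

def text2image(prompts):
    # Stand-in for a text-to-image tool: returns one (fake) image per prompt.
    return ["image: " + p for p in prompts]

def edit_image(image, instruction):
    # Stand-in for an instruction-based image-editing tool; here the "edit" is a
    # crude text substitution so the stub neuron below can react to it.
    if instruction == "replace the dog with a cat":
        return image.replace("dog", "cat")
    return image

def call_neuron(images):
    # Stand-in for querying the target neuron: returns a made-up activation per image.
    return [0.9 if "dog" in img else 0.1 for img in images]

# Hypothesis to test: the neuron responds to dogs rather than to park scenes.
images = text2image(["a dog in a park", "an empty park"])
images.append(edit_image(images[0], "replace the dog with a cat"))

for img, act in zip(images, call_neuron(images)):
    print(f"activation {act:.2f} on {img}")
</code></pre>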
</div>
@@ -184,8 +184,11 @@ <h3 class="subtitle is-5" style="text-align: left;">Image editing</h3>
</div>

</div>
<hr>
</div>
</section>



<section class="section" id="BibTeX">
<div class="container is-max-desktop content">
@@ -200,6 +203,30 @@ <h2 class="title">BibTeX</h2>
</div>
</section>


<section class="hero teaser" style="margin-top: -5px;">
<div class="container is-max-desktop">
<div class="hero-body">
<h2 class="title is-3 has-text-centered">Validating MAIA explanations</h2>
<div class="content" style="text-align: left;">
<p>How do we know whether MAIA's explanations are faithful to the systems they describe? We evaluate MAIA's descriptions of neurons in common vision architectures against descriptions written by human experts and produced by prior automated methods, and additionally on synthetic neurons with known ground-truth selectivity, where the accuracy of an explanation can be checked directly.</p>
</div>
</div>
</div>
</section>



<footer class="footer">
<div class="columns is-centered">
