Commit
Update index.html
schwettmann authored Apr 13, 2024
1 parent b6668c4 commit 6d53d1e
Showing 1 changed file with 3 additions and 2 deletions.
index.html: 5 changes (3 additions & 2 deletions)
@@ -104,8 +104,9 @@ <h1 class="title is-1 publication-title">A Multimodal Automated Interpretability
 <section class="section" style="margin-top: -75px; margin-bottom:-10px">
 <div class="container is-max-desktop">
 <div style="text-align: justify;">
-<p>Understanding an AI system can take many forms. For instance, we might want to know when and how the system relies on sensitive or spurious features, identify systematic errors in its predictions, or learn how to modify the training data and model architecture to improve accuracy and robustness. Today, answering these types of questions often involves significant effort on the part of researchers: synthesizing the outcomes of different experiments that use a variety of tools.</p><br>
-<p><h3 class="title is-4">Can an interpretability agent automate the process of experimenting on a system to explain its behavhior?</h3></p>
+<p><h3 class="title is-4">How can AI systems help us understand other AI systems?</h3></p>
+<p>Understanding an AI system can take many forms. For instance, we might want to know when and how the system relies on sensitive or spurious features, identify systematic errors in its predictions, or learn how to modify the training data and model architecture to improve accuracy and robustness. Today, answering these types of questions often involves significant human effort—researchers must formalize their question, formulate hypotheses about a model’s decision-making process, design datasets on which to evaluate model behavior, then use these datasets to refine and validate hypotheses. As a result, this type of understanding is slow and expensive to obtain, even about the most widely used models.</p><br>
+<p>Automated interpretability approaches begin to address the scalability problem, but existing methods have key limitations: they produce low-precision descriptions, serve primarily as tools for hypothesis generation, and characterize behavior only on a limited set of inputs.</p>
 </div>
 </div>
 </section>