On the Universal Truthfulness Hyperplane Inside LLMs (EMNLP 2024)

hkust-nlp/Universal_Truthfulness_Hyperplane

On the Universal Truthfulness Hyperplane Inside LLMs

Overview

This is the code implementation of the paper "On the Universal Truthfulness Hyperplane Inside LLMs" (arXiv:2407.08582).

In this paper, we examine whether a universal truthfulness hyperplane exists inside the model by designing and training a probe on diverse datasets. Our approach substantially improves on existing results and provides positive evidence that such a universal truthfulness hyperplane exists.
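
To make the idea of a probe-defined hyperplane concrete, here is a minimal sketch of a linear probe trained on several datasets and evaluated on a held-out one. It is an illustration under stated assumptions, not the paper's probe design or data: the features are synthetic vectors standing in for LLM hidden states, and every name in it (`make_dataset`, `train_probe`, `accuracy`, `direction`, `offset_scale`) is hypothetical. The learned weights `w` and bias `b` define the hyperplane `w . h + b = 0` that separates "true" from "false" examples.

```python
import numpy as np


def make_dataset(rng, direction, n=400, offset_scale=0.3):
    """Synthetic stand-in for one dataset's hidden states (hypothetical data)."""
    dim = direction.shape[0]
    labels = rng.integers(0, 2, size=n)                 # 1 = true, 0 = false
    offset = rng.normal(scale=offset_scale, size=dim)   # dataset-specific shift
    noise = rng.normal(size=(n, dim))
    signs = 2.0 * labels - 1.0                          # map {0, 1} to {-1, +1}
    feats = noise + offset + np.outer(signs, direction)
    return feats, labels


def train_probe(x, y, lr=0.1, steps=500):
    """Fit a logistic-regression probe with plain gradient descent."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        logits = np.clip(x @ w + b, -30.0, 30.0)        # avoid overflow in exp
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - y                                # d(loss)/d(logits)
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def accuracy(w, b, x, y):
    """Fraction of examples on the correct side of the hyperplane."""
    preds = (x @ w + b > 0).astype(int)
    return float((preds == y).mean())


rng = np.random.default_rng(0)
dim = 32

# Assumption: all datasets share one "truthfulness" direction of norm 2.
direction = rng.normal(size=dim)
direction *= 2.0 / np.linalg.norm(direction)

# Train on three synthetic datasets, then test on a fourth, unseen one.
train_sets = [make_dataset(rng, direction) for _ in range(3)]
x_train = np.concatenate([x for x, _ in train_sets])
y_train = np.concatenate([y for _, y in train_sets])
x_test, y_test = make_dataset(rng, direction)

w, b = train_probe(x_train, y_train)
held_out_acc = accuracy(w, b, x_test, y_test)
print(f"held-out accuracy: {held_out_acc:.3f}")
```

In this toy setup the probe transfers to the unseen dataset because the shared direction is built into the data by construction. Whether real hidden states behave this way is the empirical question the paper studies.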

News

Our paper has been accepted to EMNLP 2024!

Setup

conda env create -f environment.yml  

Todo

  • Release data!

  • Clean and release training scripts

Citation

Please cite our paper if it's helpful to your work!

@article{liu2024universal,
  title={On the Universal Truthfulness Hyperplane Inside LLMs},
  author={Liu, Junteng and Chen, Shiqi and Cheng, Yu and He, Junxian},
  journal={arXiv preprint arXiv:2407.08582},
  year={2024}
}
