On the Universal Truthfulness Hyperplane Inside LLMs

Overview

This repository contains the code for the paper On the Universal Truthfulness Hyperplane Inside LLMs (arXiv:2407.08582).

In this paper, we examine whether a universal truthfulness hyperplane exists inside LLMs by designing and training a probe on a diverse collection of datasets. Our approach greatly improves on existing results and provides positive evidence that such a universal truthfulness hyperplane exists.
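
To make the idea of a probe concrete, the following is a minimal sketch and not the training code of this repository. It assumes that hidden states are already extracted as fixed-size vectors and replaces them with synthetic data in which every dataset shares one truthfulness direction. The helper names (`train_linear_probe`, `probe_accuracy`, `make_synthetic_dataset`) and all hyperparameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def train_linear_probe(X, y, lr=0.1, epochs=500, l2=1e-3):
    """Fit a logistic-regression probe (w, b) with full-batch gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(truthful)
        w -= lr * (X.T @ (p - y) / n + l2 * w)  # gradient step with L2 penalty
        b -= lr * np.mean(p - y)
    return w, b


def probe_accuracy(X, y, w, b):
    """Classify by which side of the hyperplane w.x + b = 0 a point lies on."""
    return float(np.mean(((X @ w + b) > 0) == (y == 1)))


def make_synthetic_dataset(rng, direction, n=400, noise=1.0):
    """Synthetic stand-in for hidden states (assumption: labels shift points along `direction`)."""
    y = rng.integers(0, 2, size=n)
    signs = 2.0 * y - 1.0
    X = signs[:, None] * direction[None, :]
    X = X + noise * rng.standard_normal((n, direction.shape[0]))
    return X, y.astype(float)


rng = np.random.default_rng(0)
dim = 32
shared_direction = rng.standard_normal(dim)
shared_direction *= 2.0 / np.linalg.norm(shared_direction)

# Train on several "datasets" pooled together, then evaluate on a held-out one.
train_sets = [make_synthetic_dataset(rng, shared_direction) for _ in range(3)]
X_train = np.concatenate([X for X, _ in train_sets])
y_train = np.concatenate([y for _, y in train_sets])
X_test, y_test = make_synthetic_dataset(rng, shared_direction)

w, b = train_linear_probe(X_train, y_train)
held_out_acc = probe_accuracy(X_test, y_test, w, b)
cosine = float(w @ shared_direction / (np.linalg.norm(w) * np.linalg.norm(shared_direction)))
print(f"held-out accuracy: {held_out_acc:.3f}, cosine to shared direction: {cosine:.3f}")
```

In this toy setting the probe generalizes to the held-out dataset only because the data was built with a shared direction. Whether real LLM hidden states behave this way is the question the paper investigates.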

News

Our paper has been accepted to EMNLP 2024!

Setup

conda env create -f environment.yml

Todo

Release data!
Clean and release training scripts

Citation

Please cite our paper if it's helpful to your work!

@article{liu2024universal,
  title={On the Universal Truthfulness Hyperplane Inside LLMs},
  author={Liu, Junteng and Chen, Shiqi and Cheng, Yu and He, Junxian},
  journal={arXiv preprint arXiv:2407.08582},
  year={2024}
}
