This repository contains data and code for the paper *Measuring and Addressing Indexical Bias in Information Retrieval*. For more information, please reach out to the authors:
Caleb Ziems | William Held | Jane Dwivedi-Yu | Diyi Yang
🧑‍🤝‍🧑 PAIR is designed to help you identify and mitigate indexical biases in your IR systems. PAIR includes a set of evaluation metrics, data resources, and human subjects study interfaces that help you measure and experimentally understand the Search Engine Manipulation Effect (SEME).
$ git clone https://github.com/SALT-NLP/pair.git
$ cd pair
$ conda create -n pair python=3.9.16
$ conda activate pair
$ pip install -r requirements.txt
You can run the following example in the `Demo.ipynb` Jupyter notebook:
from src.metrics.duo import Duo, get_relevant_corpus, get_relevant_corpus_retrieved, get_relevant_ranking
from src.utils import load_wiki_balance
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES
from beir.retrieval.models import SentenceBERT
from beir.retrieval.evaluation import EvaluateRetrieval
# ----- RETRIEVAL -----
## load the WikiBias_Natural retrieval corpus
corpus, queries, qrels = load_wiki_balance(subset='natural')
## load an IR model from BEIR
retriever = EvaluateRetrieval(DRES(SentenceBERT("msmarco-distilbert-base-tas-b"), batch_size=16))
## retrieve documents
retrieved = retriever.retrieve(corpus, queries)
# ----- INDEXICAL BIAS EVALUATION -----
## initialize the metric
d = Duo(embedding_model="sentence-t5-xl", step_size=1, random_state=7)
## load the synthetic corpus for fitting the Duo metric
fit_corpus, fit_queries, fit_qrels = load_wiki_balance(subset='synthetic')
## evaluate on the first query
query_idx = list(retrieved.keys())[0]
## embed documents to polarization scores
d.embed(transform_docs=get_relevant_corpus_retrieved(corpus, retrieved, query_idx, qrels),
fit_docs=get_relevant_corpus(fit_corpus, query_idx, fit_qrels),
)
## compute the DUO score
duo_score = d.Duo(ranking=get_relevant_ranking(retrieved, query_idx, qrels))
print(duo_score)
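To score a whole retrieval run rather than a single query, you can loop the same per-query calls over every query in `retrieved` and average the results. The sketch below is illustrative and not part of the repository's demo; it assumes every query id in `retrieved` also appears in `qrels` and `fit_qrels`, and that a single `Duo` instance can be re-fit per query by calling `embed` again.

```python
## a minimal sketch: average DUO over every query in `retrieved`,
## reusing the per-query calls shown above (assumes each query id also
## appears in `qrels` / `fit_qrels`, and that calling `embed` again re-fits
## the same Duo instance for the next query)
duo_scores = {}
for query_idx in retrieved:
    d.embed(
        transform_docs=get_relevant_corpus_retrieved(corpus, retrieved, query_idx, qrels),
        fit_docs=get_relevant_corpus(fit_corpus, query_idx, fit_qrels),
    )
    duo_scores[query_idx] = d.Duo(ranking=get_relevant_ranking(retrieved, query_idx, qrels))

mean_duo = sum(duo_scores.values()) / len(duo_scores)
print(f"Mean DUO over {len(duo_scores)} queries: {mean_duo:.4f}")
```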
You can view the WikiBalance datasets on Hugging Face.
| Dataset | Huggingface Name | Gold Labels | Type | Topics | Queries | Documents |
|---|---|---|---|---|---|---|
| WikiBalance Synthetic | SALT-NLP/wiki-balance-synthetic | ❌ | test | 1.4k | 4k | 31.5k |
| WikiBalance Natural | SALT-NLP/wiki-balance-natural | ✅ | test | 288 | 452 | 4.6k |
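If you prefer to load the corpora directly from the Hugging Face Hub instead of going through `load_wiki_balance`, a sketch like the following should work with the `datasets` library (the exact configs, splits, and field names are assumptions here; check the dataset cards for the actual schema):

```python
from datasets import load_dataset

## load the two WikiBalance corpora straight from the Hugging Face Hub
## (illustrative only; see the dataset cards for the exact configs/splits/fields)
synthetic = load_dataset("SALT-NLP/wiki-balance-synthetic")
natural = load_dataset("SALT-NLP/wiki-balance-natural")

print(synthetic)
print(natural)
```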
You can replicate all system audits from Tables 4 and 5 in the paper by running the following script:
bash run_audit.sh
Only BM-25 and ColBERT require special setup to run. To set up ColBERT, follow the [BEIR demo instructions here](https://github.com/beir-cellar/beir/tree/main/examples/retrieval/evaluation/late-interaction). To run BM-25, use the following steps:
On Mac
- Download `elasticsearch.zip` and unpack it locally: elastic.co/downloads/elasticsearch
- Edit `config/elasticsearch.yml` to remove security features, setting `xpack.security.enabled`, `xpack.security.http.ssl.enabled`, and `xpack.security.transport.ssl.enabled` to `false`
- Move to the elasticsearch directory and run `bin/elasticsearch`
- Run `python -m src.modeling.run_bm25 --dataset "idea/wiki" --model "bm25"`
On Linux, follow these instructions: linuxize.com/post/how-to-install-elasticsearch-on-ubuntu-18-04. On either platform, a quick way to confirm Elasticsearch is reachable is sketched below.
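A small sanity check before launching `run_bm25`, assuming the default local endpoint on port 9200 with security disabled as configured above:

```python
import requests

## ping the local Elasticsearch instance started with `bin/elasticsearch`
## (assumes the default port 9200 and that security/SSL were disabled as above)
resp = requests.get("http://localhost:9200")
resp.raise_for_status()
print("Elasticsearch version:", resp.json().get("version", {}).get("number", "unknown"))
```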
- To print the summary tables from the paper, run `print_tables.py` from the main directory.
- To replicate our metric validations in Table 2 (as well as Tables 6 and 7 in the Appendix), run `python -m src.experiments.metric_validation`
- To replicate the SEME experiments, you can do the following:
  a. Re-run the experiments with your own participants using the HIT interface, `hit/seme/hit_pair_seme.html`, OR
  b. Download the experimental data from [this Drive link](https://drive.google.com/file/d/1TXKZueZFo_VbzMyui-V5YkQVvixysQuA/view?usp=drive_link) and place it in the `hit/seme` directory.
  c. Run `python -m src.experiments.seme_experiment`