Automated Authorship Obfuscation Detection

Authorship attribution aims to identify the author of a text based on stylometric analysis. Authorship obfuscation, on the other hand, aims to protect against authorship attribution by modifying a text’s style. In this paper, we evaluate the stealthiness of state-of-the-art authorship obfuscation approaches using neural language models in an adversarial setting. An obfuscator is stealthy to the extent that an adversary finds it challenging to detect whether its output document is original or not, a decision that is key to an adversary. We show that the leading authorship obfuscation approaches are not stealthy, as their output documents can be identified with an average F1 score of 0.871. The reason for this weakness is that the obfuscators degrade text smoothness in a predictable, i.e., detectable, manner. Our results highlight the need to develop stealthy authorship obfuscation approaches that better protect an author seeking anonymity.
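For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it for a binary detector that labels documents as obfuscated (1) or original (0). The labels are made up for illustration and are not data from the paper, and the function name is hypothetical; how the paper averages F1 across settings is not described here.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero/undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only: 1 = obfuscated, 0 = original.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
score = f1_score(y_true, y_pred)
```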

Demo

Getting Started

The following instructions will get you a copy of the project up and running on your local machine for testing purposes.

Prerequisites

To run this project you need 17 GB of free disk space and Python 3 installed.
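If you want to confirm both prerequisites before downloading anything, a small check along these lines may help. It is a sketch, not part of the repository: the function name is hypothetical, and it assumes "17GB" can be compared against free space measured in units of 1024^3 bytes.

```python
import shutil
import sys

REQUIRED_GB = 17  # disk space stated in the prerequisites


def check_prerequisites(path=".", required_gb=REQUIRED_GB):
    """Return a list of human-readable problems; empty if all checks pass."""
    problems = []
    if sys.version_info.major < 3:
        problems.append("Python 3 is required")
    # Free space on the filesystem that holds `path`, in units of 1024**3 bytes.
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < required_gb:
        problems.append(
            "only {:.1f} GB free, {} GB needed".format(free_gb, required_gb)
        )
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("OK" if not issues else "\n".join(issues))
```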

Installing

Follow these steps to set up the project.

First clone this repo

git clone https://github.com/asad1996172/Obfuscation-Detection

Then install all required libraries

pip3 install -r requirements.txt

Then download the pre-trained models and the GPT-2 345M model and put them in the working directory. The two folders are named 'models' and 'output'.

To download models, use the following link

https://www.dropbox.com/s/nikke8387y9smtp/models.zip?dl=0

To download output, use the following link

https://www.dropbox.com/s/5gputd5v1apfjkb/output.zip?dl=0
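Once both archives are downloaded into the working directory, the following sketch extracts them and reports any folder that is still missing. It is not part of the repository: the function name is hypothetical, and it assumes the downloads are saved as models.zip and output.zip and that each archive contains a top-level folder of the same name.

```python
import zipfile
from pathlib import Path

EXPECTED_FOLDERS = ("models", "output")  # folder names given above


def unpack_and_check(workdir="."):
    """Extract models.zip/output.zip if present; return folders still missing."""
    workdir = Path(workdir)
    for name in EXPECTED_FOLDERS:
        archive = workdir / (name + ".zip")
        # Assumption: each archive holds a top-level folder named like the zip.
        if archive.is_file() and not (workdir / name).is_dir():
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(workdir)
    return [name for name in EXPECTED_FOLDERS if not (workdir / name).is_dir()]


if __name__ == "__main__":
    missing = unpack_and_check()
    if missing:
        print("Missing folders: " + ", ".join(missing))
    else:
        print("Both folders found; ready to run app.py")
```

An empty return value means both 'models' and 'output' are in place.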

Running the tool

To start the tool, run the following command

python3 app.py

Resources

The following three automated authorship obfuscation systems were used for the experiments in our paper.

A Girl Has No Name: Automated Authorship Obfuscation using Mutant-X [paper] [code]
The Case for Being Average: A Mediocrity Approach to Style Masking and Author Obfuscation [paper] [code]
Author Masking by Sentence Transformation [paper] [code]

The data used in the Obfuscation Detection experiments is available at the following URLs.

EBG dataset: https://www.dropbox.com/sh/snnowhyjo1awtfu/AACAurUwthDFkjKOdNUvttwRa?dl=0
BLOGs dataset: https://www.dropbox.com/sh/qst55smvaktsfy7/AADgwh6J324Rk1CgtlFQGmAfa?dl=0

Built With

Flask - The web framework used
scikit-learn - The machine learning library used

Acknowledgments

SB Admin 2 - Template used for the front end
GLTR - Code for using GPT-2 and BERT as language models
