Presidio - Data Protection and Anonymization API

Context-aware, pluggable and customizable PII anonymization service for text and images.

❗ Note: As we are in the process of defining the roadmap for Presidio, we will only accept PRs with bug fixes and no new features in the upcoming months.

What is Presidio

Presidio (from Latin praesidium, ‘protection, garrison’) helps to ensure sensitive text is properly managed and governed. It provides fast analytics and anonymization for sensitive text such as credit card numbers, names, locations, social security numbers, bitcoin wallets, US phone numbers and financial data. Presidio analyzes the text using predefined or custom recognizers to identify entities, patterns, formats, and checksums with relevant context. Presidio leverages Docker and Kubernetes for workloads at scale.
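
To make the combination of pattern, checksum and context concrete, here is a minimal Python sketch of how a credit card recognizer could work. It is an illustration under stated assumptions, not Presidio's implementation: the names `CARD_PATTERN`, `CONTEXT_WORDS`, `Finding`, `luhn_valid` and `find_credit_cards`, as well as the scores 0.5 and 1.0, are hypothetical. Only the Luhn checksum itself is the standard algorithm used for payment card numbers.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical names and scores; this is not Presidio's code.
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
CONTEXT_WORDS = ("credit", "card", "visa", "mastercard", "amex")


@dataclass
class Finding:
    entity: str
    start: int
    end: int
    score: float


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Double every second digit, counting from the rightmost digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) > 0 and total % 10 == 0


def find_credit_cards(text: str, window: int = 30) -> List[Finding]:
    """Pattern match, then checksum, then raise the score on nearby context."""
    findings = []
    for m in CARD_PATTERN.finditer(text):
        if not luhn_valid(m.group()):
            continue  # looks like a card number but fails the checksum
        score = 0.5
        before = text[max(0, m.start() - window):m.start()].lower()
        if any(word in before for word in CONTEXT_WORDS):
            score = 1.0  # a context word appears shortly before the match
        findings.append(Finding("CREDIT_CARD", m.start(), m.end(), score))
    return findings
```

Calling `find_credit_cards("My credit card is 4111 1111 1111 1111.")` returns one finding with the higher score, because the number passes the checksum and the word "credit" precedes it. The same number without nearby context words keeps the lower score, and a digit run that fails the checksum is dropped.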

Presidio can be integrated into any data pipeline for intelligent PII scrubbing. It is open-source, transparent and scalable. Additionally, PII anonymization use-cases often require a different set of PII entities to be detected, some of which are domain or business specific. Presidio allows you to customize or add new PII recognizers via API or code to best fit your anonymization needs.
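
The idea of adding a domain-specific recognizer next to the built-in ones can be sketched as a small in-process registry. This is a hypothetical illustration only: `Recognizer`, `regex_recognizer`, `Scrubber` and the `EMPLOYEE_ID` entity are invented for the example and do not reflect Presidio's API or code, which should be looked up in the API spec and documentation.

```python
import re
from typing import Callable, Dict, List, Tuple

# A recognizer maps text to a list of (start, end) spans.
# All names in this sketch are hypothetical.
Recognizer = Callable[[str], List[Tuple[int, int]]]


def regex_recognizer(pattern: str) -> Recognizer:
    """Build a recognizer that reports every match of `pattern`."""
    compiled = re.compile(pattern)
    return lambda text: [m.span() for m in compiled.finditer(text)]


class Scrubber:
    """Holds named recognizers and replaces what they find with a tag."""

    def __init__(self) -> None:
        self._recognizers: Dict[str, Recognizer] = {}

    def register(self, entity: str, recognizer: Recognizer) -> None:
        self._recognizers[entity] = recognizer

    def scrub(self, text: str) -> str:
        spans = []
        for entity, recognizer in self._recognizers.items():
            spans.extend((s, e, entity) for s, e in recognizer(text))
        # Replace from the end of the text so earlier offsets stay valid.
        # Overlapping spans are not resolved in this sketch.
        for s, e, entity in sorted(spans, reverse=True):
            text = text[:s] + f"<{entity}>" + text[e:]
        return text


scrubber = Scrubber()
scrubber.register("US_PHONE", regex_recognizer(r"\b\d{3}-\d{3}-\d{4}\b"))
# A business-specific entity added alongside the generic one.
scrubber.register("EMPLOYEE_ID", regex_recognizer(r"\bEMP-\d{6}\b"))

print(scrubber.scrub("Call 212-555-0187 about EMP-004217."))
# prints "Call <US_PHONE> about <EMPLOYEE_ID>."
```

Keeping recognizers as plain callables keyed by entity name means a new PII type can be added without touching the scrubbing loop.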

⚠️ Presidio can help identify sensitive/PII data in structured and unstructured text. However, because Presidio uses trained ML models, there is no guarantee that it will find all sensitive information. Consequently, additional systems and protections should be employed.

Demo

Try Presidio with your own data

Overview

Presidio API

API Spec - available APIs, request and response formats.

Presidio REST API Open API Spec

API Samples

Learn more

More information can be found in the Presidio Documentation.

Deploying Presidio on a Kubernetes Cluster

Follow the Deployment Guidelines for details.

Developing Presidio

Deploy Presidio for Test and Dev

Current input/output components status

Module	Feature	Status
API	HTTP input	✅
Scanner	MySQL	❌
Scanner	MSSQL	❌
Scanner	PostgreSQL	❌
Scanner	Oracle	❌
Scanner	Azure Blob Storage	✅
Scanner	S3	✅
Scanner	Google Cloud Storage	❌
Streams	Kafka	✅
Streams	Azure Event Hub	✅
Datasink (output)	MySQL	✅
Datasink (output)	MSSQL	✅
Datasink (output)	Oracle	❌
Datasink (output)	PostgreSQL	✅
Datasink (output)	Kafka	✅
Datasink (output)	Azure Event Hub	✅
Datasink (output)	Azure Blob Storage	✅
Datasink (output)	S3	✅
Datasink (output)	Google Cloud Storage	❌

✅ - Working
🔶 - Partially supported (alpha)
❌ - Not supported yet

How to contact us?

If you have a usage question, found a bug or have a suggestion for improvement, please file a GitHub issue. For other matters, please email [email protected]

Contributing

For details on contributing to this repository, see the contributing guide.

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

