Presidio - Data Protection API

Context-aware, pluggable, and customizable data protection and PII anonymization service for text and images

What is Presidio

Presidio (from the Latin praesidium, ‘protection, garrison’) helps ensure that sensitive text is properly managed and governed. It provides fast analytics and anonymization for sensitive text such as credit card numbers, names, locations, social security numbers, bitcoin wallets, US phone numbers, and financial data. Presidio analyzes text using predefined or custom recognizers that identify entities, patterns, formats, and checksums together with relevant context. Presidio leverages Docker and Kubernetes to run workloads at scale.
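To make the idea of a recognizer that combines a pattern, a checksum, and surrounding context more concrete, here is a minimal sketch using only Python's standard library. It is not Presidio's API: the names `Finding`, `find_credit_cards`, `CONTEXT_WORDS`, the score values, and the 30-character context window are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical names and scores throughout; this is not Presidio's API.
# Pattern: 13 to 19 digits, optionally separated by single spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
CONTEXT_WORDS = ("credit", "card", "visa", "mastercard", "amex")


@dataclass
class Finding:
    entity_type: str
    start: int
    end: int
    score: float


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_credit_cards(text: str, window: int = 30) -> List[Finding]:
    """Pattern match, then checksum, then a context-based score boost."""
    findings = []
    for m in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if not luhn_valid(digits):
            continue  # looks like a card number but fails the checksum
        score = 0.6  # assumed base score for pattern plus checksum
        nearby = text[max(0, m.start() - window): m.end() + window].lower()
        if any(word in nearby for word in CONTEXT_WORDS):
            score = 0.95  # assumed boost when context words appear nearby
        findings.append(Finding("CREDIT_CARD", m.start(), m.end(), score))
    return findings
```

Calling `find_credit_cards("My credit card is 4111 1111 1111 1111.")` yields one finding with the boosted score, while a number that fails the checksum is dropped regardless of context.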

Presidio can be integrated into any data pipeline for intelligent PII scrubbing. It is open-source, transparent, and scalable. PII anonymization use cases often require a different set of PII entities to be detected, some of which are domain- or business-specific. Presidio lets you customize or add PII recognizers via API or code to best fit your anonymization needs.
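As a rough illustration of what adding a domain-specific recognizer in code can look like, the sketch below keeps recognizers in a small registry and lets the caller choose which entity types to detect. Everything here is hypothetical and written against the standard library only: `SimpleRegistry`, `regex_recognizer`, and the `EMPLOYEE_ID` format are invented for the example and are not part of Presidio.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# (entity_type, start, end); a hypothetical result shape for this sketch.
Span = Tuple[str, int, int]
Recognizer = Callable[[str], List[Span]]


def regex_recognizer(entity_type: str, pattern: str) -> Recognizer:
    """Build a recognizer function from a regular expression."""
    compiled = re.compile(pattern)

    def recognize(text: str) -> List[Span]:
        return [(entity_type, m.start(), m.end()) for m in compiled.finditer(text)]

    return recognize


class SimpleRegistry:
    """Holds recognizers keyed by the entity type they detect."""

    def __init__(self) -> None:
        self._recognizers: Dict[str, Recognizer] = {}

    def add(self, entity_type: str, recognizer: Recognizer) -> None:
        self._recognizers[entity_type] = recognizer

    def analyze(self, text: str, entities: Optional[List[str]] = None) -> List[Span]:
        # Run every registered recognizer unless a subset is requested.
        # Requesting an unregistered entity type raises KeyError.
        selected = entities if entities is not None else list(self._recognizers)
        spans: List[Span] = []
        for entity_type in selected:
            spans.extend(self._recognizers[entity_type](text))
        return sorted(spans, key=lambda span: span[1])


registry = SimpleRegistry()
registry.add("EMAIL", regex_recognizer("EMAIL", r"[\w.+-]+@[\w-]+\.[\w.]+"))
# A business-specific entity: an invented employee ID format.
registry.add("EMPLOYEE_ID", regex_recognizer("EMPLOYEE_ID", r"\bEMP-\d{5}\b"))
```

Keeping recognizers behind one small interface means a new entity type can be added without touching the analysis loop, which is the property the paragraph above describes.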

⚠️ Presidio can help identify sensitive/PII data in structured and unstructured text. However, because Presidio relies on trained ML models, there is no guarantee that it will find all sensitive information. Additional systems and protections should therefore be employed.

Demo

Try Presidio with your own data

Features

Unstructured text anonymization

Presidio automatically detects personally identifiable information (PII) in unstructured text, anonymizes it using one or more anonymization mechanisms, and returns a string with no personally identifiable data.

For each PII entity, Presidio returns a confidence score.
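The relationship between detected entities, their confidence scores, and the anonymized output can be sketched as follows. This is a standard-library illustration under stated assumptions, not Presidio's implementation: the `Finding` tuple, the `anonymize` function, the `<ENTITY_TYPE>` placeholder style, and the `min_score` threshold are all hypothetical, and the findings are assumed not to overlap.

```python
from typing import List, NamedTuple


class Finding(NamedTuple):
    """A detected entity with its character span and confidence score."""
    entity_type: str
    start: int
    end: int
    score: float


def anonymize(text: str, findings: List[Finding], min_score: float = 0.5) -> str:
    """Replace each sufficiently confident finding with a placeholder.

    Findings are applied from right to left so that earlier offsets stay
    valid as the string changes length. Overlapping spans are not handled.
    """
    result = text
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        if f.score >= min_score:
            result = result[:f.start] + f"<{f.entity_type}>" + result[f.end:]
    return result


findings = [
    Finding("PERSON", 0, 4, 0.85),
    Finding("LOCATION", 14, 21, 0.40),
]
print(anonymize("John lives in Seattle", findings))
# prints: <PERSON> lives in Seattle
print(anonymize("John lives in Seattle", findings, min_score=0.3))
# prints: <PERSON> lives in <LOCATION>
```

A threshold like this trades recall for precision: lowering it masks more text but also masks more false positives.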

Text anonymization in images (beta)

Presidio uses OCR to detect text in images. It further allows the redaction of the text from the original image.
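The redaction step can be pictured as filling bounding boxes in the image. The sketch below assumes an OCR engine has already produced the boxes of the words to hide and represents the image as a plain grid of grayscale pixel values. The `Box` layout and the `redact` function are hypothetical, and no real OCR or imaging library is involved.

```python
from typing import List, Tuple

# (left, top, right, bottom) in pixels; right and bottom are exclusive.
# Hypothetical layout: in practice the boxes would come from an OCR engine.
Box = Tuple[int, int, int, int]


def redact(image: List[List[int]], boxes: List[Box], fill: int = 0) -> List[List[int]]:
    """Return a copy of the image with every box filled with one color."""
    redacted = [row[:] for row in image]  # leave the input untouched
    height = len(redacted)
    width = len(redacted[0]) if redacted else 0
    for left, top, right, bottom in boxes:
        # Clamp each box to the image bounds before filling it.
        for y in range(max(0, top), min(height, bottom)):
            for x in range(max(0, left), min(width, right)):
                redacted[y][x] = fill
    return redacted
```

Working on a copy keeps the original image available, which matters if the unredacted source must be retained elsewhere under stricter access control.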

Input and output

Presidio accepts multiple sources and targets for data anonymization. Specifically:

Storage solutions
- Azure Blob Storage
- Azure Data Lake Gen 2
- S3
- Google Cloud Storage
Databases
- MySQL
- PostgreSQL
- SQL Server
- Oracle
Streaming platforms
- Kafka
- Azure Event Hubs
REST requests

It can then export the results to file storage, databases, or streaming platforms.
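The source-to-target flow described above can be sketched as a small pipeline: read records from any iterable source, scrub each one, and hand the result to a sink. This is an assumption-laden illustration rather than Presidio's connector code. `run_pipeline` and `scrub` are hypothetical names, the email-only scrubber stands in for real PII detection, and in-memory lists stand in for storage, databases, or streams.

```python
import re
from typing import Callable, Iterable

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


def scrub(record: str) -> str:
    """Stand-in anonymizer: masks email addresses only."""
    return EMAIL.sub("<EMAIL>", record)


def run_pipeline(
    source: Iterable[str],
    sink: Callable[[str], None],
    transform: Callable[[str], str] = scrub,
) -> int:
    """Stream records from source through transform into sink.

    Returns the number of records processed. The source can be any
    iterable (file lines, database rows, consumed messages) and the
    sink any callable that accepts one anonymized record.
    """
    count = 0
    for record in source:
        sink(transform(record))
        count += 1
    return count


rows = ["mail a@b.com today", "no pii here"]
exported = []
run_pipeline(rows, exported.append)
```

Because the source and sink are only an iterable and a callable, swapping a file for a message stream would change the adapters but not the pipeline itself.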

The Technology Stack

Presidio leverages:

- Kubernetes
- spaCy
- Redis
- gRPC
- re2

The design document introduces Presidio's concepts and architecture.
