Context-aware, pluggable, and customizable data protection and PII anonymization service for text and images
Presidio (from the Latin praesidium, ‘protection, garrison’) helps ensure that sensitive text is properly managed and governed. It provides fast analytics and anonymization for sensitive text such as credit card numbers, names, locations, social security numbers, Bitcoin wallets, US phone numbers, and financial data. Presidio analyzes the text using predefined or custom recognizers to identify entities, patterns, formats, and checksums with relevant context. Presidio uses Docker and Kubernetes to run workloads at scale.
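The combination of pattern, checksum, and context described above can be illustrated with a small, self-contained sketch. It is not Presidio's recognizer code: the regex, the `find_credit_cards` and `luhn_valid` helpers, the context word list, and the score values (0.5 base, +0.35 when context is present) are all hypothetical choices made for this example. A card-like digit run is kept only if it passes the Luhn checksum, and its score rises when a context word appears shortly before it.

```python
import re
from dataclasses import dataclass


@dataclass
class Result:
    entity_type: str
    start: int   # character offset where the entity begins
    end: int     # character offset just past the entity
    score: float


# Hypothetical pattern: 13-19 digits, optionally separated by single spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
# Hypothetical context words that raise confidence when they precede a match.
CONTEXT_WORDS = ("credit", "card", "visa", "mastercard", "amex")


def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in candidate pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit; double every second one and
    # subtract 9 whenever doubling produces a two-digit number.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_credit_cards(text, base_score=0.5, context_boost=0.35, window=30):
    """Pattern match, then checksum-validate, then boost the score on nearby context."""
    results = []
    lowered = text.lower()
    for match in CARD_PATTERN.finditer(text):
        if not luhn_valid(match.group()):
            continue  # right format, wrong checksum: treat as a false positive
        score = base_score
        preceding = lowered[max(0, match.start() - window):match.start()]
        if any(word in preceding for word in CONTEXT_WORDS):
            score = min(1.0, score + context_boost)
        results.append(Result("CREDIT_CARD", match.start(), match.end(), score))
    return results
```

With this sketch, the well-known test number `4111 1111 1111 1111` preceded by "credit card" is reported with the boosted score, while `4111 1111 1111 1112` fails the checksum and is dropped.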
Presidio can be integrated into any data pipeline for intelligent PII scrubbing. It is open-source, transparent, and scalable. Additionally, PII anonymization use cases often require detecting a different set of PII entities, some of which are domain- or business-specific. Presidio lets you customize existing PII recognizers or add new ones via API or code to best fit your anonymization needs.
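The idea of pluggable recognizers, where generic detectors and business-specific ones live side by side, can be sketched as a small registry. This is a hypothetical illustration, not Presidio's API: `RegexRecognizer`, `RecognizerSet`, the `EMPLOYEE_ID` entity, and its `EMP-` plus six digits format are invented for the example, and each result is a plain `(entity_type, start, end, score)` tuple.

```python
import re
from dataclasses import dataclass, field


@dataclass
class RegexRecognizer:
    """Hypothetical recognizer: one entity type, one regex, one fixed score."""
    entity_type: str
    pattern: str
    score: float

    def analyze(self, text):
        # Each result is an (entity_type, start, end, score) tuple.
        return [(self.entity_type, m.start(), m.end(), self.score)
                for m in re.finditer(self.pattern, text)]


@dataclass
class RecognizerSet:
    """Holds pluggable recognizers and runs the relevant ones over the text."""
    recognizers: list = field(default_factory=list)

    def add(self, recognizer):
        self.recognizers.append(recognizer)

    def analyze(self, text, entities=None):
        results = []
        for recognizer in self.recognizers:
            # Skip recognizers for entity types the caller did not ask for.
            if entities is None or recognizer.entity_type in entities:
                results.extend(recognizer.analyze(text))
        return sorted(results, key=lambda result: result[1])


registry = RecognizerSet()
# A generic, predefined-style recognizer...
registry.add(RegexRecognizer("EMAIL_ADDRESS", r"[\w.+-]+@[\w-]+\.[\w.]+", 0.9))
# ...and a hypothetical business-specific one for internal IDs such as EMP-004217.
registry.add(RegexRecognizer("EMPLOYEE_ID", r"\bEMP-\d{6}\b", 0.7))
```

Calling `registry.analyze("Contact EMP-004217 at jane@example.com")` returns both detections in text order; passing `entities={"EMAIL_ADDRESS"}` runs only the email recognizer.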
Try Presidio with your own data
Unstructured text anonymization
Presidio automatically detects Personally Identifiable Information (PII) in unstructured text, anonymizes it using one or more anonymization mechanisms, and returns a string with no personally identifiable data. For example:
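A minimal sketch of the replacement step, assuming detections arrive as hypothetical `(entity_type, start, end, score)` tuples (not Presidio's actual result type) and that the chosen mechanism is "replace with the entity type". Overlapping detections are resolved by keeping the higher score, and replacements run from right to left so earlier character offsets stay valid.

```python
def anonymize(text, results):
    """Replace detected spans with <ENTITY_TYPE> placeholders.

    results: iterable of (entity_type, start, end, score) tuples.
    When spans overlap, the higher-scoring one is kept.
    """
    kept = []
    for entity_type, start, end, score in sorted(results, key=lambda r: r[3], reverse=True):
        # Keep this span only if it does not overlap any span already kept.
        if all(end <= k_start or start >= k_end for _, k_start, k_end, _ in kept):
            kept.append((entity_type, start, end, score))
    # Replace from right to left so earlier offsets remain correct.
    for entity_type, start, end, _ in sorted(kept, key=lambda r: r[1], reverse=True):
        text = text[:start] + f"<{entity_type}>" + text[end:]
    return text


text = "My name is David and I live in Miami"
detections = [("PERSON", 11, 16, 0.85), ("LOCATION", 31, 36, 0.85)]
print(anonymize(text, detections))
# My name is <PERSON> and I live in <LOCATION>
```

Other mechanisms (masking, hashing, redaction) would only change the string written into each span.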
For each PII entity, Presidio returns a confidence score:
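One common use of a per-entity score is to discard low-confidence detections before anonymizing. The sketch below assumes the same hypothetical `(entity_type, start, end, score)` tuples as above; `filter_by_score`, the default threshold of 0.5, and the per-entity overrides are illustrative choices, not Presidio settings.

```python
def filter_by_score(results, threshold=0.5, per_entity=None):
    """Keep only detections whose score meets the applicable threshold.

    results: iterable of (entity_type, start, end, score) tuples.
    per_entity: optional {entity_type: threshold} overrides.
    """
    per_entity = per_entity or {}
    return [r for r in results if r[3] >= per_entity.get(r[0], threshold)]


detections = [
    ("PERSON", 11, 16, 0.85),
    ("US_SSN", 40, 51, 0.3),   # weak match, e.g. a pattern hit with no supporting context
    ("LOCATION", 31, 36, 0.6),
]
print(filter_by_score(detections))                                # keeps PERSON and LOCATION
print(filter_by_score(detections, per_entity={"LOCATION": 0.7}))  # keeps PERSON only
```

Raising the threshold trades recall for precision, so a stricter per-entity override can be useful for entity types that tend to produce false positives.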
Text anonymization in images (beta)
Presidio uses OCR to detect text in images and can then redact that text from the original image.
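The redaction step can be sketched once OCR has produced word-level bounding boxes. This is a hypothetical illustration: `WordBox` and `redact` are invented names, the image is a plain list of grayscale pixel rows rather than a real image object, and `is_pii` stands in for whatever text analyzer decides which recognized words are sensitive.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    """Hypothetical OCR output: a recognized word and its pixel bounding box."""
    text: str
    left: int
    top: int
    width: int
    height: int


def redact(image, boxes, is_pii, fill=0):
    """Return a copy of image (a list of pixel rows) with PII word boxes filled in."""
    redacted = [row[:] for row in image]  # never modify the caller's image
    rows = len(redacted)
    cols = len(redacted[0]) if rows else 0
    for box in boxes:
        if not is_pii(box.text):
            continue
        # Clip the box to the image bounds, then paint every pixel inside it.
        for y in range(max(0, box.top), min(rows, box.top + box.height)):
            for x in range(max(0, box.left), min(cols, box.left + box.width)):
                redacted[y][x] = fill
    return redacted


image = [[255] * 6 for _ in range(4)]  # 4x6 white grayscale image
ocr_words = [WordBox("Name:", 0, 0, 2, 1), WordBox("David", 2, 1, 3, 2)]
result = redact(image, ocr_words, is_pii=lambda word: word == "David")
```

Only the box around "David" is blacked out; the label "Name:" and the rest of the image are left untouched.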
Presidio accepts multiple sources and targets for data anonymization. Specifically:
- Storage solutions
  - Azure Blob Storage
  - Azure Data Lake Gen 2
  - S3
  - Google Cloud Storage
- Databases
  - MySQL
  - PostgreSQL
  - SQL Server
  - Oracle
- Streaming platforms
  - Kafka
  - Azure Event Hubs
- REST requests
It can then export the results to file storage, databases, or streaming platforms.
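The flow from source to anonymizer to target can be pictured as a loop over records. This is a hypothetical sketch, not Presidio's scanner or export code: a plain list stands in for a storage, database, stream, or REST source, `ListSink` stands in for the export target, and `scrub_emails` is a toy single-entity detector used only to keep the example self-contained.

```python
import re
from typing import Callable, Iterable, List, Protocol


class Sink(Protocol):
    """Anything that accepts anonymized records: a file, a table, a topic..."""
    def write(self, record: str) -> None: ...


class ListSink:
    """In-memory stand-in for a storage, database, or streaming target."""
    def __init__(self) -> None:
        self.records: List[str] = []

    def write(self, record: str) -> None:
        self.records.append(record)


def run_pipeline(source: Iterable[str], scrub: Callable[[str], str], sink: Sink) -> int:
    """Read each record from source, anonymize it, write it to sink; return the count."""
    count = 0
    for record in source:
        sink.write(scrub(record))
        count += 1
    return count


EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")  # toy detector for the example


def scrub_emails(text: str) -> str:
    return EMAIL.sub("<EMAIL_ADDRESS>", text)


sink = ListSink()
processed = run_pipeline(["mail jane@example.com", "no pii here"], scrub_emails, sink)
```

Keeping the source, the scrubbing function, and the sink as separate parameters is what allows any source to be paired with any target.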
Presidio leverages:
The design document introduces Presidio's concepts and architecture.