Commit

add presidio
baniasbaabe committed Oct 20, 2024
1 parent aacc305 commit 9e32a28
Showing 1 changed file with 61 additions and 0 deletions.
61 changes: 61 additions & 0 deletions book/cooltools/Chapter.ipynb
@@ -2207,6 +2207,67 @@
"def transform(prompt: str, history: list[mel.ChatMessage]) -> str:\n",
" return \"Hello \" + prompt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Anonymize PII Data with `presidio`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Working with PII (personally identifiable information) can be risky.\n",
"\n",
"Luckily, for fast anonymization, you can use `presidio`.\n",
"\n",
"`presidio` detects and anonymizes common entities such as names, phone numbers, credit card numbers, and Bitcoin wallet addresses.\n",
"\n",
"With the separate `presidio-image-redactor` package, it can even redact text in images!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install presidio_analyzer presidio_anonymizer"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!python -m spacy download en_core_web_lg"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from presidio_analyzer import AnalyzerEngine\n",
"from presidio_anonymizer import AnonymizerEngine\n",
"\n",
"text_to_anonymize = \"His name is Mr. Jones. His phone number is 212-555-5555.\"\n",
"\n",
"# Detect PII entities (the default engine uses the spaCy model downloaded above)\n",
"analyzer = AnalyzerEngine()\n",
"results = analyzer.analyze(\n",
"    text=text_to_anonymize, entities=[\"PHONE_NUMBER\", \"PERSON\"], language=\"en\"\n",
")\n",
"\n",
"# Replace each detected entity with a placeholder such as <PERSON>\n",
"anonymizer = AnonymizerEngine()\n",
"anonymized_result = anonymizer.anonymize(text=text_to_anonymize, analyzer_results=results)\n",
"\n",
"# anonymize() returns a result object; the anonymized string is in .text\n",
"print(anonymized_result.text)\n",
"\n",
"# Output: His name is Mr. <PERSON>. His phone number is <PHONE_NUMBER>."
]
}
],
"metadata": {