forked from opensearch-project/documentation-website
Document the new `analysis-phonenumber` plugin

This is part of opensearch-project/OpenSearch#11326. The actual implementation was done in opensearch-project/OpenSearch#15915. See the commit message on the PR for further details.

Resolves opensearch-project#8389

Co-authored-by: Fanit Kolchina <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: Ralph Ursprung <[email protected]>

Commit d81da26 (1 parent: cd31d82)

Showing 3 changed files with 155 additions and 24 deletions.

---
layout: default
title: Phone number
parent: Analyzers
nav_order: 140
---

# Phone number analyzers

The `analysis-phonenumber` plugin provides analyzers and tokenizers for parsing phone numbers.
A dedicated analyzer is required because parsing phone numbers is a non-trivial task, even though it might seem trivial at first glance. For common pitfalls in parsing phone numbers, see [Falsehoods programmers believe about phone numbers](https://github.com/google/libphonenumber/blob/master/FALSEHOODS.md).

OpenSearch supports the following phone number analyzers:

* `phone`: An [index analyzer]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) to use at indexing time.
* `phone-search`: A [search analyzer]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/) to use at search time.
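
If you don't need a default region, you can reference the two analyzers directly by name in a field mapping, in the same way that the `_analyze` requests later on this page reference the `phone` analyzer. The following request is a minimal sketch that assumes a hypothetical index named `my-phone-index` with a single `phoneNumber` field. Because no `phone-region` is set, this mapping suits fields in which every number includes the `+` international calling prefix:

```json
PUT /my-phone-index
{
  "mappings": {
    "properties": {
      "phoneNumber": {
        "type": "text",
        "analyzer": "phone",
        "search_analyzer": "phone-search"
      }
    }
  }
}
```
{% include copy-curl.html %}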

Internally, the plugin uses the [`libphonenumber`](https://github.com/google/libphonenumber) library and follows its parsing rules.

The phone number analyzers are not meant to find phone numbers in larger texts. Instead, use them on fields that contain only phone numbers.
{: .note}

## Installing the plugin

Before you can use phone number analyzers, you must install the `analysis-phonenumber` plugin by running the following command:

```sh
./bin/opensearch-plugin install analysis-phonenumber
```
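
Plugins are loaded when OpenSearch starts, so install the plugin on every node in the cluster and then restart the nodes. To check that the plugin was loaded, you can list the installed plugins using the CAT plugins API. Assuming the installation succeeded, `analysis-phonenumber` should appear in the `component` column of the response:

```json
GET _cat/plugins?v
```
{% include copy-curl.html %}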

## Specifying a default region

You can optionally specify a default region for parsing phone numbers by providing the `phone-region` parameter within the analyzer. Valid phone regions are ISO 3166 country codes (for example, `CH` for Switzerland). For more information, see [List of ISO 3166 country codes](https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes).

When tokenizing phone numbers that contain the international calling prefix `+`, the default region is irrelevant. However, for phone numbers that use a country-specific international call prefix instead of `+` (for example, `001` instead of `+1` to dial North America from most European countries), you must provide the region. Specifying the region also lets you properly index local phone numbers that have no international prefix.

## Example

The following request creates an index containing one field that ingests phone numbers for Switzerland (region code `CH`):

```json
PUT /example-phone
{
  "settings": {
    "analysis": {
      "analyzer": {
        "phone-ch": {
          "type": "phone",
          "phone-region": "CH"
        },
        "phone-search-ch": {
          "type": "phone-search",
          "phone-region": "CH"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "phoneNumber": {
        "type": "text",
        "analyzer": "phone-ch",
        "search_analyzer": "phone-search-ch"
      }
    }
  }
}
```
{% include copy-curl.html %}

Analyzing a (fictional) Swiss phone number that includes the international calling prefix `+` works the same way whether or not the analyzer is configured with the Swiss phone region. The following request uses the Swiss-specific `phone-ch` analyzer:

```json
GET /example-phone/_analyze
{
  "analyzer" : "phone-ch",
  "text" : "+41 60 555 12 34"
}
```
{% include copy-curl.html %}

The following request uses the generic `phone` analyzer, which has no phone region configured:

```json
GET /example-phone/_analyze
{
  "analyzer" : "phone",
  "text" : "+41 60 555 12 34"
}
```
{% include copy-curl.html %}

Both requests produce the same tokens, which include the full number as well as n-grams of its digits (shown here in abbreviated form):

```json
["+41 60 555 12 34", "6055512", "41605551", "416055512", "6055", "41605551234", ...]
```

However, if the phone number is written without the `+` international calling prefix (either by using `0041` or by omitting the international calling prefix altogether), only the analyzer configured with the correct phone region can parse it:

```json
GET /example-phone/_analyze
{
  "analyzer" : "phone-ch",
  "text" : "060 555 12 34"
}
```
{% include copy-curl.html %}
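
Numbers written with `0041` instead of `+41` fall under the same rule. As a sketch that reuses the `example-phone` index (the response is not reproduced here), the following request analyzes such a number with the region-specific analyzer. Sending the same text to the generic `phone` analyzer lets you compare how the number is handled when no region is configured:

```json
GET /example-phone/_analyze
{
  "analyzer" : "phone-ch",
  "text" : "0041 60 555 12 34"
}
```
{% include copy-curl.html %}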

In contrast, the `phone-search` analyzer does not create n-grams and produces only a few basic tokens. For example, send the following request:

```json
GET /example-phone/_analyze
{
  "analyzer" : "phone-search",
  "text" : "+41 60 555 12 34"
}
```
{% include copy-curl.html %}

The response contains the following tokens:

```json
["+41 60 555 12 34", "41 60 555 12 34", "41605551234", "605551234", "41"]
```
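
To see the index and search analyzers working together, you can index a document and then search for it using a different notation of the same number. The following requests are a sketch that assumes the `example-phone` index created earlier and an arbitrary document ID of `1`. The first request indexes the number in international format; the `refresh=true` parameter makes the document searchable immediately:

```json
PUT /example-phone/_doc/1?refresh=true
{
  "phoneNumber" : "+41 60 555 12 34"
}
```
{% include copy-curl.html %}

The second request searches for the same number written in Swiss national format. Because the `phoneNumber` field uses `phone-search-ch` as its search analyzer, the query text is parsed with the `CH` region:

```json
GET /example-phone/_search
{
  "query" : {
    "match" : {
      "phoneNumber" : "060 555 12 34"
    }
  }
}
```
{% include copy-curl.html %}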