document the new analysis-phonenumber plugin
this is part of opensearch-project/OpenSearch#11326. the actual
implementation was done in opensearch-project/OpenSearch#15915. see the
commit message on the PR for further details.

resolves opensearch-project#8389

Co-authored-by: Fanit Kolchina <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: Ralph Ursprung <[email protected]>
rursprung and kolchfa-aws committed Oct 11, 2024
1 parent cd31d82 commit d81da26
Showing 3 changed files with 155 additions and 24 deletions.
11 changes: 10 additions & 1 deletion _analyzers/supported-analyzers/index.md
@@ -29,4 +29,13 @@ Analyzer | Analysis performed | Analyzer output

## Language analyzers

OpenSearch supports analyzers for various languages. For more information, see [Language analyzers]({{site.url}}{{site.baseurl}}/analyzers/language-analyzers/).

## Additional analyzers

The following table lists the additional analyzers that OpenSearch supports.

| Analyzer | Analysis performed |
|:---------------|:---------------------------------------------------------------------------------------------------------|
| `phone` | An [index analyzer]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) for parsing phone numbers. |
| `phone-search` | A [search analyzer]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/) for parsing phone numbers. |
121 changes: 121 additions & 0 deletions _analyzers/supported-analyzers/phone-analyzers.md
@@ -0,0 +1,121 @@
---
layout: default
title: Phone number
parent: Analyzers
nav_order: 140
---

# Phone number analyzers

The `analysis-phonenumber` plugin provides analyzers and tokenizers for parsing phone numbers.
A dedicated analyzer is required because parsing phone numbers is a non-trivial task (even though it might seem trivial at first glance). For common pitfalls in parsing phone numbers, see [Falsehoods programmers believe about phone numbers](https://github.com/google/libphonenumber/blob/master/FALSEHOODS.md).


OpenSearch supports the following phone number analyzers:

* `phone`: An [index analyzer]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) to use at indexing time.
* `phone-search`: A [search analyzer]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/) to use at search time.

Internally, the plugin uses the [`libphonenumber`](https://github.com/google/libphonenumber) library and follows its parsing rules.

The phone number analyzers are not meant to find phone numbers in larger texts. Instead, use them on fields that contain only phone numbers.
{: .note}

## Installing the plugin

Before you can use phone number analyzers, you must install the `analysis-phonenumber` plugin by running the following command:

```sh
./bin/opensearch-plugin install analysis-phonenumber
```

## Specifying a default region

You can optionally specify a default region for parsing phone numbers by providing the `phone-region` parameter within the analyzer. Valid phone regions are ISO 3166 country codes. For more information, see [List of ISO 3166 country codes](https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes).

When tokenizing phone numbers that contain the international calling prefix `+`, the default region is irrelevant. However, for phone numbers that use a national prefix for international calls (for example, `001` instead of `+1` to dial North America from most European countries), you must provide the region. Specifying the region also lets you properly index local phone numbers that have no international prefix.
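The effect of the default region can be illustrated with a deliberately simplified Python sketch. It does not reflect how `libphonenumber` or the plugin works internally: the `REGION_RULES` table and the `to_international_digits` function are hypothetical, and the table covers only Switzerland (country code `41`, international prefix `00`, national prefix `0`).

```python
# Toy illustration only: real phone number parsing has many more rules.
# REGION_RULES and to_international_digits are hypothetical names.
REGION_RULES = {
    "CH": {"country_code": "41", "international_prefix": "00", "national_prefix": "0"},
}


def to_international_digits(raw, region=None):
    """Return country code + national number as digits, or None if ambiguous."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    if raw.strip().startswith("+"):
        # A leading "+" already marks the international format,
        # so the region is irrelevant.
        return digits
    rules = REGION_RULES.get(region)
    if rules is None:
        # Without a known region, "0041..." or "060..." cannot be interpreted.
        return None
    if digits.startswith(rules["international_prefix"]):
        # Strip the region's international prefix (for example, "00").
        return digits[len(rules["international_prefix"]):]
    if digits.startswith(rules["national_prefix"]):
        # Replace the national prefix with the region's country code.
        return rules["country_code"] + digits[len(rules["national_prefix"]):]
    return rules["country_code"] + digits


print(to_international_digits("+41 60 555 12 34"))         # 41605551234
print(to_international_digits("0041 60 555 12 34", "CH"))  # 41605551234
print(to_international_digits("060 555 12 34", "CH"))      # 41605551234
print(to_international_digits("060 555 12 34"))            # None
```

In this sketch, only the number written with `+` can be resolved without a region, which mirrors the behavior described in this section.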

## Example

The following request creates an index containing one field, which ingests phone numbers for Switzerland (region code `CH`):

```json
PUT /example-phone
{
  "settings": {
    "analysis": {
      "analyzer": {
        "phone-ch": {
          "type": "phone",
          "phone-region": "CH"
        },
        "phone-search-ch": {
          "type": "phone-search",
          "phone-region": "CH"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "phoneNumber": {
        "type": "text",
        "analyzer": "phone-ch",
        "search_analyzer": "phone-search-ch"
      }
    }
  }
}
```
{% include copy-curl.html %}
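If you need the same setup for several regions, the request body can be generated rather than written by hand. The following Python sketch builds the body shown in the preceding request for a given region code. The `build_phone_index_body` helper and its `field` parameter are hypothetical. Only the analyzer types and the `phone-region` parameter come from the plugin.

```python
import json


def build_phone_index_body(region, field="phoneNumber"):
    """Build index settings and mappings for one phone region (hypothetical helper)."""
    suffix = region.lower()
    index_analyzer = f"phone-{suffix}"
    search_analyzer = f"phone-search-{suffix}"
    return {
        "settings": {"analysis": {"analyzer": {
            # Index-time analyzer with a default region
            index_analyzer: {"type": "phone", "phone-region": region},
            # Search-time analyzer with the same default region
            search_analyzer: {"type": "phone-search", "phone-region": region},
        }}},
        "mappings": {"properties": {field: {
            "type": "text",
            "analyzer": index_analyzer,
            "search_analyzer": search_analyzer,
        }}},
    }


body = build_phone_index_body("CH")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the `PUT /example-phone` request.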

Analyzing a (fictional) Swiss phone number that includes the international calling prefix produces the same result with or without the Swiss-specific phone region. The following request uses the region-specific `phone-ch` analyzer:

```json
GET /example-phone/_analyze
{
  "analyzer" : "phone-ch",
  "text" : "+41 60 555 12 34"
}
```
{% include copy-curl.html %}

The following request uses the `phone` analyzer without a phone region:

```json
GET /example-phone/_analyze
{
  "analyzer" : "phone",
  "text" : "+41 60 555 12 34"
}
```
{% include copy-curl.html %}

Both requests produce the same tokens:
```json
["+41 60 555 12 34", "6055512", "41605551", "416055512", "6055", "41605551234", ...]
```
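The digit-only tokens in this output are prefixes of the national number (`605551234`) and of the number including the country code (`41605551234`). The following Python sketch reproduces that pattern. It is an approximation inferred from the sample output, not the plugin's implementation: the `prefix_ngrams` and `sketch_index_tokens` functions and the minimum prefix length of 3 are assumptions.

```python
def prefix_ngrams(digits, min_length=3):
    """Return every prefix of `digits` that has at least `min_length` characters."""
    return [digits[:n] for n in range(min_length, len(digits) + 1)]


def sketch_index_tokens(raw, country_code, national_number):
    """Approximate index-time tokens: the raw input plus prefix n-grams."""
    tokens = [raw]
    tokens += prefix_ngrams(national_number)
    tokens += prefix_ngrams(country_code + national_number)
    return tokens


tokens = sketch_index_tokens("+41 60 555 12 34", "41", "605551234")

# Every token visible in the sample output is covered by the sketch.
for expected in ["+41 60 555 12 34", "6055512", "41605551",
                 "416055512", "6055", "41605551234"]:
    assert expected in tokens
```

Indexing prefixes in this way is what allows a partially entered number to match at search time.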

If, however, the phone number is given without the international calling prefix `+` (either by using `0041` or by omitting the international calling prefix altogether), only an analyzer configured with the correct phone region can parse it:

```json
GET /example-phone/_analyze
{
  "analyzer" : "phone-ch",
  "text" : "060 555 12 34"
}
```
{% include copy-curl.html %}

In contrast, the `phone-search` analyzer does not create n-grams and emits only a few basic tokens:

```json
GET /example-phone/_analyze
{
  "analyzer" : "phone-search",
  "text" : "+41 60 555 12 34"
}
```
{% include copy-curl.html %}

```json
["+41 60 555 12 34", "41 60 555 12 34", "41605551234", "605551234", "41"]
```
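A full-text query such as `match` finds a document when at least one search-time token equals an indexed token (with the default `or` operator). The following Python sketch checks which tokens from the two sample outputs coincide. The index token list is truncated in the sample output, so the real overlap may be larger.

```python
# Tokens copied from the sample outputs; the index set is only the visible excerpt.
index_tokens = {
    "+41 60 555 12 34", "6055512", "41605551",
    "416055512", "6055", "41605551234",
}
search_tokens = [
    "+41 60 555 12 34", "41 60 555 12 34",
    "41605551234", "605551234", "41",
]

# Search tokens that also appear among the indexed tokens
shared = [token for token in search_tokens if token in index_tokens]
print(shared)
```

With these excerpts, the shared tokens are the original input string and the full international number in digits.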
47 changes: 24 additions & 23 deletions _install-and-configure/additional-plugins/index.md
@@ -9,29 +9,30 @@ nav_order: 10

There are many more plugins available in addition to those provided by the standard distribution of OpenSearch. These additional plugins have been built by OpenSearch developers or members of the OpenSearch community. While it isn't possible to provide an exhaustive list (because many plugins are not maintained in an OpenSearch GitHub repository), the following plugins, available in the [OpenSearch/plugins](https://github.com/opensearch-project/OpenSearch/tree/main/plugins) directory on GitHub, are some of the plugins that can be installed using one of the installation options, for example, using the command `bin/opensearch-plugin install <plugin-name>`.

| Plugin name | Earliest available version |
|:-----------------------------------------------------------------------------------------------------------------------|:---------------------------|
| analysis-icu | 1.0.0 |
| analysis-kuromoji | 1.0.0 |
| analysis-nori | 1.0.0 |
| [`analysis-phonenumber`]({{site.url}}{{site.baseurl}}/analyzers/supported-analyzers/phone-analyzers/) | 2.18.0 |
| analysis-phonetic | 1.0.0 |
| analysis-smartcn | 1.0.0 |
| analysis-stempel | 1.0.0 |
| analysis-ukrainian | 1.0.0 |
| discovery-azure-classic | 1.0.0 |
| discovery-ec2 | 1.0.0 |
| discovery-gce | 1.0.0 |
| [`ingest-attachment`]({{site.url}}{{site.baseurl}}/install-and-configure/additional-plugins/ingest-attachment-plugin/) | 1.0.0 |
| mapper-annotated-text | 1.0.0 |
| mapper-murmur3 | 1.0.0 |
| [`mapper-size`]({{site.url}}{{site.baseurl}}/install-and-configure/additional-plugins/mapper-size-plugin/) | 1.0.0 |
| query-insights | 2.12.0 |
| repository-azure | 1.0.0 |
| repository-gcs | 1.0.0 |
| repository-hdfs | 1.0.0 |
| repository-s3 | 1.0.0 |
| store-smb | 1.0.0 |
| transport-nio | 1.0.0 |

## Related articles
