| copyright | lastupdated | keywords | subcollection |
|---|---|---|---|
| | 2024-08-28 | Watson NLP, entities, keywords, pos, part of speech, sentiment | discovery-data |
{{site.data.keyword.attribute-definition-list}}
{: #nlu}
Take advantage of award-winning Watson Natural Language Processing (NLP) capabilities by adding prebuilt enrichments to your documents.
{: shortdesc}
With Watson NLP, you can identify and tag meaningful information in your collections so you can understand what it all means and make more informed decisions.
The following Watson NLP enrichments are available:
- Entities: Recognizes proper nouns such as people, cities, and organizations that are mentioned in the content.
- Keywords: Recognizes significant terms in your content.
- Part of Speech: Identifies the parts of speech (nouns and verbs, for example) in the content.
- Sentiment: Understands the overall sentiment of the content.
The following other pretrained enrichments are available with {{site.data.keyword.discoveryshort}}:
{: #nlu-overview}
For example, the following screen capture shows a transcript of the US Declaration of Independence that was added to a {{site.data.keyword.discoveryshort}} collection where the Entities and Keywords enrichments are enabled. The mentions that are recognized by the enrichments are highlighted in the document text.
{: caption="Excerpt of the US Declaration of Independence with highlighted terms" caption-side="bottom"}
Some of the NLP enrichments are applied to projects automatically. You don't need to apply them yourself if you are using one of these project types.
{{site.data.content.enrichment-defaults-reuse}}
For more information about the following prebuilt enrichments, see the following topics:
For more information about how to create custom enrichments, see Adding domain-specific resources.
For more information about how to get the most from enrichments, read the Enriching your documents can make search more effective{: external} blog post.
For more information about how to apply enrichments by using the API, see Applying enrichments by using the API.
{: #nlu-task}
To add an NLP enrichment, complete the following steps:
- Open your project and go to the Manage collections page.
- Click to open the collection that you want to enrich.
- Open the Enrichments tab.
- Scroll to find the NLP enrichment that you want to apply to your documents.

  Both built-in enrichments and custom enrichments are listed. Built-in enrichments have a type value of `System`.
  {: note}

- Choose one or more fields to apply the enrichment to.

  You can apply enrichments to the `text` and `html` fields, and to custom fields that were added from uploaded JSON or CSV files or from the Smart Document Understanding (SDU) tool.

- Click Apply changes and reprocess.
Enrichments that you enable are applied to the documents in random order. For information about how to remove an enrichment, see Managing enrichments.
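The field selection in the preceding steps can also be written down as data. The following is a minimal sketch only, not the documented API procedure: it assumes a request body in which each enrichment is an `enrichment_id` paired with a list of `fields`. The helper name `build_enrichment_config` and the placeholder ID are hypothetical, and no request is sent. For the supported steps, see Applying enrichments by using the API.

```python
import json


def build_enrichment_config(enrichment_ids, fields):
    """Pair each enrichment ID with the fields it should be applied to.

    The body shape (enrichment_id + fields) is an assumption for illustration.
    """
    if not fields:
        raise ValueError("Choose at least one field to enrich.")
    return {
        "enrichments": [
            {"enrichment_id": enrichment_id, "fields": list(fields)}
            for enrichment_id in enrichment_ids
        ]
    }


# Placeholder ID: look up the real value for the enrichment that you want to apply.
config = build_enrichment_config(["<entities-enrichment-id>"], ["text", "html"])
print(json.dumps(config, indent=2))
```

Keeping the fields as an explicit list mirrors the step where you choose one or more fields per enrichment.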
{: #nlu-entities}
Identifies entities. Entities are terms that typically represent proper nouns such as people, cities, and organizations that are mentioned in the data collection. {{site.data.keyword.discoveryshort}} can recognize entities that are part of an entity type system that is defined by the Watson Natural Language Processing (NLP) service.
If you want to be able to identify uncommon terms that are significant to your business, you can train your own model to recognize custom entities. For more information, see Entity extractor.
The Watson NLP entity extractor service that is used by Discovery is called the NLU type system. The name originates from the fact that the type system is used by the Watson Natural Language Understanding (NLU) service in addition to the Watson Discovery service. However, it is the Watson NLP implementation of the type system that is used directly by Discovery, not the Watson NLU implementation. As a result, the two implementations can produce different results. To get a general idea of the types of entities that are recognized by the service, see Entities{: external}.
The following screen capture shows that the Entities enrichment recognizes the terms Systems of Government and King of Great Britain (among others) and tags them as entity mentions.
{: caption="The recognized entities, Governments and King of Great Britain, are highlighted" caption-side="bottom"}
From the JSON view of the document, you can see the underlying JSON structure of the entity mentions.
{: caption="JSON representation of recognized entity mentions" caption-side="bottom"}
If you want to search for the Organization entity type, for example, click the Copy icon from the root of the JSON tree view, paste the JSON content into a text editor, and then search for `Organization`.
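As an alternative to searching in a text editor, the copied JSON can be filtered with a few lines of Python. This is a rough sketch that assumes the copied content has a top-level `entities` array whose items carry `text` and `type` keys; the sample string is an abbreviated stand-in, and the function name `entities_of_type` is hypothetical.

```python
import json

# Abbreviated stand-in for JSON copied from the document's JSON view.
copied_json = """
{
  "entities": [
    {"text": "IBM", "type": "Organization"},
    {"text": "Armonk", "type": "Location"}
  ]
}
"""


def entities_of_type(document, entity_type):
    """Return the text of every entity whose type matches entity_type."""
    return [
        entity["text"]
        for entity in document.get("entities", [])
        if entity.get("type") == entity_type
    ]


document = json.loads(copied_json)
print(entities_of_type(document, "Organization"))
```

If the JSON that you copy nests the `entities` array under another key, pass that nested object to the function instead of the root.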
{: #nlu-entities-example}
{: #nlu-entities-example-input}
"IBM is an American multinational technology company headquartered in Armonk."
{: codeblock}
{: #nlu-entities-example-response}
In the JSON output:
- `text` = string. The entity text
- `type` = string. The entity type, such as `Organization`, `Location`, `Person`, `Number`.
- `mentions` = array. The entity mentions and locations
- `model_name` = string. For custom models, this field contains the user-provided model name. Otherwise, this field contains the default name of the model, such as `watson_knowledge_studio`, `dictionary`, `character_pattern`, or `natural_language_understanding`
{
  "entities": [
    {
      "model_name": "natural_language_understanding",
      "mentions": [
        {
          "confidence": 0.8317045,
          "location": {
            "end": 3,
            "begin": 0
          },
          "text": "IBM"
        }
      ],
      "text": "IBM",
      "type": "Organization"
    },
    {
      "model_name": "natural_language_understanding",
      "mentions": [
        {
          "confidence": 0.6114863,
          "location": {
            "end": 75,
            "begin": 69
          },
          "text": "Armonk"
        }
      ],
      "text": "Armonk",
      "type": "Location"
    }
  ]
}
{: codeblock}
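The `location` values in this example behave like zero-based character offsets with an exclusive `end`, counted against the input sentence without its surrounding quotation marks. The following sketch checks that reading by slicing the input with each mention's offsets. The helper name `mention_span` is hypothetical, and the interpretation is inferred from this one example rather than stated by the service.

```python
text = "IBM is an American multinational technology company headquartered in Armonk."

# Mentions copied from the example response above.
mentions = [
    {"text": "IBM", "location": {"begin": 0, "end": 3}, "confidence": 0.8317045},
    {"text": "Armonk", "location": {"begin": 69, "end": 75}, "confidence": 0.6114863},
]


def mention_span(source, mention):
    """Slice the source text with the mention's begin/end offsets."""
    location = mention["location"]
    return source[location["begin"]:location["end"]]


for mention in mentions:
    span = mention_span(text, mention)
    print(f'{mention["text"]!r} -> {span!r} (match: {span == mention["text"]})')
```

A check like this is a quick way to confirm that offsets line up with the field that you enriched before you use them for highlighting.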
{: #nlu-keywords}
Returns important keywords in the content.
For example, the following screen capture shows highlighted terms from the US Declaration of Independence that are recognized by the Keywords enrichment.
{: caption="Terms recognized by the Keywords enrichment" caption-side="bottom"}
From the JSON view of the document, you can see the underlying JSON structure of the `Declaration` keyword mention.
{: caption="JSON representation of Keywords enrichment mentions" caption-side="bottom"}
{: #nlu-keywords-example}
{: #nlu-keywords-example-input}
"Watson Discovery is an award-winning AI search technology."
{: codeblock}
{: #nlu-keywords-example-response}
In the JSON output:
- `text` = The keyword text
- `mentions` = The keyword mentions and locations
{
  "keywords": [
    {
      "mentions": [
        {
          "location": {
            "end": 157,
            "begin": 141
          },
          "text": "Watson Discovery"
        }
      ],
      "text": "Watson Discovery",
      "relevance": 0.503613
    },
    {
      "mentions": [
        {
          "location": {
            "end": 177,
            "begin": 164
          },
          "text": "award-winning"
        }
      ],
      "text": "award-winning",
      "relevance": 0.728722
    },
    {
      "mentions": [
        {
          "location": {
            "end": 198,
            "begin": 181
          },
          "text": "search technology"
        }
      ],
      "text": "search technology",
      "relevance": 0.779356
    }
  ]
}
{: codeblock}
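Each keyword in the response carries a `relevance` value, and the keywords are not returned in relevance order in this example. The following sketch ranks them from highest to lowest. The function name `rank_keywords` and its `min_relevance` cutoff are hypothetical conveniences, not part of the service. The offsets in this response start at 141, which is beyond the length of the sample sentence, so the response was presumably produced from a longer field. For that reason the sketch checks only that each mention's length matches its text instead of slicing the sentence.

```python
# Keyword entries copied from the example response above.
keywords = [
    {"text": "Watson Discovery", "relevance": 0.503613,
     "mentions": [{"text": "Watson Discovery", "location": {"begin": 141, "end": 157}}]},
    {"text": "award-winning", "relevance": 0.728722,
     "mentions": [{"text": "award-winning", "location": {"begin": 164, "end": 177}}]},
    {"text": "search technology", "relevance": 0.779356,
     "mentions": [{"text": "search technology", "location": {"begin": 181, "end": 198}}]},
]


def rank_keywords(items, min_relevance=0.0):
    """Return keywords at or above min_relevance, most relevant first."""
    kept = [item for item in items if item["relevance"] >= min_relevance]
    return sorted(kept, key=lambda item: item["relevance"], reverse=True)


def mention_lengths_match(item):
    """Check that each mention's offset span is as long as its text."""
    return all(
        m["location"]["end"] - m["location"]["begin"] == len(m["text"])
        for m in item["mentions"]
    )


ranked = rank_keywords(keywords)
for item in ranked:
    print(f'{item["relevance"]:.6f}  {item["text"]}  (lengths ok: {mention_lengths_match(item)})')
```

Sorting on the client side keeps the original response untouched, which is useful if you also need the keywords in the order that they were returned.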
{: #nlu-keywords-limits}
The Keywords enrichment can identify up to 50 keywords per document. Each keyword can have one or more mentions.
{: #nlu-pos}
Recognizes and tags parts of speech, including nouns, verbs, adjectives, adverbs, conjunctions, interjections, and numerals.