Adding User Behavior Insights functionality. #13546

Closed
Changes from 2 commits
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -39,6 +39,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- [Search Pipeline] Handle default pipeline for multiple indices ([#13276](https://github.com/opensearch-project/OpenSearch/pull/13276))
- Add support for deep copying SearchRequest ([#12295](https://github.com/opensearch-project/OpenSearch/pull/12295))
- Support multi ranges traversal when doing date histogram rewrite optimization. ([#13317](https://github.com/opensearch-project/OpenSearch/pull/13317))
- Add User Behavior Insights. ([#13545](https://github.com/opensearch-project/OpenSearch/issues/13545))

### Dependencies
- Bump `org.apache.commons:commons-configuration2` from 2.10.0 to 2.10.1 ([#12896](https://github.com/opensearch-project/OpenSearch/pull/12896))
1 change: 1 addition & 0 deletions gradle/missing-javadoc.gradle
@@ -124,6 +124,7 @@ configure([
project(":modules:repository-url"),
project(":modules:systemd"),
project(":modules:transport-netty4"),
project(":modules:ubi"),
project(":plugins:analysis-icu"),
project(":plugins:analysis-kuromoji"),
project(":plugins:analysis-nori"),
59 changes: 59 additions & 0 deletions modules/ubi/README.md
@@ -0,0 +1,59 @@
# User Behavior Insights (UBI)

UBI facilitates storing queries and events for the purpose of improving search relevance, as described by [[RFC] User Behavior Insights](https://github.com/opensearch-project/OpenSearch/issues/12084).

## Indexing Queries

For UBI to index a query, add a `ubi` block to the `ext` in the search request containing a `query_id`:
Collaborator

It looks like if you do not have a query_id, one will be provided for you.

Is the presence of an empty ubi block sufficient to get the logging?

Author

I had it generate a query_id when none was provided, but that was premature. In this first version, a query_id is required because the search response is not yet being modified. In a later revision, query_id will be optional: it will be generated if not provided and returned in the search response's ext. I will remove the code in getQueryId() that generates a random UUID when the value is null.

An empty block was sufficient, but I will change it to require that the ubi block contain a query_id. If there is no ubi block, or the ubi block is empty, the rest of the code in the UbiActionFilter will be skipped.


```
curl -s http://localhost:9200/ecommerce/_search -H "Content-type: application/json" -d'
{
  "query": {
    "match": {
      "title": "toner OR ink"
    }
  },
  "ext": {
    "ubi": {
      "query_id": "1234512345"
    }
  }
}'
```

There are optional values that can be included in the `ubi` block along with the `query_id`. Those values are:
* `client_id` - A unique identifier for the source of the query. This may represent a user or some other mechanism.
* `user_query` - The user-entered query for this search. For example, in the search request above, the `user_query` may have been `toner ink`.

With these optional values, a sample query would look like:

```
curl -s http://localhost:9200/ecommerce/_search -H "Content-type: application/json" -d'
{
  "query": {
    "match": {
      "title": "toner OR ink"
    }
  },
  "ext": {
    "ubi": {
      "query_id": "1234512345",
      "client_id": "abcdefg",
      "user_query": "toner ink"
    }
  }
}'
```

If a search request does not contain a `ubi` block in `ext`, the query will *not* be indexed.

Queries are indexed into an index called `ubi_queries`.
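
The rule above (no `ubi` block, no indexing) combined with the review discussion (a `query_id` is required in this first version) can be sketched as a small gate. This is a minimal illustration, not the code in `UbiActionFilter`: the class `UbiExtSketch`, its method, and the use of a plain `Map` to stand in for the parsed `ext` section are all assumptions, as is treating a blank `query_id` the same as a missing one.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper for illustration only; not part of this pull request. */
final class UbiExtSketch {

    private UbiExtSketch() {}

    /**
     * Returns the query_id found in the "ubi" block of the given "ext" map.
     * Returns empty when "ext" is null, the "ubi" block is absent or not an
     * object, or the block has no non-blank string query_id. An empty result
     * means the query should not be indexed.
     */
    static Optional<String> queryId(final Map<String, Object> ext) {
        if (ext == null) {
            return Optional.empty();
        }
        final Object ubi = ext.get("ubi");
        if (!(ubi instanceof Map)) {
            return Optional.empty();
        }
        final Object queryId = ((Map<?, ?>) ubi).get("query_id");
        // Assumption: a blank query_id is treated like a missing one.
        if (queryId instanceof String && !((String) queryId).isBlank()) {
            return Optional.of((String) queryId);
        }
        return Optional.empty();
    }
}
```

A caller would index the query only when `UbiExtSketch.queryId(ext).isPresent()` and otherwise pass the request through untouched.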

## Indexing Events

UBI facilitates indexing both queries and client-side events. These client-side events may be product clicks, scroll depth,
adding a product to a cart, or other actions. UBI indexes these events in an index called `ubi_events`. UBI creates this index
automatically the first time it receives a query containing a `ubi` block in `ext` (see the example above).

Client-side events can be indexed into the `ubi_events` index by your method of choice.
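
As one possible "method of choice", a client could send events to the standard document index endpoint (`POST /ubi_events/_doc`) over HTTP. The sketch below only builds the request with the JDK's `java.net.http` API. The class `UbiEventSketch` and the event field names (`action_name`, `object_id`, `timestamp`) are hypothetical, since this README does not define an event schema; only `query_id` and the `ubi_events` index name come from the text above.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical client-side helper for illustration only. */
final class UbiEventSketch {

    private UbiEventSketch() {}

    /** Escapes backslashes and double quotes only; control characters are not handled. */
    private static String escape(final String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds an event document. All field names except query_id are assumptions. */
    static String eventJson(final String queryId, final String action, final String objectId, final long timestampMillis) {
        return "{\"query_id\":\"" + escape(queryId)
            + "\",\"action_name\":\"" + escape(action)
            + "\",\"object_id\":\"" + escape(objectId)
            + "\",\"timestamp\":" + timestampMillis + "}";
    }

    /** Builds (but does not send) a request that indexes one document into ubi_events. */
    static HttpRequest buildRequest(final String baseUrl, final String json) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/ubi_events/_doc"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(json))
            .build();
    }
}
```

To send the request, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. Reusing the `query_id` from the search request is what lets an event be tied back to the query that produced it.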
12 changes: 12 additions & 0 deletions modules/ubi/build.gradle
@@ -0,0 +1,12 @@
apply plugin: 'opensearch.yaml-rest-test'

opensearchplugin {
    description 'Stores queries and events for User Behavior Insights (UBI)'
    classname 'org.opensearch.ubi.UbiModulePlugin'
}

dependencies {
    // required for the yaml test to run
    yamlRestTestImplementation "org.apache.logging.log4j:log4j-core:${versions.log4j}"
    runtimeOnly "org.apache.logging.log4j:log4j-core:${versions.log4j}"
}
77 changes: 77 additions & 0 deletions modules/ubi/src/main/java/org/opensearch/ubi/QueryRequest.java
@@ -0,0 +1,77 @@
/*
* SPDX-License-Identifier: Apache-2.0
*
* The OpenSearch Contributors require contributions made to
* this file be licensed under the Apache-2.0 license or a
* compatible open source license.
*/

package org.opensearch.ubi;

/**
 * A query received by OpenSearch.
 */
public class QueryRequest {

    private final long timestamp;
    private final String queryId;
    private final String userId;
    private final String userQuery;
    private final QueryResponse queryResponse;

    /**
     * Creates a query request.
     * @param queryId The ID of the query.
     * @param userQuery The user-entered query.
     * @param userId The ID of the user that initiated the query.
     * @param queryResponse The {@link QueryResponse} for this query request.
     */
    public QueryRequest(final String queryId, final String userQuery, final String userId, final QueryResponse queryResponse) {
        this.timestamp = System.currentTimeMillis();
        this.queryId = queryId;
        this.userId = userId;
        this.userQuery = userQuery;
        this.queryResponse = queryResponse;
    }

    /**
     * Gets the timestamp.
     * @return The timestamp.
     */
    public long getTimestamp() {
        return timestamp;
    }

    /**
     * Gets the query ID.
     * @return The query ID.
     */
    public String getQueryId() {
        return queryId;
    }

    /**
     * Gets the user query.
     * @return The user query.
     */
    public String getUserQuery() {
        return userQuery;
    }

    /**
     * Gets the user ID.
     * @return The user ID.
     */
    public String getUserId() {
        return userId;
    }

    /**
     * Gets the query response for this query request.
     * @return The {@link QueryResponse} for this query request.
     */
    public QueryResponse getQueryResponse() {
        return queryResponse;
    }

}
58 changes: 58 additions & 0 deletions modules/ubi/src/main/java/org/opensearch/ubi/QueryResponse.java
@@ -0,0 +1,58 @@
/*
* SPDX-License-Identifier: Apache-2.0
*
* The OpenSearch Contributors require contributions made to
* this file be licensed under the Apache-2.0 license or a
* compatible open source license.
*/

package org.opensearch.ubi;

import java.util.List;

/**
 * A query response.
 */
public class QueryResponse {

    private final String queryId;
    private final String queryResponseId;
    private final List<String> queryResponseObjectIds;

    /**
     * Creates a query response.
     * @param queryId The ID of the query.
     * @param queryResponseId The ID of the query response.
     * @param queryResponseObjectIds A list of IDs for the hits in the query.
     */
    public QueryResponse(final String queryId, final String queryResponseId, final List<String> queryResponseObjectIds) {
        this.queryId = queryId;
        this.queryResponseId = queryResponseId;
        this.queryResponseObjectIds = queryResponseObjectIds;
    }

    /**
     * Gets the query ID.
     * @return The query ID.
     */
    public String getQueryId() {
        return queryId;
    }

    /**
     * Gets the query response ID.
     * @return The query response ID.
     */
    public String getQueryResponseId() {
        return queryResponseId;
    }

    /**
     * Gets the list of query response hit IDs.
     * @return A list of query response hit IDs.
     */
    public List<String> getQueryResponseObjectIds() {
        return queryResponseObjectIds;
    }

}
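
For reviewers, here is a rough sketch of how `QueryRequest` and `QueryResponse` might be combined into a single document for the `ubi_queries` index. It is not taken from this pull request: `UbiQueryDocumentSketch`, `toDocument`, and every document field name are hypothetical, and the two classes are repeated in trimmed, package-private form only so the snippet compiles on its own.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Trimmed stand-ins for the two classes in this diff.
final class QueryResponse {
    private final String queryResponseId;
    private final List<String> queryResponseObjectIds;

    QueryResponse(final String queryId, final String queryResponseId, final List<String> queryResponseObjectIds) {
        this.queryResponseId = queryResponseId;
        this.queryResponseObjectIds = queryResponseObjectIds;
    }

    String getQueryResponseId() { return queryResponseId; }
    List<String> getQueryResponseObjectIds() { return queryResponseObjectIds; }
}

final class QueryRequest {
    private final long timestamp = System.currentTimeMillis();
    private final String queryId;
    private final String userId;
    private final String userQuery;
    private final QueryResponse queryResponse;

    QueryRequest(final String queryId, final String userQuery, final String userId, final QueryResponse queryResponse) {
        this.queryId = queryId;
        this.userId = userId;
        this.userQuery = userQuery;
        this.queryResponse = queryResponse;
    }

    long getTimestamp() { return timestamp; }
    String getQueryId() { return queryId; }
    String getUserQuery() { return userQuery; }
    String getUserId() { return userId; }
    QueryResponse getQueryResponse() { return queryResponse; }
}

/** Hypothetical mapper; the field names below are assumptions, not the module's schema. */
final class UbiQueryDocumentSketch {

    private UbiQueryDocumentSketch() {}

    /** Flattens a request and its response (if any) into one indexable document. */
    static Map<String, Object> toDocument(final QueryRequest request) {
        final Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("timestamp", request.getTimestamp());
        doc.put("query_id", request.getQueryId());
        doc.put("client_id", request.getUserId());
        doc.put("user_query", request.getUserQuery());
        final QueryResponse response = request.getQueryResponse();
        if (response != null) {
            doc.put("query_response_id", response.getQueryResponseId());
            doc.put("query_response_object_ids", response.getQueryResponseObjectIds());
        }
        return doc;
    }
}
```

The sketch maps `userId` to `client_id` because that is the name the README uses for the request field. Whether the stored document should use the same name is left open here.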