Corrections to bulk.md #620

Closed
wants to merge 5 commits into from
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -40,6 +40,7 @@ Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
### Deprecated
### Removed
### Fixed
+ - Corrected documentation for `bulk.md` ([#620](https://github.com/opensearch-project/opensearch-js/pull/620))
### Security

## [2.3.0]

70 changes: 47 additions & 23 deletions guides/bulk.md
@@ -52,18 +52,25 @@ client.bulk({
```
As you can see, each bulk operation is comprised of two objects. The first object contains the operation type and the target document's `_index` and `_id`. The second object contains the document's data. As a result, the body of the request above contains six objects for three index actions.

- Alternatively, the `bulk` method can accept an array of hashes where each hash represents a single operation. The following code is equivalent to the previous example:
+ Alternatively, the `bulk` method can accept an array of objects where objects are in pairs, except for delete which is singular. In each pair, the first item contains the action like `create` and, `_index` and `id` specify which index and document id the action has to be performed on. The second item in the pair is the document itself.

```javascript
client.bulk({
- body: [
- { index: { _index: movies, _id: 1, data: { title: 'Beauty and the Beast', year: 1991 } } },
- { index: { _index: movies, _id: 2, data: { title: 'Beauty and the Beast - Live Action', year: 2017 } } },
- { index: { _index: books, _id: 1, data: { title: 'The Lion King', year: 1994 } } }
- ]
- }).then((response) => {
- console.log(response);
- });
+ index: movies,
+ body: [
+ { create: { _id: 1} },
+ { title: 'Beauty and the Beast 1', year: 2050 },
+ { delete: { _id: 1 } },
+ { create: { _id: 2 } },
+ { title: 'Beauty and the Beast 2', year: 2051 },
+ { create: {} },
+ { title: 'Beauty and the Beast 2', year: 2051 },
+ { create: { _index: books } },
+ { title: '2012', year: 2012 },
+ ]
+ }).then((response) => {
+ console.log(response.body.items);
+ });
```

We will use this format for the rest of the examples in this guide.
Collaborator

Looks like this part is not applicable to JavaScript. There's only one way to format the body for client.bulk in JS, unlike in Ruby.

Collaborator

As in we can remove line 55 to 76

Collaborator

Looks like we have client.helpers.bulk, which allows another format, if you wanna look into that.

Author

> As in we can remove line 55 to 76

I don't follow exactly. Is that not how client.bulk works? Because I have used that same method across the document.

Member

I think you should write a working sample, and include it along with these changes so we can be sure @AbhinavGarg90.

Author

AbhinavGarg90 Oct 17, 2023

const { Client } = require('@opensearch-project/opensearch')
const documents = require('./docs.json')

const client = new Client({ ... })

const result = await client.helpers.bulk({
  datasource: documents,
  onDocument (doc) {
    return {
      index: { _index: 'example-index' }
    }
  }
})

console.log(result)

Looking at the documentation, it doesn't specify the shape of the input document. I assume it has key and value, but I cannot seem to locate any documentation with further detail about this. Would you be able to help me with some details of how this is implemented? I can then go about adding this to the PR.

Member

You mean https://opensearch.org/docs/latest/api-reference/document-apis/bulk/? It's pretty complete (bulk API accepts line-delimited JSON).

Author

> You mean https://opensearch.org/docs/latest/api-reference/document-apis/bulk/? It's pretty complete (bulk API accepts line-delimited JSON).

Thank you for pointing out where this is. Should I link this in bulk.md as well?

Author

How does line-delimited JSON work? I tried running a sample using the available documentation, but it doesn't seem to be working.

Here are the code snippets I am working with:

const { Client } = require('@opensearch-project/opensearch');
const documents = require('./docs.json')

try {
  const client = new Client({
    node: 'https://admin:admin@localhost:9200',
    ssl: { rejectUnauthorized: false }
  });
  client.helpers.bulk({
    datasource: documents,
    onDocument (doc) {
      return {
        index: { _index: 'example-index' }
      }
    }
  }).then((response) => {
    console.log(response)
  }).catch((error) => {
    console.error(error);
  })
} catch(e) {
  console.log("error", e)
}

docs.json

{ "delete": { "_index": "movies", "_id": "tt2229499" } }
{ "index": { "_index": "movies", "_id": "tt1979320" } }
{ "title": "Rush", "year": 2013 }
{ "create": { "_index": "movies", "_id": "tt1392214" } }
{ "title": "Prisoners", "year": 2013 }
{ "update": { "_index": "movies", "_id": "tt0816711" } }
{ "doc" : { "title": "World War Z" } }

error:

Unexpected non-whitespace character after JSON at position 57
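
For context on the error above: `require('./docs.json')` parses the whole file as one JSON value, while the sample file holds one JSON object per line, so parsing stops where a second object begins (position 57 would be the start of the second line, assuming single-character line endings). A minimal sketch of reading such content line by line follows. `parseNdjson` is a hypothetical helper name used only for illustration; it is not part of the client.

```javascript
// Hypothetical helper (not part of @opensearch-project/opensearch):
// parse line-delimited JSON by handling each non-empty line on its own.
function parseNdjson(text) {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// First three lines of the docs.json sample quoted above.
const sample = [
  '{ "delete": { "_index": "movies", "_id": "tt2229499" } }',
  '{ "index": { "_index": "movies", "_id": "tt1979320" } }',
  '{ "title": "Rush", "year": 2013 }',
].join('\n');

const lines = parseNdjson(sample);
console.log(lines.length); // 3
```

The parsed entries are action and document lines, the same shape used in the `body` arrays of the diff in this PR.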

Collaborator

nhtruong Oct 30, 2023

@AbhinavGarg90 Sorry for the very late response. I've been out of commission for the last few weeks.

The datasource is an iterable and the simplest iterable in this case is an array of objects, where each object is an OpenSearch doc:

  const docs = [{ title: 'Beauty and the Beast 1', year: 2050 },
                { title: 'Beauty and the Beast 2', year: 2051 }]

Then you can pass it to the helper function:

  let result = await client.helpers.bulk({ datasource: docs,
    onDocument(doc) {
      return { index: { _index: movies} }
    }
  });

  console.log(result);

And you should get this in the console:

{
  total: 2,
  failed: 0,
  retry: 0,
  successful: 2,
  noop: 0,
  time: 744,
  bytes: 150,
  aborted: false
}

Basically, the client.helpers.bulk method, on top of implementing some batching logic, also acts as syntactic sugar for client.bulk. That is, the user only has to provide the documents in their original form (an array of objects in the above example) and the action (index in this example) to apply to the documents.
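
To make the "syntactic sugar" point concrete, here is a rough sketch of how plain documents plus an `onDocument` callback could expand into the flat action/document array that `client.bulk` takes as `body`. This is not the library's implementation: `expandToBulkBody` is a hypothetical name, batching and retries are left out, and only `index` and `create` actions are covered.

```javascript
// Illustrative only: NOT how client.helpers.bulk is implemented.
// Shows the shape of the expansion for index/create actions, where the
// document itself follows its action line.
function expandToBulkBody(docs, onDocument) {
  const body = [];
  for (const doc of docs) {
    body.push(onDocument(doc)); // action line, e.g. { index: { _index: 'movies' } }
    body.push(doc);             // document line
  }
  return body;
}

const docs = [
  { title: 'Beauty and the Beast 1', year: 2050 },
  { title: 'Beauty and the Beast 2', year: 2051 },
];

const body = expandToBulkBody(docs, () => ({ index: { _index: 'movies' } }));
console.log(body.length); // 4
```

The result has two entries per document, matching the paired format used throughout the diff in this PR.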

@@ -76,10 +83,14 @@ Similarly, instead of calling the `create` method for each document, you can use
client.bulk({
index: movies,
body: [
- { create: { data: { title: 'Beauty and the Beast 2', year: 2030 } } },
- { create: { data: { title: 'Beauty and the Beast 3', year: 2031 } } },
- { create: { data: { title: 'Beauty and the Beast 4', year: 2049 } } },
- { create: { _index: books, data: { title: 'The Lion King 2', year: 1998 } } }
+ { create: {} },
+ { title: 'Beauty and the Beast 2', year: 2030 },
+ { create: {} },
+ { title: 'Beauty and the Beast 3', year: 2031 },
+ { create: {} },
+ { title: 'Beauty and the Beast 4', year: 2049 },
+ { create: { _index: books } },
+ { title: 'The Lion King 2', year: 1998 }
]
}).then((response) => {
console.log(response);
@@ -92,8 +103,10 @@ Note that we specified only the `_index` for the last document in the request bo
client.bulk({
index: movies,
body: [
- { update: { _id: 1, data: { doc: { year: 1992 } } } },
- { update: { _id: 2, data: { doc: { year: 2018 } } } }
+ { update: { _id: 1 } },
+ { doc: { year: 1992 } },
+ { update: { _id: 2 } },
+ { doc: { year: 2018 } }
]
}).then((response) => {
console.log(response);
@@ -122,9 +135,12 @@ You can mix and match the different operations in a single request. The followin
client.bulk({
index: movies,
body: [
- { create: { data: { title: 'Beauty and the Beast 5', year: 2050 } } },
- { create: { data: { title: 'Beauty and the Beast 6', year: 2051 } } },
- { update: { _id: 3, data: { doc: { year: 2052 } } } },
+ { create: {} },
+ { title: 'Beauty and the Beast 5', year: 2050 },
+ { create: {} },
+ { title: 'Beauty and the Beast 6', year: 2051 },
+ { update: { _id: 3 } },
+ { doc: { year: 2052 } },
{ delete: { _id: 4 } }
]
}).then((response) => {
@@ -141,10 +157,14 @@ The following code shows how to look for errors in the response:
client.bulk({
index: movies,
body: [
- { create: { _id: 1, data: { title: 'Beauty and the Beast', year: 1991 } } },
- { create: { _id: 2, data: { title: 'Beauty and the Beast 2', year: 2030 } } },
- { create: { _id: 1, data: { title: 'Beauty and the Beast 3', year: 2031 } } }, // document already exists error
- { create: { _id: 2, data: { title: 'Beauty and the Beast 4', year: 2049 } } } // document already exists error
+ { create: { _id: 1 } },
+ { title: 'Beauty and the Beast', year: 1991 },
+ { create: { _id: 2} },
+ { title: 'Beauty and the Beast 2', year: 2030 },
+ { create: { _id: 1 } },
+ { title: 'Beauty and the Beast 3', year: 2031 }, // document already exists error
+ { create: { _id: 2 } },
+ { title: 'Beauty and the Beast 4', year: 2049 } // document already exists error
]
}).then((response) => {
response.body.forEach((item) => {
@@ -162,5 +182,9 @@ client.bulk({
To clean up the resources created in this guide, delete the `movies` and `books` indices:

```javascript
- client.indices.delete({ index: [movies, books] });
+ client.indices.delete({
+ index: [movies, books]
+ }).then((response) => {
+ console.log(response);
+ });
```
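
The error-checking hunk above is cut off just after `response.body.forEach`. As a sketch of the idea only, the snippet below assumes that the bulk response body carries a top-level `errors` flag and an `items` array with one entry per action, each keyed by its action type (the first example in this PR logs `response.body.items`). `collectBulkErrors` is a hypothetical helper name, and `mockBody` is a hand-written stand-in rather than real server output.

```javascript
// Hypothetical helper: gather the failed entries from a bulk response body.
function collectBulkErrors(responseBody) {
  if (!responseBody.errors) return [];
  return responseBody.items
    .map((item) => {
      const [action] = Object.keys(item); // 'index', 'create', 'update' or 'delete'
      return { action, ...item[action] };
    })
    .filter((entry) => entry.error);
}

// Hand-written stand-in for response.body; values are made up for illustration.
const mockBody = {
  errors: true,
  items: [
    { create: { _index: 'movies', _id: '1', status: 201 } },
    {
      create: {
        _index: 'movies',
        _id: '1',
        status: 409,
        error: { type: 'version_conflict_engine_exception', reason: 'document already exists' },
      },
    },
  ],
};

const failures = collectBulkErrors(mockBody);
console.log(failures.map((f) => `${f.action} ${f._id}: ${f.status}`));
```

With a real client, the same helper would be called on `response.body` inside the `.then` callback.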