Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Corrections to bulk.md #620

Closed
wants to merge 5 commits into from

Conversation

AbhinavGarg90
Copy link

@AbhinavGarg90 AbhinavGarg90 commented Oct 3, 2023

Description

The description on how to use the API was incorrect. I had updated it with the correct implementation.

Issues Resolved

#616

Check List

  • New functionality includes testing.
    • All tests pass
  • Linter check was successfull - yarn run lint doesn't show any errors
  • Commits are signed per the DCO using --signoff
  • Changelog was updated.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Member

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think you could add a working sample into samples like https://github.com/opensearch-project/opensearch-py/tree/main/samples as part of this PR?

CHANGELOG.md Outdated Show resolved Hide resolved
@nhtruong
Copy link
Collaborator

nhtruong commented Oct 3, 2023

Do you think you could add a working sample into samples like https://github.com/opensearch-project/opensearch-py/tree/main/samples as part of this PR?

That's entirely different from guides. The equivalent on Python client is this one https://github.com/opensearch-project/opensearch-py/blob/main/guides/bulk.md

@AbhinavGarg90
Copy link
Author

Do you think you could add a working sample into samples like https://github.com/opensearch-project/opensearch-py/tree/main/samples as part of this PR?

That's entirely different from guides. The equivalent on Python client is this one https://github.com/opensearch-project/opensearch-py/blob/main/guides/bulk.md

Should I try to update that bulk.md file to be closer in line to the ruby version ?

fixed consistency issue in bulk.md

Signed-off-by: Abhinav Garg <[email protected]>
@nhtruong
Copy link
Collaborator

nhtruong commented Oct 3, 2023

Do you think you could add a working sample into samples like https://github.com/opensearch-project/opensearch-py/tree/main/samples as part of this PR?

That's entirely different from guides. The equivalent on Python client is this one https://github.com/opensearch-project/opensearch-py/blob/main/guides/bulk.md

Should I try to update that bulk.md file to be closer in line to the ruby version ?

You're welcome to. Or you can also create an issue in the Python Repo.

guides/bulk.md Outdated
@@ -52,18 +52,25 @@ client.bulk({
```
As you can see, each bulk operation is comprised of two objects. The first object contains the operation type and the target document's `_index` and `_id`. The second object contains the document's data. As a result, the body of the request above contains six objects for three index actions.

Alternatively, the `bulk` method can accept an array of hashes where each hash represents a single operation. The following code is equivalent to the previous example:
Alternatively, the `bulk` method can accept an array of hashes where hashes are in pairs, except for delete which is singular. In each pair, the first item contains the action like `create` and, `_index` and `id` specify which index and document id the action has to be performed on. The second item in the pair is the document itself.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hash, a very common data structure in Ruby, is actually referred to as an Object (as in JSON Object) in JavaScript.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will correct that now, I was apprehensive of it when I saw it earlier.

]
}).then((response) => {
console.log(response.body.items);
});
```

We will use this format for the rest of the examples in this guide.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this part is not applicable to JavaScript. There's only way to format the body for client.bulk in JS unlike in Ruby.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in we can remove line 55 to 76

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look like we have client.helpers.bulk that allows another format if you wanna look into that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in we can remove line 55 to 76

I don't follow exactly. Is that not how client.bulk works? Because I have used that same method across the document.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should write a working sample, and include it along with these changes so we can be sure @AbhinavGarg90.

Copy link
Author

@AbhinavGarg90 AbhinavGarg90 Oct 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const { Client } = require('@opensearch-project/opensearch')
const documents = require('./docs.json')

const client = new Client({ ... })

const result = await client.helpers.bulk({
  datasource: documents,
  onDocument (doc) {
    return {
      index: { _index: 'example-index' }
    }
  }
})

console.log(result)

Looking at the documentation, it doesn't specify the shape of the input document. I assume it has key and value, but I cannot seem to locate any documentation with further detail about this. Would you be able to help me with some details of how this is implemented? I can then go about adding this to the PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean https://opensearch.org/docs/latest/api-reference/document-apis/bulk/? It's pretty complete (bulk API accepts line-delimited JSON).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean https://opensearch.org/docs/latest/api-reference/document-apis/bulk/? It's pretty complete (bulk API accepts line-delimited JSON).

Thank you for pointing out where this is. Should I link this in bulk.md as well?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does line-delimited JSON work? I tried running a sample using the available documentation, it doesn't seem to be working.

Here are the code snippets I am working with:

const { Client } = require('@opensearch-project/opensearch');
const documents = require('./docs.json')

try {
  const client = new Client({
    node: 'https://admin:admin@localhost:9200',
    ssl: { rejectUnauthorized: false }
  });
  client.helpers.bulk({
    datasource: documents,
    onDocument (doc) {
      return {
        index: { _index: 'example-index' }
      }
    }
  }).then((response) => {
    console.log(response)
  }).catch((error) => {
    console.error(error);
  })
} catch(e) {
  console.log("error", e)
}

docs.json

{ "delete": { "_index": "movies", "_id": "tt2229499" } }
{ "index": { "_index": "movies", "_id": "tt1979320" } }
{ "title": "Rush", "year": 2013 }
{ "create": { "_index": "movies", "_id": "tt1392214" } }
{ "title": "Prisoners", "year": 2013 }
{ "update": { "_index": "movies", "_id": "tt0816711" } }
{ "doc" : { "title": "World War Z" } }

error:

Unexpected non-whitespace character after JSON at position 57

Copy link
Collaborator

@nhtruong nhtruong Oct 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AbhinavGarg90 Sorry for the very late response. I've been out of commission for the last few weeks.

The datasource is an iterable and the simplest iterable in this case is an array of objects, where each object is an OpenSearch doc:

  const docs = [{ title: 'Beauty and the Beast 1', year: 2050 },
                { title: 'Beauty and the Beast 2', year: 2051 }]

Then you can pass it to the helper function:

  let result = await client.helpers.bulk({ datasource: docs,
    onDocument(doc) {
      return { index: { _index: movies} }
    }
  });

  console.log(result);

And you should get this in the console:

{
  total: 2,
  failed: 0,
  retry: 0,
  successful: 2,
  noop: 0,
  time: 744,
  bytes: 150,
  aborted: false
}

Basically the client.helpers.bulk method, on top of implementing some batching logic, also acts as syntactical sugar for client.bulk. That is, the user only have to provide the documents in their original forms (an array of objects in the above example) and the action (index in this example) to apply to the documents.

@dblock
Copy link
Member

dblock commented Oct 17, 2023

sample

I think we need this to convince ourselves that the documentation is now correct.

Signed-off-by: Abhinav Garg <[email protected]>
@AbhinavGarg90
Copy link
Author

AbhinavGarg90 commented Oct 17, 2023

Hey @dblock I realized that I had created this PR from main on my fork. I want to move it to a feature branch on my fork. How can I do so? I can't seem to see the feature branch as an option.

@AbhinavGarg90 AbhinavGarg90 reopened this Oct 17, 2023
@nhtruong
Copy link
Collaborator

@AbhinavGarg90 You'd have to recreate the PR because the source and target branches are locked. However, for now, you can simply rebase your main branch onto the upstream/main branch, then force push, if you want the updates from upstream/main.

@nhtruong
Copy link
Collaborator

nhtruong commented Mar 4, 2024

The guide has been reworked.

@nhtruong nhtruong closed this Mar 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants