-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Corrections to bulk.md #620
Conversation
Signed-off-by: Abhinav Garg <[email protected]>
Signed-off-by: Abhinav Garg <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think you could add a working sample into samples
like https://github.com/opensearch-project/opensearch-py/tree/main/samples as part of this PR?
That's entirely different from |
Should I try to update that bulk.md file to be closer in line to the ruby version ? |
fixed consistency issue in bulk.md Signed-off-by: Abhinav Garg <[email protected]>
You're welcome to. Or you can also create an issue in the Python Repo. |
guides/bulk.md
Outdated
@@ -52,18 +52,25 @@ client.bulk({ | |||
``` | |||
As you can see, each bulk operation is comprised of two objects. The first object contains the operation type and the target document's `_index` and `_id`. The second object contains the document's data. As a result, the body of the request above contains six objects for three index actions. | |||
|
|||
Alternatively, the `bulk` method can accept an array of hashes where each hash represents a single operation. The following code is equivalent to the previous example: | |||
Alternatively, the `bulk` method can accept an array of hashes where hashes are in pairs, except for delete which is singular. In each pair, the first item contains the action like `create` and, `_index` and `id` specify which index and document id the action has to be performed on. The second item in the pair is the document itself. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hash
, a very common data structure in Ruby, is actually referred to as an Object (as in JSON Object) in JavaScript.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will correct that now, I was apprehensive of it when I saw it earlier.
] | ||
}).then((response) => { | ||
console.log(response.body.items); | ||
}); | ||
``` | ||
|
||
We will use this format for the rest of the examples in this guide. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this part is not applicable to JavaScript. There's only way to format the body for client.bulk
in JS unlike in Ruby.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As in we can remove line 55 to 76
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look like we have client.helpers.bulk
that allows another format if you wanna look into that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As in we can remove line 55 to 76
I don't follow exactly. Is that not how client.bulk works? Because I have used that same method across the document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should write a working sample, and include it along with these changes so we can be sure @AbhinavGarg90.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const { Client } = require('@opensearch-project/opensearch')
const documents = require('./docs.json')
const client = new Client({ ... })
const result = await client.helpers.bulk({
datasource: documents,
onDocument (doc) {
return {
index: { _index: 'example-index' }
}
}
})
console.log(result)
Looking at the documentation, it doesn't specify the shape of the input document. I assume it has key and value, but I cannot seem to locate any documentation with further detail about this. Would you be able to help me with some details of how this is implemented? I can then go about adding this to the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean https://opensearch.org/docs/latest/api-reference/document-apis/bulk/? It's pretty complete (bulk API accepts line-delimited JSON).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean https://opensearch.org/docs/latest/api-reference/document-apis/bulk/? It's pretty complete (bulk API accepts line-delimited JSON).
Thank you for pointing out where this is. Should I link this in bulk.md as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does line-delimited JSON work? I tried running a sample using the available documentation, it doesn't seem to be working.
Here are the code snippets I am working with:
const { Client } = require('@opensearch-project/opensearch');
const documents = require('./docs.json')
try {
const client = new Client({
node: 'https://admin:admin@localhost:9200',
ssl: { rejectUnauthorized: false }
});
client.helpers.bulk({
datasource: documents,
onDocument (doc) {
return {
index: { _index: 'example-index' }
}
}
}).then((response) => {
console.log(response)
}).catch((error) => {
console.error(error);
})
} catch(e) {
console.log("error", e)
}
docs.json
{ "delete": { "_index": "movies", "_id": "tt2229499" } }
{ "index": { "_index": "movies", "_id": "tt1979320" } }
{ "title": "Rush", "year": 2013 }
{ "create": { "_index": "movies", "_id": "tt1392214" } }
{ "title": "Prisoners", "year": 2013 }
{ "update": { "_index": "movies", "_id": "tt0816711" } }
{ "doc" : { "title": "World War Z" } }
error:
Unexpected non-whitespace character after JSON at position 57
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AbhinavGarg90 Sorry for the very late response. I've been out of commission for the last few weeks.
The datasource is an iterable and the simplest iterable in this case is an array of objects, where each object is an OpenSearch doc:
const docs = [{ title: 'Beauty and the Beast 1', year: 2050 },
{ title: 'Beauty and the Beast 2', year: 2051 }]
Then you can pass it to the helper function:
let result = await client.helpers.bulk({ datasource: docs,
onDocument(doc) {
return { index: { _index: movies} }
}
});
console.log(result);
And you should get this in the console:
{
total: 2,
failed: 0,
retry: 0,
successful: 2,
noop: 0,
time: 744,
bytes: 150,
aborted: false
}
Basically the client.helpers.bulk
method, on top of implementing some batching logic, also acts as syntactical sugar for client.bulk
. That is, the user only have to provide the documents in their original forms (an array of objects in the above example) and the action (index in this example) to apply to the documents.
Signed-off-by: Abhinav Garg <[email protected]>
I think we need this to convince ourselves that the documentation is now correct. |
Signed-off-by: Abhinav Garg <[email protected]>
Hey @dblock I realized that I had created this PR from main on my fork. I want to move it to a feature branch on my fork. How can I do so? I can't seem to see the feature branch as an option. |
@AbhinavGarg90 You'd have to recreate the PR because the source and target branches are locked. However, for now, you can simply rebase your main branch onto the |
The guide has been reworked. |
Description
The description on how to use the API was incorrect. I had updated it with the correct implementation.
Issues Resolved
#616
Check List
yarn run lint
doesn't show any errorsBy submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.