Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quick implementation of the scroll feature #36

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

davinerd
Copy link

I've implemented the scroll feature.
Here's a quick stub on how to use this feature

result = yield self.es.search(index=query_index, type=query_type, source=body_query, scroll="1m")

Then to retrieve the scroll (you may need to loop the process until '_scroll_id' is found in the results):

body_query = {
"scroll_id" : result['_scroll_id']
}
result = yield self.es.search(index="_search", type="scroll", scroll="1m", source=body_query)

It's not super elegant but it works.
Hope it helps

@luizgpsantos
Copy link
Contributor

Hi @davinerd, thank you for your code!

Can you provide some unit tests for this feature?

@davinerd
Copy link
Author

Here we go.
Please check if the test is well done (I tried to be consistent with your style).

@davinerd
Copy link
Author

Hold on on this, I'm adding also the 'DELETE' feature to delete unused scroll id (as stated here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html#_clear_scroll_api)

@davinerd
Copy link
Author

Good to go now. Added few more tests also.

You can use the clear_scroll by doing:

yield self.es.clear_scroll(result['_scroll_id'])

it also supports multiple scroll ids as a list.

@davinerd
Copy link
Author

davinerd commented Oct 8, 2016

any update on this?

Thanks

@luizgpsantos
Copy link
Contributor

Hi, the tests are failing. Can you provide the correction?

@davinerd
Copy link
Author

davinerd commented Oct 9, 2016

it's just one test that fails. However, my local test shows no error so I cannot reproduce it. Can you try locally?

@cristianoliveira
Copy link
Contributor

cristianoliveira commented Nov 24, 2016

Hey @davinerd
I found the tests problem. The elasticsearch versions < 1.5 has trouble parsing the url query strings. Debugging it we've reached the follow problem:

ipdb> response
HTTPResponse(_body=None,buffer=<_io.BytesIO object at 0x111f51d10>,code=400,effective_url='http://localhost:9200/_search/scroll?source=%7B%27scroll_id%27%3A+u%27cXVlcnlUaGVuRmV0Y2g7NTs2MTE6aDVVZlZSbU9SakstVkhFcFVuc3ctdzs2MTI6aDVVZlZSbU9SakstVkhFcFVuc3ctdzs2MTQ6aDVVZlZSbU9SakstVkhFcFVuc3ctdzs2MTM6aDVVZlZSbU9SakstVkhFcFVuc3ctdzs2MTU6aDVVZlZSbU9SakstVkhFcFVuc3ctdzswOw%3D%3D%27%7D&scroll=1m',error=HTTP 400: Bad Request,headers=<tornado.httputil.HTTPHeaders object at 0x11206e050>,reason='Bad Request',request=<tornado.httpclient.HTTPRequest object at 0x112062f10>,request_time=0.0036470890045166016,time_info={})
ipdb> response.body
'{"error":"ElasticSearchIllegalArgumentException[Failed to decode scrollId]; nested: IOException[Bad Base64 input character decimal 123 in array position 0]; ","status":400}'
ipdb>

You can notice that the effective_url contains the ?source={"scroll_id": "SCROLL_ID"} encoded. Testing here using curl we get the same response:

$ curl -XPOST "http://localhost:9200/_search/scroll?source=%7B%27scroll_id%27%3A+u%27cXVlcnlUaGVuRmV0Y2g7NTs2MTE6aDVVZlZSbU9SakstVkhFcFVuc3ctdzs2MTI6aDVVZlZSbU9SakstVkhFcFVuc3ctdzs2MTQ6aDVVZlZSbU9SakstVkhFcFVuc3ctdzs2MTM6aDVVZlZSbU9SakstVkhFcFVuc3ctdzs2MTU6aDVVZlZSbU9SakstVkhFcFVuc3ctdzswOw%3D%3D%27%7D&scroll=1m"
HTTP/1.1 400 Bad Request
Content-Type: application/json; charset=UTF-8
Content-Length: 172

{"error":"ElasticSearchIllegalArgumentException[Failed to decode scrollId]; nested: IOException[Bad Base64 input character decimal 123 in array position 0]; ","status":400}

BUT it works if we pass them directly as /_search/scroll?scroll_id=SCROLL_ID&scroll=5m example:

curl -XGET "http://localhost:9200/_search/scroll?scroll_id=cXVlcnlUaGVuRmV0Y2g7NTs2MTE6aDVVZlZSbU9SakstVkhFcFVuc3ctdzs2MTI6aDVVZlZSbU9SakstVkhFcFVuc3ctdzs2MTQ6aDVVZlZSbU9SakstVkhFcFVuc3ctdzs2MTM6aDVVZlZSbU9SakstVkhFcFVuc3ctdzs2MTU6aDVVZlZSbU9SakstVkhFcFVuc3ctdzswOw==&scroll=1m"
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 7521

{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":13,"max_score":1.0,"hits":[{"_index":"teste","_type":"materia","_id":"http://localhost/noticia/12/fast","_score":1.0, "_source" : {"Subeditorias":["Campinas e Região","SP","São Paulo"] ...

My suggestion:
Create a new method to deal with scroll search def scroll(scroll_id, scroll, ..) as elasticsearch.py does http://bit.ly/2gnhLXS and when creating the request pass it directly in querystring.

Something like this:

    @return_future
    def scroll(self, scroll_id,  scroll='5m', body=None, callback=None):
        path = "/_search/scroll"
        url = "{url}{path}?scroll={scroll}&scroll_id={scroll_id}".format(
            url=self.url, path=path, scroll=scroll, scroll_id=scroll_id
        )

        request_arguments = dict(self.httprequest_kwargs)
        request_arguments['method'] = "GET"

        if body:
            request_arguments['body'] = json_encode(body)

        request = HTTPRequest(url, **request_arguments)
        self.client.fetch(request, callback)

@cristianoliveira
Copy link
Contributor

Hey @davinerd
Do you have interest in continuing this PR?

@davinerd
Copy link
Author

Hi @cristianoliveira
yes, let me work on this in the next few days.

I'll implement your code and fixes, then retest it and submit the new code.

Sorry for the delay.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants