-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
quick implementation of the scroll feature #36
base: master
Are you sure you want to change the base?
Conversation
Hi @davinerd, thank you for your code! Can you provide some unit tests for this feature? |
Here we go. |
Hold on on this, I'm adding also the 'DELETE' feature to delete unused scroll id (as stated here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html#_clear_scroll_api) |
Good to go now. Added few more tests also. You can use the clear_scroll by doing:
it also supports multiple scroll ids as a list. |
any update on this? Thanks |
Hi, the tests are failing. Can you provide the correction? |
it's just one test that fails. However, my local test shows no error so I cannot reproduce it. Can you try locally? |
Hey @davinerd ipdb> response
HTTPResponse(_body=None,buffer=<_io.BytesIO object at 0x111f51d10>,code=400,effective_url='http://localhost:9200/_search/scroll?source=%7B%27scroll_id%27%3A+u%27cXVlcnlUaGVuRmV0Y2g7NTs2MTE6aDVVZlZSbU9SakstVkhFcFVuc3ctdzs2MTI6aDVVZlZSbU9SakstVkhFcFVuc3ctdzs2MTQ6aDVVZlZSbU9SakstVkhFcFVuc3ctdzs2MTM6aDVVZlZSbU9SakstVkhFcFVuc3ctdzs2MTU6aDVVZlZSbU9SakstVkhFcFVuc3ctdzswOw%3D%3D%27%7D&scroll=1m',error=HTTP 400: Bad Request,headers=<tornado.httputil.HTTPHeaders object at 0x11206e050>,reason='Bad Request',request=<tornado.httpclient.HTTPRequest object at 0x112062f10>,request_time=0.0036470890045166016,time_info={})
ipdb> response.body
'{"error":"ElasticSearchIllegalArgumentException[Failed to decode scrollId]; nested: IOException[Bad Base64 input character decimal 123 in array position 0]; ","status":400}'
ipdb> You can notice that the $ curl -XPOST "http://localhost:9200/_search/scroll?source=%7B%27scroll_id%27%3A+u%27cXVlcnlUaGVuRmV0Y2g7NTs2MTE6aDVVZlZSbU9SakstVkhFcFVuc3ctdzs2MTI6aDVVZlZSbU9SakstVkhFcFVuc3ctdzs2MTQ6aDVVZlZSbU9SakstVkhFcFVuc3ctdzs2MTM6aDVVZlZSbU9SakstVkhFcFVuc3ctdzs2MTU6aDVVZlZSbU9SakstVkhFcFVuc3ctdzswOw%3D%3D%27%7D&scroll=1m"
HTTP/1.1 400 Bad Request
Content-Type: application/json; charset=UTF-8
Content-Length: 172
{"error":"ElasticSearchIllegalArgumentException[Failed to decode scrollId]; nested: IOException[Bad Base64 input character decimal 123 in array position 0]; ","status":400} BUT it works if we pass them directly as curl -XGET "http://localhost:9200/_search/scroll?scroll_id=cXVlcnlUaGVuRmV0Y2g7NTs2MTE6aDVVZlZSbU9SakstVkhFcFVuc3ctdzs2MTI6aDVVZlZSbU9SakstVkhFcFVuc3ctdzs2MTQ6aDVVZlZSbU9SakstVkhFcFVuc3ctdzs2MTM6aDVVZlZSbU9SakstVkhFcFVuc3ctdzs2MTU6aDVVZlZSbU9SakstVkhFcFVuc3ctdzswOw==&scroll=1m"
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 7521
{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":13,"max_score":1.0,"hits":[{"_index":"teste","_type":"materia","_id":"http://localhost/noticia/12/fast","_score":1.0, "_source" : {"Subeditorias":["Campinas e Região","SP","São Paulo"] ... My suggestion: Something like this: @return_future
def scroll(self, scroll_id, scroll='5m', body=None, callback=None):
path = "/_search/scroll"
url = "{url}{path}?scroll={scroll}&scroll_id={scroll_id}".format(
url=self.url, path=path, scroll=scroll, scroll_id=scroll_id
)
request_arguments = dict(self.httprequest_kwargs)
request_arguments['method'] = "GET"
if body:
request_arguments['body'] = json_encode(body)
request = HTTPRequest(url, **request_arguments)
self.client.fetch(request, callback) |
Hey @davinerd |
Hi @cristianoliveira I'll implement your code and fixes, then retest it and submit the new code. Sorry for the delay. |
I've implemented the scroll feature.
Here's a quick stub on how to use this feature
Then to retrieve the scroll (you may need to loop the process until '_scroll_id' is found in the results):
It's not super elegant but it works.
Hope it helps