
Failed during dump collection and unavailable_shards_exception #633

Closed
mumlax opened this issue Jan 24, 2017 · 2 comments
mumlax commented Jan 24, 2017

As already mentioned in #552 (comment), I'm experiencing an "Exception during collection dump" caused by a "ConnectionTimeout caused by ReadTimeoutError" (see logs and other details in the linked comment).
I'm opening a new issue now because a new circumstance has appeared.
In the meantime, on some recent attempts to resync, I get a new error: again an "Exception during collection dump" with the same traceback, but with a different reason at the end.

OperationFailed: (u'611 document(s) failed to index.', [])

The array contains (presumably 611) elements of the following form:

{u'index': {u'status': 503, u'_type': u'dbXY', u'_id': u'xyz', 
  u'error':  {u'reason': u'[indexAB][3] primary shard is not active Timeout: [1m], request: 
    [BulkShardRequest to [indexAB] containing [314] requests]', 
    u'type': u'unavailable_shards_exception'},
  u'_index': u'indexAB'}}
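For anyone hitting a similar `OperationFailed`, a minimal sketch of how to tally the per-item errors inside a failed bulk response might look like the following. The sample item is copied from the error output above; the field names (`status`, `error`, `type`) follow the Elasticsearch bulk API response format, and `summarize_bulk_errors` is an illustrative helper, not part of mongo-connector.

```python
from collections import Counter

def summarize_bulk_errors(items):
    """Count failed bulk-response items by their error type."""
    counts = Counter()
    for item in items:
        # Each item is wrapped in its action name ("index" here).
        action = next(iter(item.values()))
        error = action.get("error")
        if error:
            counts[error.get("type", "unknown")] += 1
    return counts

# One item of the shape reported above (611 of these in the real failure).
failed_items = [
    {"index": {"status": 503, "_type": "dbXY", "_id": "xyz",
               "error": {"reason": "[indexAB][3] primary shard is not active",
                         "type": "unavailable_shards_exception"},
               "_index": "indexAB"}},
]

print(summarize_bulk_errors(failed_items))
# Counter({'unavailable_shards_exception': 1})
```

A tally like this quickly shows whether all 611 failures share one cause (here, shards not being allocated) or whether several distinct errors are mixed together.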

I double-checked: Elasticsearch is running.
Some time earlier I had played around with the bulk size and set it to 3000. Because of the "unavailable_shards_exception" I hoped I had found the cause and reset it to 1000. No improvement.
Sometimes the connector is stopped by the ReadTimeoutError, sometimes by the OperationFailed error, and I can't detect a pattern for when which error appears. I don't even know whether these errors have the same source or whether there are multiple problems. Everything started with the update to v2.5.0.

Any ideas on this problem (or problems)?

@ShaneHarvey
Contributor

The problem is that the Elasticsearch doc managers do not retry on connection/operation failure. I'm closing this in favor of the elastic2-doc-manager issue that tracks retrying operations on failure: yougov/elastic2-doc-manager#18
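The retry behavior tracked in that issue could be sketched roughly as follows. This is not the doc manager's actual code: `send` stands in for whatever callable performs the bulk request, `OperationFailed` is a stand-in for mongo-connector's exception of the same name, and the backoff parameters are illustrative.

```python
import time

class OperationFailed(Exception):
    """Stand-in for mongo_connector's OperationFailed."""

def retry_bulk(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call send(); on OperationFailed, back off exponentially and retry."""
    for attempt in range(max_retries):
        try:
            return send()
        except OperationFailed:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the failure
            sleep(base_delay * 2 ** attempt)

# Example: a sender that fails twice (e.g. shards still allocating),
# then succeeds once the cluster has recovered.
calls = {"n": 0}
def flaky_send():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OperationFailed("503 unavailable_shards_exception")
    return "ok"

result = retry_bulk(flaky_send, sleep=lambda _: None)
print(result)  # "ok" after two retries
```

A retry loop like this would have ridden out the transient `unavailable_shards_exception` reported above instead of aborting the whole collection dump.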


mumlax commented Jan 30, 2017

Alright, I'll follow that issue. Thanks!
The reason for Elasticsearch's long response time was simple: the disk was full, oops ;) Won't happen again.
