Failed during dump collection cannot recover #552

Open
JPacks opened this issue Oct 13, 2016 · 3 comments

JPacks commented Oct 13, 2016

I am trying to sync a MongoDB replica set to Elasticsearch using mongo-connector. It works fine when I insert the first doc into my collection "check", but on the second doc insertion I get a "Failed during dump collection cannot recover" error in mongo-connector.log. Because of this error, the second doc is not loaded into the Elasticsearch index.

The commands I used are:
To start the Mongo replica set: sudo mongod --port 27017 --dbpath /_/__/_/** --replSet rs0
To start Mongo Connector: mongo-connector -m localhost:27017 -t localhost:9200 -d elastic_doc_manager --auto-commit-interval=0 -n a.check

mongo-connector.log:

2016-10-13 17:27:45,381 [CRITICAL] mongo_connector.oplog_manager:630 - Exception during collection dump
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/oplog_manager.py", line 583, in do_dump
    upsert_all(dm)
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/oplog_manager.py", line 567, in upsert_all
    dm.bulk_upsert(docs_to_dump(namespace), mapped_ns, long_ts)
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/util.py", line 43, in wrapped
    reraise(new_type, exc_value, exc_tb)
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/util.py", line 32, in wrapped
    return f(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/doc_managers/elastic_doc_manager.py", line 214, in bulk_upsert
    for ok, resp in responses:
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 160, in streaming_bulk
    for result in _process_bulk_chunk(client, bulk_actions, raise_on_exception, raise_on_error, **kwargs):
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 89, in _process_bulk_chunk
    raise e
ConnectionFailed: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host=u'localhost', port=9200): Read timed out. (read timeout=10))
2016-10-13 17:27:45,381 [ERROR] mongo_connector.oplog_manager:638 - OplogThread: Failed during dump collection cannot recover! Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True, replicaset=u'rs0'), u'local'), u'oplog.rs')
2016-10-13 17:27:46,376 [ERROR] mongo_connector.connector:304 - MongoConnector: OplogThread <OplogThread(Thread-2, started 140648179619584)> unexpectedly stopped! Shutting down

FYI, I am using Elasticsearch 2.3.1, MongoDB 3.0.12, and mongo-connector 2.4.1.

@ShaneHarvey
Contributor

Looks like you are hitting a ReadTimeoutError from Elasticsearch. Try increasing the timeout using a config file such as:

{
  "mainAddress": "localhost:27017",
  "verbosity": 3,
  "namespaces": {
    "include": ["a.check"]
  },
  "docManagers": [
    {
      "docManager": "elastic_doc_manager",
      "targetURL": "localhost:9200",
      "autoCommitInterval": 0,
      "args": {
        "clientOptions": {"timeout": 30}
      }
    }
  ]
}
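
Save that as, for example, config.json and start mongo-connector with it:

mongo-connector -c config.json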

You can also use the continueOnError option to force mongo-connector to log and ignore errors during the collection dump.
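It can be enabled with the --continue-on-error command line flag, or as a top-level key in the same config file, for example:

{
  "continueOnError": true
}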


mumlax commented Jan 19, 2017

I'm also suddenly running into this error when doing a resync. It had worked for a long time.

2017-01-19 12:43:52,690 [CRITICAL] mongo_connector.oplog_manager:666 - Exception during collection dump
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/oplog_manager.py", line 621, in do_dump
    upsert_all(dm)
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/oplog_manager.py", line 607, in upsert_all
    mapped_ns, long_ts)
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/util.py", line 44, in wrapped
    reraise(new_type, exc_value, exc_tb)
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/util.py", line 33, in wrapped
    return f(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/doc_managers/elastic2_doc_manager.py", line 367, in bulk_upsert
    for ok, resp in responses:
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 162, in streaming_bulk
    for result in _process_bulk_chunk(client, bulk_actions, raise_on_exception, raise_on_error, **kwargs):
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 91, in _process_bulk_chunk
    raise e
ConnectionFailed: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host=u'localhost', port=9200): Read timed out. (read timeout=60))
2017-01-19 12:43:52,703 [ERROR] mongo_connector.oplog_manager:674 - OplogThread: Failed during dump collection cannot recover! Collection(Database(MongoClient(host=[u'localhost:27017'], document_class=dict, tz_aware=False, connect=True, replicaset=u'singleNodeRepl'), u'local'), u'oplog.rs')
2017-01-19 12:43:53,241 [ERROR] __main__:357 - MongoConnector: OplogThread <OplogThread(Thread-3, started 140353541756672)> unexpectedly stopped! Shutting down

I'm using mongo-connector version 2.5.0, pymongo version 3.4.0, MongoDB version 3.2.10, and elastic2_doc_manager version 0.3.0. With this setup I'm storing more than 100M documents.

I already raised the timeout to 60 seconds, as you can see in the log.

Previously, the following error had already appeared, which is why I had to start the resync in the first place:

2017-01-19 08:58:36,553 [ERROR] mongo_connector.doc_managers.elastic2_doc_manager:412 - Exception while commiting to Elasticsearch
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/doc_managers/elastic2_doc_manager.py", line 406, in commit
    successes, errors = bulk(self.elastic, action_buffer)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 190, in bulk
    for ok, item in streaming_bulk(client, actions, **kwargs):
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 162, in streaming_bulk
    for result in _process_bulk_chunk(client, bulk_actions, raise_on_exception, raise_on_error, **kwargs):
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 91, in _process_bulk_chunk
    raise e
ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host=u'localhost', port=9200): Read timed out. (read timeout=10))

I don't know whether this is related to the newest error.
Should I just set continueOnError? When that option is set and an error occurs, are the affected documents ignored (i.e., not synced)?

@ShaneHarvey
Contributor

With continueOnError, documents that fail to sync during the collection dump will be ignored. The general problem is that the Elasticsearch doc managers do not retry on connection/operation failures; see yougov/elastic2-doc-manager#18.

For now, I can only recommend increasing the Elasticsearch client timeout again. Do you see any errors or warnings in the Elasticsearch logs?
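
If you are comfortable patching your doc manager locally, a rough workaround is to wrap the bulk helper in a retry loop. This is just a sketch, not mongo-connector's own code, and bulk_with_retries is a hypothetical helper:

# Sketch only, not mongo-connector's API: retry a bulk request a few times
# on connection failures instead of aborting the whole collection dump.
import time

from elasticsearch import Elasticsearch
from elasticsearch.exceptions import ConnectionError as ESConnectionError
from elasticsearch.helpers import bulk

def bulk_with_retries(client, actions, attempts=3, delay=5, **kwargs):
    # actions must be a list, not a generator: a generator would already be
    # consumed on the first attempt, so a retry would send nothing.
    for attempt in range(1, attempts + 1):
        try:
            return bulk(client, actions, **kwargs)
        except ESConnectionError:  # ConnectionTimeout is a subclass
            if attempt == attempts:
                raise  # out of retries, surface the original error
            time.sleep(delay)

# Example: a 30 second read timeout plus up to three attempts per request.
es = Elasticsearch(["localhost:9200"], timeout=30)

That at least rides out short Elasticsearch stalls until the doc manager grows proper retry support.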
