Failed during dump collection cannot recover #552

Open
JPacks opened this issue Oct 13, 2016 · 3 comments

JPacks commented Oct 13, 2016

I am trying to sync a MongoDB replica set to Elasticsearch using mongo-connector. It works fine when I insert the first doc into my collection "check", but on the second doc insertion I get a "Failed during dump collection cannot recover" error in mongo-connector.log. Because of this error, the second doc is not loaded into the Elasticsearch index.

The commands I used are:
To start the Mongo replica set: sudo mongod --port 27017 --dbpath /_/__/_/** --replSet rs0
To start Mongo Connector: mongo-connector -m localhost:27017 -t localhost:9200 -d elastic_doc_manager --auto-commit-interval=0 -n a.check

mongo-connector.log:

2016-10-13 17:27:45,381 [CRITICAL] mongo_connector.oplog_manager:630 - Exception during collection dump
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/oplog_manager.py", line 583, in do_dump
    upsert_all(dm)
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/oplog_manager.py", line 567, in upsert_all
    dm.bulk_upsert(docs_to_dump(namespace), mapped_ns, long_ts)
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/util.py", line 43, in wrapped
    reraise(new_type, exc_value, exc_tb)
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/util.py", line 32, in wrapped
    return f(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/doc_managers/elastic_doc_manager.py", line 214, in bulk_upsert
    for ok, resp in responses:
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 160, in streaming_bulk
    for result in _process_bulk_chunk(client, bulk_actions, raise_on_exception, raise_on_error, **kwargs):
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 89, in _process_bulk_chunk
    raise e
ConnectionFailed: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host=u'localhost', port=9200): Read timed out. (read timeout=10))
2016-10-13 17:27:45,381 [ERROR] mongo_connector.oplog_manager:638 - OplogThread: Failed during dump collection cannot recover! Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True, replicaset=u'rs0'), u'local'), u'oplog.rs')
2016-10-13 17:27:46,376 [ERROR] mongo_connector.connector:304 - MongoConnector: OplogThread <OplogThread(Thread-2, started 140648179619584)> unexpectedly stopped! Shutting down

FYI, I am using Elasticsearch 2.3.1, MongoDB 3.0.12, and mongo-connector 2.4.1.

@ShaneHarvey
Contributor

Looks like you are hitting a ReadTimeoutError from Elasticsearch. Try increasing the timeout using a config file such as:

{
  "mainAddress": "localhost:27017",
  "verbosity": 3,
  "namespaces": {
    "include": ["a.check"]
  },
  "docManagers": [
    {
      "docManager": "elastic_doc_manager",
      "targetURL": "localhost:9200",
      "autoCommitInterval": 0,
      "args": {
        "clientOptions": {"timeout": 30}
      }
    }
  ]
}
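
Save that as, for example, config.json and start mongo-connector with it:

mongo-connector -c config.json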

You can also use the continueOnError option to force mongo-connector to log and ignore errors during the collection dump.
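It can be enabled with the --continue-on-error command line flag, or as a top-level key in the same config file, for example:

{
  "continueOnError": true
}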


mumlax commented Jan 19, 2017

I'm also suddenly running into this error when doing a resync. It had worked for a long time.

2017-01-19 12:43:52,690 [CRITICAL] mongo_connector.oplog_manager:666 - Exception during collection dump
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/oplog_manager.py", line 621, in do_dump
    upsert_all(dm)
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/oplog_manager.py", line 607, in upsert_all
    mapped_ns, long_ts)
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/util.py", line 44, in wrapped
    reraise(new_type, exc_value, exc_tb)
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/util.py", line 33, in wrapped
    return f(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/doc_managers/elastic2_doc_manager.py", line 367, in bulk_upsert
    for ok, resp in responses:
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 162, in streaming_bulk
    for result in _process_bulk_chunk(client, bulk_actions, raise_on_exception, raise_on_error, **kwargs):
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 91, in _process_bulk_chunk
    raise e
ConnectionFailed: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host=u'localhost', port=9200): Read timed out. (read timeout=60))
2017-01-19 12:43:52,703 [ERROR] mongo_connector.oplog_manager:674 - OplogThread: Failed during dump collection cannot recover! Collection(Database(MongoClient(host=[u'localhost:27017'], document_class=dict, tz_aware=False, connect=True, replicaset=u'singleNodeRepl'), u'local'), u'oplog.rs')
2017-01-19 12:43:53,241 [ERROR] __main__:357 - MongoConnector: OplogThread <OplogThread(Thread-3, started 140353541756672)> unexpectedly stopped! Shutting down

I'm using mongo-connector version 2.5.0, pymongo version 3.4.0, MongoDB version 3.2.10, and elastic2_doc_manager version 0.3.0. With this setup I'm storing more than 100M documents.

I already raised the timeout to 60 seconds, as you can see in the log.

Previously, the following error had already appeared, which is why I had to start the resync in the first place:

2017-01-19 08:58:36,553 [ERROR] mongo_connector.doc_managers.elastic2_doc_manager:412 - Exception while commiting to Elasticsearch
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/mongo_connector/doc_managers/elastic2_doc_manager.py", line 406, in commit
    successes, errors = bulk(self.elastic, action_buffer)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 190, in bulk
    for ok, item in streaming_bulk(client, actions, **kwargs):
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 162, in streaming_bulk
    for result in _process_bulk_chunk(client, bulk_actions, raise_on_exception, raise_on_error, **kwargs):
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 91, in _process_bulk_chunk
    raise e
ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host=u'localhost', port=9200): Read timed out. (read timeout=10))

I don't know whether this is related to the newest error.
Should I just set continueOnError? When that option is set and an error occurs, are the affected documents ignored (i.e., not synced)?

@ShaneHarvey
Contributor

With continueOnError, documents that fail to sync during the collection dump will be ignored. The general problem is that the Elasticsearch doc managers do not retry on connection/operation failures; see yougov/elastic2-doc-manager#18.

For now, I can only recommend increasing the Elasticsearch client timeout again. Do you see any errors or warnings in the Elasticsearch logs?
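
If you are comfortable patching your doc manager locally, a rough workaround is to wrap the bulk helper in a retry loop. This is just a sketch, not mongo-connector's own code, and bulk_with_retries is a hypothetical helper:

# Sketch only, not mongo-connector's API: retry a bulk request a few times
# on connection failures instead of aborting the whole collection dump.
import time

from elasticsearch import Elasticsearch
from elasticsearch.exceptions import ConnectionError as ESConnectionError
from elasticsearch.helpers import bulk

def bulk_with_retries(client, actions, attempts=3, delay=5, **kwargs):
    # actions must be a list, not a generator: a generator would already be
    # consumed on the first attempt, so a retry would send nothing.
    for attempt in range(1, attempts + 1):
        try:
            return bulk(client, actions, **kwargs)
        except ESConnectionError:  # ConnectionTimeout is a subclass
            if attempt == attempts:
                raise  # out of retries, surface the original error
            time.sleep(delay)

# Example: a 30 second read timeout plus up to three attempts per request.
es = Elasticsearch(["localhost:9200"], timeout=30)

That at least rides out short Elasticsearch stalls until the doc manager grows proper retry support.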
