Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mongo-connector not reconnecting to elasticsearch after the ReadTimeoutError occurs #402

Open
kawaljeet opened this issue Mar 3, 2016 · 10 comments
Labels

Comments

@kawaljeet
Copy link

When the connectivity between mongo-connector and elasticsearch is broken for a short while, mongo-connector should reconnect automatically, but it is not.

I manually stopped the elastisearch instance, to break the connectivity. I got the following messages in the mongo-connector.log file (find the log messages in the end)

After 5 minutes, I restarted the elasticsearch instance. The elastic search services are running fine and it is listening to port 9200.

But mongo-connetor didnt reconnected it (after elastic search again became available). I also tried by updating the collection, but it didnt not reconnected to elasticsearch.

Is there any configuration or the run time parameter, such that it can reconnect to the elasticsearch instance ?

I have also put the configuration file in the end
Log message

2016-03-02 22:13:28,770 [WARNING] elasticsearch:82 - PUT http://10.9.220.233:9200/people/people/56d7657ea4dc6d2b7c055908?refresh=false [status:N/A request:10.005s]
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/elasticsearch-1.9.0-py2.7.egg/elasticsearch/connection/http_urllib3.py", line 78, in perform_request
response = self.pool.urlopen(method, url, body, retries=False, headers=self.headers, *_kw)
File "/usr/local/lib/python2.7/dist-packages/urllib3-1.14-py2.7.egg/urllib3/connectionpool.py", line 608, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/local/lib/python2.7/dist-packages/urllib3-1.14-py2.7.egg/urllib3/util/retry.py", line 224, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python2.7/dist-packages/urllib3-1.14-py2.7.egg/urllib3/connectionpool.py", line 558, in urlopen
body=body, headers=headers)
File "/usr/local/lib/python2.7/dist-packages/urllib3-1.14-py2.7.egg/urllib3/connectionpool.py", line 380, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/usr/local/lib/python2.7/dist-packages/urllib3-1.14-py2.7.egg/urllib3/connectionpool.py", line 308, in _raise_timeout
raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
ReadTimeoutError: HTTPConnectionPool(host=u'10.9.220.233', port=9200): Read timed out. (read timeout=10)
2016-03-02 22:13:28,771 [ERROR] mongo_connector.util:87 - Fatal Exception
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/mongo_connector-2.3.dev0-py2.7.egg/mongo_connector/util.py", line 85, in wrapped
func(_args, *_kwargs)
File "/usr/local/lib/python2.7/dist-packages/mongo_connector-2.3.dev0-py2.7.egg/mongo_connector/oplog_manager.py", line 261, in run
docman.upsert(doc, ns, timestamp)
File "/usr/local/lib/python2.7/dist-packages/mongo_connector-2.3.dev0-py2.7.egg/mongo_connector/util.py", line 32, in wrapped
return f(_args, *_kwargs)
File "/usr/local/lib/python2.7/dist-packages/mongo_connector-2.3.dev0-py2.7.egg/mongo_connector/doc_managers/elastic_doc_manager.py", line 143, in upsert
refresh=(self.auto_commit_interval == 0))
File "/usr/local/lib/python2.7/dist-packages/elasticsearch-1.9.0-py2.7.egg/elasticsearch/client/utils.py", line 69, in _wrapped
return func(_args, params=params, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch-1.9.0-py2.7.egg/elasticsearch/client/init.py", line 263, in index
_make_path(index, doc_type, id), params=params, body=body)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch-1.9.0-py2.7.egg/elasticsearch/transport.py", line 307, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch-1.9.0-py2.7.egg/elasticsearch/connection/http_urllib3.py", line 86, in perform_request
raise ConnectionTimeout('TIMEOUT', str(e), e)
ConnectionTimeout: ConnectionTimeout caused by -
ReadTimeoutError(HTTPConnectionPool(host=u'10.9.220.233', port=9200): Read timed out. (read timeout=10))
.
find below the configuration file i am using

{
"comment": "Configuration options starting with '' are disabled",
"__comment
": "To enable them, remove the preceding '__'",
"mainAddress": "XX.XX.XX.XX:27017",
"oplogFile": "oplog.timestamp",
"noDump": false,
"batchSize": -1,
"verbosity": 1,
"continueOnError": true,
"logging": {
"type": "file",
"filename": "/root/mongoconnector.log",
"format": "%(asctime)s [%(levelname)s] %(name)s:%(lineno)d - %(message)s",
"rotationWhen": "D",
"rotationInterval": 1,
"rotationBackups": 10,
"__type": "syslog",
"__host": "localhost:514"
},
"namespaces": {
"include": [
"db.coll"
],
"mapping": {
"db.coll": "index.type"
}
},
"docManagers": [
{
"docManager": "elastic_doc_manager",
"targetURL": "XX.XX.XX.XX:9200",
"__bulkSize": 1000,
"__uniqueKey": "_id",
"autoCommitInterval": 120
}
]
}

@aherlihy
Copy link
Contributor

Hello! I just wanted to let you know I'm looking into this and should have an update for you soon. Thank you for your patience!

@kawaljeet
Copy link
Author

Sure @aherlihy .. thanks a lot for that. If you need any other information, do let me know. Right now i am running cron job to monitor the stale connection and then restarting the services.

To add on more on this.
After mongo connector and elastic servers are connected (both running on different servers)
a. Stop the elastic search server. I can still see the tcp state @ port 9200 as ESTABLISHED. Though the elastisearch is down,
b. Try updating the collection (elastic server still down). mongo connector will try to connect to elastic search server and now the tcp state changes to FIN_WAIT.. closing the connection, and it will throw errors. in the log files.
c. Start the elastic search server.
After c), my expectation was, the mongo connector should connect automatically. But it isnt.

@kawaljeet
Copy link
Author

Hi @aherlihy ... wanted to know, if you were able to reproduce the issue ?

@aherlihy
Copy link
Contributor

Hi @kawaljeet, I haven't found a solution to this issue yet. We are looking into how the elasticsearch driver deals with reconnecting and trying to find the best behavior to emulate.

To give you a timeline, because of the easter holidays there probably won't be any updates for the next week or so. Apologies for the delay!

@kawaljeet
Copy link
Author

Hi @aherlihy .. please let me know, if you have any update on same.
Thanks

@aherlihy aherlihy self-assigned this Apr 18, 2016
@aherlihy aherlihy added the bug label May 9, 2016
@pdspicer
Copy link

pdspicer commented Jun 7, 2016

Hi @aherlihy & @kawaljeet
I'm no python expert by any means, but I have experienced the same issue using elastic2 doc manager and after looking through this for a while, it seems as though the likely source of the issue is that the elastic2 doc manager's exception_wrapper is not working for subclasses of the wrapped error types, thus the error is not being handled by the wrapper and is causing the OplogThread to exit.

Additionally, it appears as though the Connector thread does not detect the OplogThread's unclean exit, as the only check it does in it's run loop is to see if the thread's "running" parameter is set to false, which it does not seem it will be if an unhanded exception occurs.

The result is that unhandled exceptions go unnoticed, and the process neither exits nor continues processing the oplog for the failed thread (which either represents all of the oplog or the oplog of a single shard).

The fix seems as though it should be twofold:

  1. Add a self.shard_set[shard_id].is_alive() call alongside the self.shard_set[shard_id].running check in connector.py
  2. Open tickets in elastic-doc-manager & elastic2-doc-manager for handling otherwise unhandled exceptions from elasticsearch-py, as a simple TimeoutError should not be sufficient to cause the thread to exit with an unhandled exception.

pdspicer pushed a commit to pdspicer/mongo-connector that referenced this issue Jun 8, 2016
…handle cases such as ReadTimeoutError
llvtt pushed a commit that referenced this issue Jun 9, 2016
#383 & #402, #451 fixes for error handling and graceful process exit
@llvtt
Copy link

llvtt commented Jun 9, 2016

@kawaljeet , would you mind trying against the latest version from master? I just accepted a patch from @pdspicer that allows the connector to catch additional exception types like ReadTimeoutError.

@kawaljeet
Copy link
Author

kawaljeet commented Aug 10, 2016

@llvtt .. Apologies for the late response. I have pulled up the new version (couple of hours back) and got the new mongo-connector. So, far things looks fine. But I need to monitor for some time (probably another couple of days). and will again share the feedback. Thanks a lot again.'
Thanks a lot @pdspicer for your help too.

@aherlihy aherlihy removed their assignment Aug 11, 2016
@kawaljeet
Copy link
Author

Hi @llvtt / @pdspicer , mongo-connector is responding better in ReadTimeoutError case now, Thanks for the fix. Though sometimes it hangs, i am printing the log file entries, when it again stopped responding. ( I mean, even though there is update in the collection, there is no update in the elastic).

mongo-connector.log

2016-08-23 11:51:35,507 [WARNING] elasticsearch:82 - DELETE http://XX.XXX.X.XX:9200/myindex/mytype/57b6202ce4b0370214fc90ba?refresh=false [status:N/A request:10.011s]
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_urllib3.py", line 94, in perform_request
response = self.pool.urlopen(method, url, body, retries=False, headers=self.headers, *_kw)
File "/usr/local/lib/python2.7/dist-packages/urllib3-1.14-py2.7.egg/urllib3/connectionpool.py", line 608, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/local/lib/python2.7/dist-packages/urllib3-1.14-py2.7.egg/urllib3/util/retry.py", line 224, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python2.7/dist-packages/urllib3-1.14-py2.7.egg/urllib3/connectionpool.py", line 558, in urlopen
body=body, headers=headers)
File "/usr/local/lib/python2.7/dist-packages/urllib3-1.14-py2.7.egg/urllib3/connectionpool.py", line 380, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/usr/local/lib/python2.7/dist-packages/urllib3-1.14-py2.7.egg/urllib3/connectionpool.py", line 308, in _raise_timeout
raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
ReadTimeoutError: HTTPConnectionPool(host=u'10.9.220.233', port=9200): Read timed out. (read timeout=10)
2016-08-23 11:51:35,508 [ERROR] mongo_connector.util:87 - Fatal Exception
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/mongo_connector/util.py", line 85, in wrapped
func(_args, *_kwargs)
File "/usr/local/lib/python2.7/dist-packages/mongo_connector/oplog_manager.py", line 244, in run
entry['o']['_id'], ns, timestamp)
File "/usr/local/lib/python2.7/dist-packages/mongo_connector/util.py", line 32, in wrapped
return f(_args, *_kwargs)
File "/usr/local/lib/python2.7/dist-packages/mongo_connector/doc_managers/elastic2_doc_manager.py", line 254, in remove
refresh=(self.auto_commit_interval == 0))
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 69, in _wrapped
return func(_args, params=params, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/init.py", line 722, in delete
doc_type, id), params=params)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 329, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_urllib3.py", line 102, in perform_request
raise ConnectionTimeout('TIMEOUT', str(e), e)
ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host=u'10.9.220.233', port=9200): Read timed out. (read timeout=10))

mongo-connector.err

Exception in thread Thread-718:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 755, in run
self.function(_self.args, *_self.kwargs)
File "/usr/local/lib/python2.7/dist-packages/mongo_connector/doc_managers/elastic2_doc_manager.py", line 293, in run_auto_commit
self.elastic.indices.refresh()
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 69, in _wrapped
return func(_args, params=params, *_kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/indices.py", line 54, in refresh
'_refresh'), params=params)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 329, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_urllib3.py", line 102, in perform_request
raise ConnectionTimeout('TIMEOUT', str(e), e)
ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host=u'10.9.220.233', port=9200): Read timed out. (read timeout=10))

Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 551, in **bootstrap_inner
self.run()
File "/usr/local/lib/python2.7/dist-packages/mongo_connector/util.py", line 85, in wrapped
func(_args, *_kwargs)
File "/usr/local/lib/python2.7/dist-packages/mongo_connector/oplog_manager.py", line 244, in run
entry['o']['_id'], ns, timestamp)
File "/usr/local/lib/python2.7/dist-packages/mongo_connector/util.py", line 32, in wrapped
return f(_args, _kwargs)
File "/usr/local/lib/python2.7/dist-packages/mongo_connector/doc_managers/elastic2_doc_manager.py", line 254, in remove
refresh=(self.auto_commit_interval == 0))
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 69, in _wrapped
return func(_args, params=params, _kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/__init
.py", line 722, in delete
doc_type, id), params=params)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 329, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_urllib3.py", line 102, in perform_request
raise ConnectionTimeout('TIMEOUT', str(e), e)
ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host=u'10.9.220.233', port=9200): Read timed out. (read timeout=10))

@ShaneHarvey
Copy link
Contributor

The elastic2-doc-manager issue that tracks retrying operations on connection failure: yougov/elastic2-doc-manager#18

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants