During testing, we observed a sudden increase in memory consumption across almost all of our logprep instances.
It turned out that we had configured a wrong TLS certificate for the Opensearch cluster, so the OpensearchOutputConnector instances could not establish connections. This led to numerous FatalOutputError exceptions (and subsequent pipeline restarts):
2024-02-26 09:29:06,360 Logprep Pipeline 1 ERROR : FatalOutputError in OpensearchOutput (opensearch) - Opensearch Output: ['os-cluster.opensearch-prod']: ConnectionError([SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006)) caused by: SSLError([SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006))
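For reference, the same failure mode can be triggered outside of logprep. The following is only a minimal sketch (not the connector's actual code) that assumes a plain opensearch-py client; the host name and CA path are placeholders:

```python
# Minimal sketch: trigger the same certificate failure with opensearch-py.
# Host and CA path are placeholders, not our actual configuration.
from opensearchpy import OpenSearch
from opensearchpy.exceptions import ConnectionError as OSConnectionError

client = OpenSearch(
    hosts=["https://os-cluster.opensearch-prod:9200"],  # placeholder host
    use_ssl=True,
    verify_certs=True,
    ca_certs="/path/to/wrong-ca.pem",  # a CA that did not sign the cluster certificate
)

try:
    client.info()
except OSConnectionError as error:
    # opensearch-py wraps the "SSL: CERTIFICATE_VERIFY_FAILED" error into a
    # ConnectionError, which logprep then reports as a FatalOutputError and
    # answers with a pipeline restart.
    print(error)
```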
In our core system, we made a similar observation: the memory consumption of our logprep instances started to increase. Although the growth was not as pronounced as in the test system, some of the pods ran out of memory. We observed this twice, at two different points in time.
A short review revealed that we had network/DNS issues on both occasions. Our pipelines could not reach our Opensearch cluster, which led to a lot of FatalOutputError exceptions and pipeline restarts:
2024-02-25 20:05:44,131 opensearch WARNING : GET https://prod-os-cluster.opensearch:9200/ [status:N/A request:10.008s]
Traceback (most recent call last):
File "/root/.pex/installed_wheels/f0b2b048d0941174a2abe3ab7a6f2b48844192abdba3aaadbe83e78983387f5d/urllib3-1.26.18-py2.py3-none-any.whl/urllib3/connection.py", line 174, in _new_conn
conn = connection.create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pex/installed_wheels/f0b2b048d0941174a2abe3ab7a6f2b48844192abdba3aaadbe83e78983387f5d/urllib3-1.26.18-py2.py3-none-any.whl/urllib3/util/connection.py", line 72, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/socket.py", line 962, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno -3] Temporary failure in name resolution
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/.pex/installed_wheels/e55d5dac054d07afab930a0d5f3de8475381721e9eca3728fbdda611fa0ed070/opensearch_py-2.4.2-py2.py3-none-any.whl/opensearchpy/connection/http_urllib3.py", line 264, in perform_request
response = self.pool.urlopen(
^^^^^^^^^^^^^^^^^^
File "/root/.pex/installed_wheels/f0b2b048d0941174a2abe3ab7a6f2b48844192abdba3aaadbe83e78983387f5d/urllib3-1.26.18-py2.py3-none-any.whl/urllib3/connectionpool.py", line 799, in urlopen
retries = retries.increment(
^^^^^^^^^^^^^^^^^^
File "/root/.pex/installed_wheels/f0b2b048d0941174a2abe3ab7a6f2b48844192abdba3aaadbe83e78983387f5d/urllib3-1.26.18-py2.py3-none-any.whl/urllib3/util/retry.py", line 525, in increment
raise six.reraise(type(error), error, _stacktrace)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pex/installed_wheels/f0b2b048d0941174a2abe3ab7a6f2b48844192abdba3aaadbe83e78983387f5d/urllib3-1.26.18-py2.py3-none-any.whl/urllib3/packages/six.py", line 770, in reraise
raise value
File "/root/.pex/installed_wheels/f0b2b048d0941174a2abe3ab7a6f2b48844192abdba3aaadbe83e78983387f5d/urllib3-1.26.18-py2.py3-none-any.whl/urllib3/connectionpool.py", line 715, in urlopen
httplib_response = self._make_request(
^^^^^^^^^^^^^^^^^^^
File "/root/.pex/installed_wheels/f0b2b048d0941174a2abe3ab7a6f2b48844192abdba3aaadbe83e78983387f5d/urllib3-1.26.18-py2.py3-none-any.whl/urllib3/connectionpool.py", line 404, in _make_request
self._validate_conn(conn)
File "/root/.pex/installed_wheels/f0b2b048d0941174a2abe3ab7a6f2b48844192abdba3aaadbe83e78983387f5d/urllib3-1.26.18-py2.py3-none-any.whl/urllib3/connectionpool.py", line 1058, in _validate_conn
conn.connect()
File "/root/.pex/installed_wheels/f0b2b048d0941174a2abe3ab7a6f2b48844192abdba3aaadbe83e78983387f5d/urllib3-1.26.18-py2.py3-none-any.whl/urllib3/connection.py", line 363, in connect
self.sock = conn = self._new_conn()
^^^^^^^^^^^^^^^^
File "/root/.pex/installed_wheels/f0b2b048d0941174a2abe3ab7a6f2b48844192abdba3aaadbe83e78983387f5d/urllib3-1.26.18-py2.py3-none-any.whl/urllib3/connection.py", line 186, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f246a56ddd0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution
2024-02-25 20:05:44,134 Logprep Pipeline 14 ERROR : FatalOutputError in OpensearchOutput (opensearch) - Opensearch Output: ['os-cluster.opensearch']: ConnectionError(<urllib3.connection.HTTPSConnection object at 0x7f246a56ddd0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution) caused by: NewConnectionError(<urllib3.connection.HTTPSConnection object at 0x7f246a56ddd0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution)
It seems that whenever one of the above exceptions occurs and a pipeline has to be restarted, something is not completely freed, which leads to the growing memory usage. It is not yet clear what exactly is causing the memory issue.
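To narrow down whether the retained memory lives in the supervising process or in the restarted pipeline processes, external sampling of the process memory might help. This is only a rough monitoring sketch; the process-name filter "logprep" and the sampling interval are assumptions and may need adjusting:

```python
# Rough monitoring sketch: periodically print RSS and open file descriptors of
# all processes whose name contains "logprep" (assumed to match the supervisor
# and its pipeline children inside the pod).
import time

import psutil


def sample_logprep_memory(name_filter: str = "logprep", interval: int = 30) -> None:
    while True:
        for proc in psutil.process_iter(attrs=["pid", "name", "memory_info", "num_fds"]):
            info = proc.info
            if name_filter not in (info["name"] or ""):
                continue
            if info["memory_info"] is None:  # access denied for this process
                continue
            rss_mib = info["memory_info"].rss / 1024 ** 2
            print(
                f"{time.strftime('%H:%M:%S')} pid={info['pid']} "
                f"rss={rss_mib:.1f} MiB fds={info['num_fds']}"
            )
        time.sleep(interval)


if __name__ == "__main__":
    sample_logprep_memory()
```

Comparing the samples before and after a forced pipeline restart should make it visible which process keeps the memory.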
Expected behavior
Occurrence of the above exceptions and/or pipeline restarts should not cause logprep to consume more memory.
Environment
Logprep version: 2b16c19
Python version: 3.11