Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure connection._ready() does not stall on bad file descriptor #885

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

seamusabshere
Copy link

@seamusabshere seamusabshere commented Nov 2, 2021

With @emk

We've observed "[Errno 2] No such file or directory" coming from add_writer(). When this happens, nobody notifies _waiter that there has been an exception and this hangs the program.

We haven't managed to create a local repo - whenever we (for example) kill -9 a postgres process while an aiopg client is writing to it, we get the intended psycopg2.OperationalError("Connection closed").

Here's the backtrace of the error seen in the wild:

Exception in callback Connection._ready(<weakref at 0...x7efddf120890>) at /root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/aiopg/connection.py:779
handle: <Handle Connection._ready(<weakref at 0...x7efddf120890>) at /root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/aiopg/connection.py:779>
Traceback (most recent call last):
[... app code ...]
File "/usr/lib/python3.7/asyncio/base_events.py", line 566, in run_until_complete
self.run_forever()
File "/usr/lib/python3.7/asyncio/base_events.py", line 534, in run_forever
self._run_once()
File "/usr/lib/python3.7/asyncio/base_events.py", line 1771, in _run_once
handle._run()
File "/usr/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/aiopg/connection.py", line 838, in _ready
self._fileno, self._ready, weak_self # type: ignore
File "/usr/lib/python3.7/asyncio/selector_events.py", line 334, in add_writer
return self._add_writer(fd, callback, *args)
File "/usr/lib/python3.7/asyncio/selector_events.py", line 294, in _add_writer
(reader, handle))
File "/usr/lib/python3.7/selectors.py", line 389, in modify
self._selector.modify(key.fd, selector_events)
FileNotFoundError: [Errno 2] No such file or directory

What do these changes do?

Are there changes in behavior for the user?

Related issue number

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes
  • Add a new news fragment into the CHANGES folder
    • name it <issue_id>.<type> (e.g. 588.bugfix)
    • if you don't have an issue_id change it to the pr id after creating the PR
    • ensure type is one of the following:
      • .feature: Signifying a new feature.
      • .bugfix: Signifying a bug fix.
      • .doc: Signifying a documentation improvement.
      • .removal: Signifying a deprecation or removal of public API.
      • .misc: A ticket has been closed, but it is not of interest to users.
    • Make sure to use full sentences with correct case and punctuation, for example: Fix issue with non-ascii contents in doctest text files.

seamusabshere and others added 3 commits October 17, 2021 09:33
With @emk

We've observed "[Errno 2] No such file or directory" coming from add_writer(). When this happens, nobody notifies _waiter that there has been an exception and this hangs the program.

We haven't managed to create a local repo - whenever we (for example) kill -9 a postgres process while an aiopg client is writing to it, we get the intended psycopg2.OperationalError("Connection closed").

Here's the backtrace of the error seen in the wild:

Exception in callback Connection._ready(<weakref at 0...x7efddf120890>) at /root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/aiopg/connection.py:779
handle: <Handle Connection._ready(<weakref at 0...x7efddf120890>) at /root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/aiopg/connection.py:779>
Traceback (most recent call last):
  [... app code ...]
  File "/usr/lib/python3.7/asyncio/base_events.py", line 566, in run_until_complete
    self.run_forever()
  File "/usr/lib/python3.7/asyncio/base_events.py", line 534, in run_forever
    self._run_once()
  File "/usr/lib/python3.7/asyncio/base_events.py", line 1771, in _run_once
    handle._run()
  File "/usr/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/aiopg/connection.py", line 838, in _ready
    self._fileno, self._ready, weak_self  # type: ignore
  File "/usr/lib/python3.7/asyncio/selector_events.py", line 334, in add_writer
    return self._add_writer(fd, callback, *args)
  File "/usr/lib/python3.7/asyncio/selector_events.py", line 294, in _add_writer
    (reader, handle))
  File "/usr/lib/python3.7/selectors.py", line 389, in modify
    self._selector.modify(key.fd, selector_events)
FileNotFoundError: [Errno 2] No such file or directory
this was causing sub-dependency version mismatch errors when updating to python 3.10
update async_timeout version to match upstream
aiopg/connection.py Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Nov 10, 2021

Codecov Report

Merging #885 (c77349e) into master (3b1902c) will decrease coverage by 0.11%.
The diff coverage is 60.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #885      +/-   ##
==========================================
- Coverage   93.32%   93.21%   -0.12%     
==========================================
  Files          12       12              
  Lines        1574     1577       +3     
  Branches      187      187              
==========================================
+ Hits         1469     1470       +1     
- Misses         73       75       +2     
  Partials       32       32              
Impacted Files Coverage Δ
aiopg/connection.py 95.34% <60.00%> (-0.33%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3b1902c...c77349e. Read the comment docs.

Co-authored-by: Andrew Svetlov <[email protected]>
@seamusabshere
Copy link
Author

@asvetlov updated, thanks for your suggestion!

@seamusabshere
Copy link
Author

@asvetlov as i said in the description, i've found this to be impossible to test - can we get it merged without?

note that we have been running this in production since the original PR

@emk
Copy link

emk commented Jan 13, 2022

Thank you to everyone for following up on this!

I can confirm that this patch appears to have entirely fixed a bug where aiopg applications would hang indefinitely. We encountered the original bug at least daily (across various machines), and we haven't seen it once that I know of since deploying this fix.

I'm not quite sure how to test it without mocking core Python network code to artificially fail.

Here are the outstanding TODO items:

  • Unit tests for the changes exist: Very difficult to do without low-level failure mocking.
  • Documentation reflects the changes: Do we actually need any documentation for this?
  • Add a new news fragment into the CHANGES folder: I could take a look at writing this up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants