Fixing connection with multiple hosts and unavailable replicas #579

Open
wants to merge 3 commits into
base: master


@r-dmv r-dmv commented Jul 11, 2019

What do these changes do?

Since PostgreSQL 10, it is possible to specify multiple hosts in a connection string. This is useful when you have an HA PostgreSQL cluster with several replicas and you don't want to put a balancer in front of them.

This already works in aiopg if you're using libpq >= 10.

But if a dead or unavailable replica precedes your target host in the connection string, an exception is raised after the timeout: psycopg2.OperationalError: asynchronous connection attempt underway.
The same happens when a host has multiple IP addresses in DNS and the first one is not responding.

You can easily reproduce this behavior by setting up a single PostgreSQL server on 127.0.0.1:5432 and connecting with this connection string:
dbname=<db> user=<user> password=<password> host=8.8.8.8,127.0.0.1 port=5433,5432 target_session_attrs=read-write connect_timeout=1
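
As a rough illustration of how such a connection string maps to connection attempts, the sketch below pairs each host with its port in order. The helper name `host_port_pairs` is hypothetical (it is not part of aiopg or psycopg2), the `dbname=test` value is a placeholder, and the parsing is deliberately naive: it assumes space-separated key=value items with no quoting. It follows the libpq convention that a single port applies to every host.

```python
def host_port_pairs(dsn, default_port=5432):
    """Return [(host, port), ...] in the order the hosts are listed."""
    # Naive parse: assumes no quoted values and no spaces around '='.
    params = dict(item.split("=", 1) for item in dsn.split() if "=" in item)
    # Falling back to "localhost" is an assumption made for this sketch only.
    hosts = params.get("host", "localhost").split(",")
    ports = params.get("port", str(default_port)).split(",")
    if len(ports) == 1:
        # A single port is shared by all hosts.
        ports = ports * len(hosts)
    if len(ports) != len(hosts):
        raise ValueError("number of ports must be 1 or match the number of hosts")
    return [(h, int(p) if p else default_port) for h, p in zip(hosts, ports)]


pairs = host_port_pairs(
    "dbname=test host=8.8.8.8,127.0.0.1 port=5433,5432 connect_timeout=1"
)
print(pairs)  # [('8.8.8.8', 5433), ('127.0.0.1', 5432)]
```

With the connection string above, the unreachable 8.8.8.8:5433 is tried first, which is exactly the case that triggers the error.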

With a synchronous psycopg2 connection this works properly, but this code:

import asyncio
import aiopg

dsn = 'dbname=<db> user=<user> password=<password> host=8.8.8.8,127.0.0.1 port=5433,5432 target_session_attrs=read-write connect_timeout=1'


async def go():
    async with aiopg.create_pool(dsn, timeout=3) as pool:
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                await cur.execute("SELECT 1")
                ret = []
                async for row in cur:
                    ret.append(row)
                assert ret == [(1,)]

loop = asyncio.get_event_loop()
loop.run_until_complete(go())

raises psycopg2.OperationalError: asynchronous connection attempt underway.

This happens because libpq creates a new connection and closes the failed one, but aiopg keeps waiting on the failed one in the _poll method and so fails after the timeout.

Are there changes in behavior for the user?

Now aiopg uses the connect_timeout parameter from the DSN.
It limits the time spent connecting to a single host, so that aiopg does not get stuck on a host that is not responding at all, not even to the first SYN packet.

So now, if you have 3 hosts in your connection string, you should set connect_timeout = timeout / 3 to guarantee that aiopg tries all 3 hosts within the timeout.

# we will wait up to 1 second for a single host, and the whole connect operation will take 3 seconds in the worst case
pool = aiopg.create_pool('host=localhost ... connect_timeout=1', timeout=3)
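
The rule of thumb above (connect_timeout = timeout / number of hosts) can be written as a small helper. This is only a sketch: `per_host_connect_timeout` is a hypothetical name, not an aiopg function, and the third host in the example is made up. It rounds down to whole seconds because libpq documents connect_timeout as an integer number of seconds.

```python
def per_host_connect_timeout(total_timeout, host_count):
    """Split an overall timeout evenly across hosts, in whole seconds."""
    if host_count < 1:
        raise ValueError("host_count must be at least 1")
    # Round down to whole seconds, but never go below 1.
    return max(1, int(total_timeout // host_count))


# Hypothetical host list; 10.0.0.1 is only there to make three entries.
hosts = "8.8.8.8,127.0.0.1,10.0.0.1".split(",")
print(per_host_connect_timeout(3, len(hosts)))  # 1
```

Note that the lower bound of 1 second means the sum of per-host timeouts can exceed the overall timeout when there are more hosts than seconds; in that case the overall timeout has to be raised.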

You can also use connect_timeout in kwargs:

pool = aiopg.create_pool('host=localhost ...', timeout=3, connect_timeout=1)
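
To make the interplay between the two timeouts concrete, here is a minimal sketch that uses plain asyncio streams rather than aiopg or libpq. The function name `connect_first_available` and its parameters are assumptions for illustration; it is not how aiopg implements the change, only the behavior described above: each host gets at most connect_timeout seconds, and the whole attempt is capped by timeout.

```python
import asyncio


async def connect_first_available(hosts, connect_timeout, timeout):
    """Return (reader, writer) for the first (host, port) that accepts.

    connect_timeout bounds each single attempt; timeout bounds the whole loop.
    """

    async def try_hosts():
        last_error = None
        for host, port in hosts:
            try:
                # Give up on this host after connect_timeout seconds.
                return await asyncio.wait_for(
                    asyncio.open_connection(host, port), connect_timeout
                )
            except (OSError, asyncio.TimeoutError) as exc:
                last_error = exc  # remember the failure, try the next host
        raise ConnectionError(f"no host accepted the connection: {last_error!r}")

    # The outer limit plays the role of the pool-level timeout.
    return await asyncio.wait_for(try_hosts(), timeout)
```

Called with hosts=[("8.8.8.8", 5433), ("127.0.0.1", 5432)], connect_timeout=1 and timeout=3, the first host is abandoned after about one second and the second one is tried with time to spare.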

Related issue number

#275

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes
  • [] Add a new news fragment into the CHANGES folder
    • name it <issue_id>.<type> (e.g. 588.bugfix)
    • if you don't have an issue_id change it to the pr id after creating the PR
    • ensure type is one of the following:
      • .feature: Signifying a new feature.
      • .bugfix: Signifying a bug fix.
      • .doc: Signifying a documentation improvement.
      • .removal: Signifying a deprecation or removal of public API.
      • .misc: A ticket has been closed, but it is not of interest to users.
    • Make sure to use full sentences with correct case and punctuation, for example: Fix issue with non-ascii contents in doctest text files.


codecov bot commented Jul 11, 2019

Codecov Report

Merging #579 into master will increase coverage by 0.05%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master     #579      +/-   ##
==========================================
+ Coverage   94.35%   94.41%   +0.05%     
==========================================
  Files          27       27              
  Lines        3740     3780      +40     
  Branches      171      171              
==========================================
+ Hits         3529     3569      +40     
  Misses        179      179              
  Partials       32       32
Impacted Files | Coverage Δ
tests/test_connection.py | 98.75% <100%> (+0.11%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 82f3ced...9b8e5ae. Read the comment docs.

# If there is an error, we'll get it from poll.
self = weak_self()

if self._conn.status == CONN_STATUS_CONNECTING:

This occasionally fails due to self being None. A minimal fix would be to skip the rest when that is the case, but generally there shouldn't be active timers for a deleted object.
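
The minimal fix suggested here could look roughly like the sketch below. `FakeConnection` and `make_timeout_callback` are hypothetical stand-ins, not aiopg code; the point is only the `None` check after dereferencing the weak reference.

```python
import weakref


class FakeConnection:
    """Hypothetical stand-in for the object that owns the timer."""

    def __init__(self):
        self.timed_out = False

    def _on_timeout(self):
        self.timed_out = True


def make_timeout_callback(conn):
    """Build a timer callback that holds only a weak reference to conn."""
    weak_self = weakref.ref(conn)

    def callback():
        self = weak_self()
        if self is None:
            # The owner was already garbage collected: skip the rest.
            return False
        self._on_timeout()
        return True

    return callback
```

The guard only hides the symptom. As the comment says, the cleaner approach is to cancel the timer when the object is closed or deleted, so the callback never fires for a dead object.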

Comment on lines +36 to +38
# In socket.socket we should know type and family to shutdown socket by fd
# This function is used for shutdown libpq connection
# where family and type is unknown

What's the problem with just using the socket object?
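
One possible reading of this question, sketched under assumptions: since Python 3.7, `socket.socket(fileno=...)` auto-detects family, type and protocol from the descriptor, so a socket object can be built from a bare fd. The helper name `shutdown_by_fd` is hypothetical, and the sketch assumes a POSIX platform where `os.dup` works on socket descriptors. It duplicates the fd so that closing the temporary Python socket does not close the descriptor the original owner (libpq, in this PR) still holds.

```python
import os
import socket


def shutdown_by_fd(fd):
    """Shut down the connection behind fd without knowing its family or type."""
    # Duplicate first: closing the wrapper must not close the owner's fd.
    dup_fd = os.dup(fd)
    # Family, type and proto are detected from the descriptor (Python 3.7+).
    with socket.socket(fileno=dup_fd) as sock:
        sock.shutdown(socket.SHUT_RDWR)
```

Shutdown applies to the underlying connection, so it takes effect even though it is issued through the duplicate; the original descriptor stays open until its owner closes it.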


CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Dmitry Rubtsov seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.
