Fix #167: Notify waiters when dropping a bad connection from the pool #194

Merged: 1 commit into djc:main, Apr 5, 2024

Conversation

@xortive (Contributor) commented Mar 25, 2024

No description provided.

codecov bot commented Mar 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 75.38%. Comparing base (8de2610) to head (5805866).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #194      +/-   ##
==========================================
+ Coverage   74.42%   75.38%   +0.96%     
==========================================
  Files           6        6              
  Lines         524      516       -8     
==========================================
- Hits          390      389       -1     
+ Misses        134      127       -7     


@djc (Owner) left a comment

This looks reasonable, thanks!

Can you clean up the warnings from your test code as pointed out by CI?

@xortive force-pushed the malonso/issue-167 branch from 80d0a73 to 5805866 on March 27, 2024 at 16:19
@xortive (Contributor, Author) commented Apr 1, 2024

> This looks reasonable, thanks!
>
> Can you clean up the warnings from your test code as pointed out by CI?

Sure thing. Anything else before this is ready to merge, from your perspective?

@djc merged commit cd5c510 into djc:main on Apr 5, 2024 (10 checks passed)
@djc (Owner) commented Apr 5, 2024

I'm still digging into your code and wondering if put_back() is the best place to notify the waiters? You mentioned that we should also do it in Reaper::run() -- so maybe we should move it into spawn_replenishing_approvals()? I guess that would wake up waiters in the PoolInner::get() loop, too, which might be weird.

In the end I wonder why notify_one() in PoolInternals::put() isn't enough -- that's called by both put_back() (for the success case) and add_connection(), which is ultimately what gets called during the replenishing process.

@xortive (Contributor, Author) commented Apr 5, 2024

> In the end I wonder why notify_one() in PoolInternals::put() isn't enough -- that's called by both put_back() (for the success case) and add_connection(), which is ultimately what gets called during the replenishing process.

The problem this fixed, for the particular test case provided, was all about the failure case in put_back(). When a bad connection is returned to the pool while someone is waiting for a good connection, the waiter still needs to be notified of the newly available spot in the pool so it can go ahead and create a new connection to replace the broken one.

I could imagine a similar scenario with the failure case in the reaper, where a bad connection is reaped instead of being returned manually, and a waiter consequently isn't notified of an available spot.
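
To make the failure mode concrete, the sketch below is a minimal, self-contained toy, not bb8's actual source: ToyPool, State, and the notify_on_drop flag are invented for illustration, and the one-second timeout stands in for bb8's connection_timeout. It shows that when a broken connection is dropped without any notification, a task parked in get() only wakes up when its timeout fires.

```rust
// Toy model only; assumes tokio = { version = "1", features = ["full"] }.
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::time::Duration;

use tokio::sync::Notify;

#[derive(Debug)]
struct Conn {
    broken: bool,
}

struct State {
    idle: VecDeque<Conn>,
    total: usize, // live connections: idle + checked out
}

struct ToyPool {
    state: Mutex<State>,
    notify: Notify,
    max_size: usize,
}

impl ToyPool {
    async fn get(&self) -> Option<Conn> {
        loop {
            {
                let mut state = self.state.lock().unwrap();
                if let Some(conn) = state.idle.pop_front() {
                    return Some(conn);
                }
                if state.total < self.max_size {
                    // A slot is free: open a replacement connection ourselves.
                    state.total += 1;
                    return Some(Conn { broken: false });
                }
            }
            // Pool is full with nothing idle: park until someone notifies us.
            if tokio::time::timeout(Duration::from_secs(1), self.notify.notified())
                .await
                .is_err()
            {
                return None; // gave up, like bb8's connection_timeout
            }
        }
    }

    fn put_back(&self, conn: Conn, notify_on_drop: bool) {
        let mut state = self.state.lock().unwrap();
        if conn.broken {
            // Dropping the bad connection frees a slot, but unless we notify
            // here the task parked in get() never learns about it.
            state.total -= 1;
            if notify_on_drop {
                self.notify.notify_waiters();
            }
        } else {
            state.idle.push_back(conn);
            self.notify.notify_one();
        }
    }
}

#[tokio::main]
async fn main() {
    let pool = Arc::new(ToyPool {
        state: Mutex::new(State { idle: VecDeque::new(), total: 1 }),
        notify: Notify::new(),
        max_size: 1,
    });

    // The single slot counts as checked out, so this get() has to park.
    let waiter = {
        let pool = Arc::clone(&pool);
        tokio::spawn(async move { pool.get().await })
    };
    tokio::time::sleep(Duration::from_millis(50)).await;

    // Simulate the checked-out connection coming back broken. With
    // notify_on_drop set to false, the waiter only wakes at its timeout.
    pool.put_back(Conn { broken: true }, true);

    println!("waiter got: {:?}", waiter.await.unwrap());
}
```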

@djc (Owner) commented Apr 6, 2024

> The problem this fixed, for the particular test case provided, was all about the failure case in put_back(). When a bad connection is returned to the pool while someone is waiting for a good connection, the waiter still needs to be notified of the newly available spot in the pool so it can go ahead and create a new connection to replace the broken one.

That doesn't quite make sense to me because in the put_back() failure case we already:

  • Called spawn_replenishing_approvals(), which
  • Calls replenish_idle_connections(), which
  • Calls add_connections(), which
  • Calls add_connection(), which
  • Calls PoolInternals::put(), which
  • Calls pool.notify.notify_one()

So why isn't that sufficient?

@vaikzs commented Jun 3, 2024

@djc I noticed that the final call to spawn_replenishing_approvals(approvals) does nothing because approvals is empty, and it returns after several attempts to put_back, leaving 5 connection requests waiting. Because of this, the call stack never reaches pool.notify.notify_one().

However, instead of adding pool.notify.notify_waiters() in put_back(), which makes it redundant, replacing pool.notify.notify_one() in PoolInternals::put with pool.notify.notify_waiters() (which stores no permit to be used by the next call to notified().await) also solves this. I tested it with the same test case provided by @xortive:

#[tokio::test]
async fn test_broken_connections_dont_starve_pool() {
    use std::sync::RwLock;
    use std::{convert::Infallible, time::Duration};

    #[derive(Default)]
    struct ConnectionManager {
        counter: RwLock<u16>,
    }
    #[derive(Debug)]
    struct Connection;

    #[async_trait::async_trait]
    impl bb8::ManageConnection for ConnectionManager {
        type Connection = Connection;
        type Error = Infallible;

        async fn connect(&self) -> Result<Self::Connection, Self::Error> {
            Ok(Connection)
        }

        async fn is_valid(&self, _: &mut Self::Connection) -> Result<(), Self::Error> {
            Ok(())
        }

        // Report the first five connections as broken so they are dropped
        // when returned to the pool instead of going back to the idle queue.
        fn has_broken(&self, _: &mut Self::Connection) -> bool {
            let mut counter = self.counter.write().unwrap();
            let res = *counter < 5;
            *counter += 1;
            res
        }
    }

    let pool = bb8::Pool::builder()
        .max_size(5)
        .connection_timeout(Duration::from_secs(10))
        .build(ConnectionManager::default())
        .await
        .unwrap();

    let mut futures = Vec::new();

    // Ten tasks contend for a pool capped at five connections; each task
    // checks a connection out and immediately drops it.
    for _ in 0..10 {
        let pool = pool.clone();
        futures.push(tokio::spawn(async move {
            let conn = pool.get().await.unwrap();
            drop(conn);
        }));
    }

    for future in futures {
        future.await.unwrap();
    }
}
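
For reference on the distinction @vaikzs draws: tokio's Notify::notify_one() stores a permit when no task is currently waiting, while notify_waiters() wakes only tasks that are already waiting and stores no permit. The standalone demo below (assuming tokio 1.x with the sync, time, and macros features enabled) illustrates that difference:

```rust
use std::time::Duration;

use tokio::sync::Notify;

#[tokio::main]
async fn main() {
    let notify = Notify::new();

    // notify_one() with no waiter present stores a permit, so the next
    // notified().await completes immediately.
    notify.notify_one();
    notify.notified().await;
    println!("consumed the permit stored by notify_one()");

    // notify_waiters() only wakes tasks that are already waiting and stores
    // no permit, so a later notified().await keeps blocking; demonstrate by
    // timing out after 50 ms.
    notify.notify_waiters();
    let timed_out = tokio::time::timeout(Duration::from_millis(50), notify.notified())
        .await
        .is_err();
    println!("notified().await after notify_waiters() timed out: {timed_out}");
}
```

In the pool context, that suggests notify_waiters() reaches every task currently parked in the get() loop, but it leaves no stored permit behind for a getter that only starts waiting slightly later.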

@djc (Owner) commented Jun 3, 2024

@vaikzs thanks for digging in! Would you be able to submit a PR, ideally including a test that tickles this behavior?
