[Bug] std::system_error thrown during/after client->subscribe(...) when using configuration setUnAckedMessagesTimeoutMs. #358
Comments
You can enable the debug level logs to verify your guess. See
I think it should be caused by
We should catch
Is there an easy way to simulate the case? I tried starting a consumer and then starting the Pulsar standalone locally, but I cannot reproduce it.
Please find example logs attached.
I am not sure I see the link between the stack trace and the line you think could be causing it.
Unfortunately, our code currently lives inside our application, and I do not have a snippet or a standalone script that reproduces the issue. The exception is seen around 20% of the time when I cycle the test.
If the
Fixes apache#358
Fixes apache#359

### Motivation

`async_wait` is not used correctly in some places. A callback that captures the `this` pointer or a reference to `this` is passed to `async_wait`; if the object has already been destroyed when the callback is called, an invalid memory access will happen.

### Modifications

Use the following pattern in all `async_wait` calls.

```c++
std::weak_ptr<T> weakSelf{shared_from_this()};
timer_->async_wait([weakSelf](/* ... */) {
    if (auto self = weakSelf.lock()) {
        self->foo();
    }
});
```
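The `weak_ptr` pattern from the PR description can be tried out without Boost. The following is only a minimal sketch under stated assumptions: `FakeTimer`, `Tracker`, and `countHandledTimeouts` are hypothetical names invented for illustration, and `FakeTimer::async_wait` merely stores the callback instead of waiting on a real timer. It shows a callback that fires after its owner has been destroyed and safely does nothing.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a timer: it stores callbacks and runs them on fire().
class FakeTimer {
   public:
    void async_wait(std::function<void()> cb) { pending_.push_back(std::move(cb)); }

    // Runs every stored callback once and returns how many were invoked.
    std::size_t fire() {
        std::vector<std::function<void()>> callbacks;
        callbacks.swap(pending_);
        for (auto& cb : callbacks) cb();
        return callbacks.size();
    }

   private:
    std::vector<std::function<void()>> pending_;
};

// Hypothetical object that schedules work on the timer (not the real tracker class).
class Tracker : public std::enable_shared_from_this<Tracker> {
   public:
    Tracker(FakeTimer& timer, int& handled) : timer_(timer), handled_(handled) {}

    void start() {
        // Capture a weak_ptr instead of `this`, so the callback cannot
        // touch the object after it has been destroyed.
        std::weak_ptr<Tracker> weakSelf{shared_from_this()};
        timer_.async_wait([weakSelf] {
            if (auto self = weakSelf.lock()) {
                self->onTimeout();
            }
        });
    }

   private:
    void onTimeout() { ++handled_; }

    FakeTimer& timer_;
    int& handled_;
};

// Schedules two callbacks; only the one whose owner is still alive is handled.
int countHandledTimeouts() {
    FakeTimer timer;
    int handled = 0;

    auto alive = std::make_shared<Tracker>(timer, handled);
    alive->start();
    timer.fire();  // owner is alive, so onTimeout() runs

    auto doomed = std::make_shared<Tracker>(timer, handled);
    doomed->start();
    doomed.reset();  // owner destroyed before the timer fires
    timer.fire();    // lock() fails, so the callback is a no-op

    return handled;
}
```

Capturing `this` directly in the second case would instead dereference a destroyed object, which is the kind of invalid memory access the PR description refers to.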
I opened a PR that will close this issue. When you have time, you can test if that patch works.
Thank you for the prompt investigation and PR.
The current behavior should be wrong. I will take a look soon.
In my local env, it failed after 30s when I ran
The timeout is the
@jato-sag Could you upload your client logs?
I tried changing/reducing the operationTimeout, but I don't see any difference: subscribe does not return. Also, for clarity, this is the rough code snippet.
Fixes #358 Fixes #359 (cherry picked from commit 24ab12c)
Search before asking
Version
Pulsar version 3.3.
OS - Red Hat Enterprise Linux 8.9 and RHEL 9.3 and Linux 5.10.0-26-amd64
Minimal reproduce step
The test that observes the issue covers the case where we are unable to connect to the server immediately on startup, but the server becomes reachable after a while. We attempt to subscribe; if the subscribe fails, we try again.
Sometimes when the subscribe eventually succeeds all is well.
Sometimes when the subscribe is successful it still throws the exception.
Sometimes the exception is thrown even before the subscribe returns.
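The retry behaviour described above can be sketched roughly as follows. This is not the reporter's code: `Outcome`, `subscribeWithRetry`, and the `attempt` callable are hypothetical stand-ins so the loop can be run without a broker. In a real application the callable would wrap `client->subscribe(...)` and inspect the `pulsar::Result` it returns.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical outcome type standing in for the client's result code.
enum class Outcome { Ok, ConnectError };

// Calls attempt() until it succeeds or maxAttempts is exhausted.
// Returns the 1-based number of the successful attempt, or -1 if all failed.
int subscribeWithRetry(const std::function<Outcome()>& attempt, int maxAttempts,
                       std::chrono::milliseconds delay) {
    for (int n = 1; n <= maxAttempts; ++n) {
        if (attempt() == Outcome::Ok) {
            return n;
        }
        if (n < maxAttempts) {
            // Give the server time to become reachable before retrying.
            std::this_thread::sleep_for(delay);
        }
    }
    return -1;
}
```

With this shape, each failed attempt may leave timers scheduled by the abandoned consumer, which is where the lifetime of callback owners starts to matter.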
//Rough setup.
What did you expect to see?
No exception.
What did you see instead?
std::system_error thrown and not handled.
Anything else?
std::system_error thrown during/after client->subscribe(...) when using configuration setUnAckedMessagesTimeoutMs.
When we attempt to subscribe with a value already set for setUnAckedMessagesTimeoutMs in the Client Configuration, we observe a std::system_error being thrown. It is not handled or caught in ExecutorService, and it terminates the application. setUnAckedMessagesTimeoutMs is set to 10000; as this is the minimum, we have not experimented with other values.
Without setUnAckedMessagesTimeoutMs set in the configuration, no exception is seen on or after subscribe(..).
We suspect the exception is being thrown by std::recursive_mutex when trying to acquire the lock in UnAckedMessageTrackerEnabled::timeoutHandlerHelper() of pulsar-client-cpp/lib/UnAckedMessageTrackerEnabled.cc
stderr:
terminate called after throwing an instance of 'std::system_error'
what(): Invalid argument
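One plausible reading of the `Invalid argument` message is an `EINVAL` error code wrapped in a `std::system_error`, which is how libstdc++ reports a failed lock call. The sketch below rests on that assumption: `makeInvalidArgumentError` and `runGuarded` are hypothetical helpers, not part of the Pulsar client. It constructs such an exception by hand and shows how a handler wrapper can intercept it before it escapes a worker thread and triggers `std::terminate`.

```cpp
#include <cerrno>
#include <functional>
#include <system_error>

// Builds an exception equivalent to an EINVAL failure reported
// through the generic (errno) error category.
std::system_error makeInvalidArgumentError() {
    return std::system_error(EINVAL, std::generic_category());
}

// Runs a handler and returns the error code of any std::system_error
// it throws, or a default-constructed (success) code otherwise.
std::error_code runGuarded(const std::function<void()>& handler) {
    try {
        handler();
        return {};
    } catch (const std::system_error& e) {
        // Without this catch, an exception escaping a thread's entry
        // function ends the process through std::terminate.
        return e.code();
    }
}
```

A guard like this only keeps the process alive; if the lock fails because the object that owns the mutex was already destroyed, the underlying lifetime bug still needs to be fixed.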
backtrace:
Program terminated with signal SIGABRT, Aborted.
0 0x00007fdef2329acf in raise () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7fddd1253700 (LWP 1875868))]
0 0x00007fdef2329acf in raise () from /lib64/libc.so.6
1 0x00007fdef22fcea5 in abort () from /lib64/libc.so.6
2 0x00007fdef68e69e3 in ::coreHandler(int, siginfo_t*, void*) () from /libapclient.so.10.15
3 <signal handler called>
4 0x00007fdef2329acf in raise () from /lib64/libc.so.6
5 0x00007fdef22fcea5 in abort () from /lib64/libc.so.6
6 0x00007fdef2eea09b in __gnu_cxx::__verbose_terminate_handler() [clone .cold.1] () from /lib64/libstdc++.so.6
7 0x00007fdef2ef054c in __cxxabiv1::__terminate(void (*)()) () from /lib64/libstdc++.so.6
8 0x00007fdef2ef05a7 in std::terminate() () from /lib64/libstdc++.so.6
9 0x00007fdef2ef0808 in __cxa_throw () from /lib64/libstdc++.so.6
10 0x00007fdef2eec235 in std::__throw_system_error(int) [clone .cold.28] () from /lib64/libstdc++.so.6
11 0x00007fddd380f508 in pulsar::UnAckedMessageTrackerEnabled::timeoutHandlerHelper() () from /libconnectivity-pulsar-client.so
12 0x00007fddd380f5a9 in pulsar::UnAckedMessageTrackerEnabled::timeoutHandler() () from /libconnectivity-pulsar-client.so
13 0x00007fddd38111e2 in boost::asio::detail::wait_handler<pulsar::UnAckedMessageTrackerEnabled::timeoutHandler()::{lambda(boost::system::error_code const&) # 1}, boost::asio::any_io_executor>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) () from */libconnectivity-pulsar-client.so
14 0x00007fddd374ac38 in boost::asio::detail::scheduler::run(boost::system::error_code&) () from */libconnectivity-pulsar-client.so
15 0x00007fddd3743f92 in pulsar::ExecutorService::start()::{lambda() # 1}::operator()() const [clone .isra.334] () from */libconnectivity-pulsar-client.so
16 0x00007fdef2f1cb23 in execute_native_thread_routine () from /lib64/libstdc++.so.6
17 0x00007fdef26a81ca in start_thread () from /lib64/libpthread.so.0
18 0x00007fdef2314e73 in clone () from /lib64/libc.so.6
Are you willing to submit a PR?