-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for #24805: Do not unenlist XA transactions if related to the current Tx #25131
base: master
Are you sure you want to change the base?
Fix for #24805: Do not unenlist XA transactions if related to the current Tx #25131
Conversation
@OndroMih Thank you, I will test it in the coming few days. |
@OndroMih First tests seem to result in the same exception as before: NPE in ConnectorXAResource.
I will do some more tests in the coming two days. |
@OndroMih I changed your fix to add some info logging:
I can find:
but there are no cases logged with Level of order: about 658 calls to isLocalResourceInTransaction before my test fails. |
Resource is in Tx if it's the non-XA resource attached to the Tx OR if it's attached to the poolInto that is attached to the Tx. Maybe it's enough to keep only the second condition, it seems that non-XA resources are also in the poolInfo but not sure.
I looked at my code again and I got the logical condition wrong. It should be OR and not AND. The logic is: local resource is in transaction if it's a non-XA resource associated with the Tx, I think it would be enough to keep only the new condition @escay, can you try again with my latest change? |
@OndroMih I will try it, but I wonder if it would help if all calls I see have
but let me try. Note I only see 3 occurrences logged with nonXAResourceInTransaction: false:
and this is logged before our application really started / before "Loading application .. done in .. ms" message.
|
@OndroMih Your change certainly seems to help 😍 I ran my test set 4 times in a row without redeployment: all ok.
Code I used locally:
|
@OndroMih I ran a bigger test set. About 380.000 calls so far to isLocalResourceInTransaction and everything still ok. |
Thank you for your quick feedback, @escay. I'm glad this fixes the issue. Yes, I'll turn this into a proper pull request now. |
The build failed twice in a row in an Ant step "execute-sql-common" on error connecting to Derby DB from Ant script. Anybody knows why it could be? |
Locally the test suite failed too, when I executed
I ran the test suite without this change and the tests passed locally. It's either a side-effect of this fix, or a bug in Derby triggered by this fix. |
I can reproduce the error locally, if I run First failure I see is in: test I plan to try and understand the test case. I have not seen NullPointerExceptions in the server.log |
Single test reproduction also succeeds.
and run ./runtests.sh jdbc_all Test hangs on getting a connection from the pool. |
The test fails in
Without the fix: all 101 real database connection resources are the same, so eventually all 101 calls use the same database connection. With the fix: all real database connection resources are also the same, but after call 31 (32 calls in total?) exception occurs:
Additional info: the datasource: "jdbc-connsharing-pool" does not seem to be initialized with a certain max pool size.
Max pool size default is configured in the code to be 32:
This is in line with the failing step. |
Regarding test6 failure, the logic in Looking at the code comments, resource optimization was only ever implemented for the first resource in the transaction:
and code confirms this behavior, on enlistment of a resource in the transaction only the 1st item is set as the nonXAResource, while the transaction can consist of more resources in the tran.getResources() collection:
So based on this code I think the test6 could be wrong: 100 calls are not supposed to work as long as connection 1 is still occupied. It would only work if connection1 could be reused. If I change the test to
the test succeeds, bean.test3() now opens the first resource an can now reuse it due to the JavaEETransactionManagerSimplified logic and only 1 ResourceHandle is used. ==== When I try to lift the limitation resource optimization to apply to all resources in the transaction I looked at supporting getResourceFromTransaction for the second resource in the transaction. the logic in 'ConnectionPool.getResourceFromTransaction`
==== I wrote earlier: I also tried in 2018 to figure out how getConnection should work according to the Jakarta specification, because I saw different behavior in different application servers. I would love to see a pointer to the specification definitions explaining what to expect in the most basic REQUIRED transaction situation. |
Hopefully fixes #24805.
I think the problem with #24805 is:
I suspect that sometimes XA resources are closed, which unenlists them, and they are reused by another transaction before the previous transaction completes and delists the resources. Therefore a new transaction uses a resources, which is already marked as enlisted and doesn't attempt to enlist it again. This also means that the new transaction doesn't call the enlistResource method, and doesn't set the non-XA resource and keeps it null. This leads to a NullPtr later.
The NullPtr happens very rarely, I think only if a new transaction asks for the resource between the time when it's closed and previous Tx is completed, and only if the new transaction contains this single resource. Then no other resource calls the enlistResource method and the non-XA resource remains null, which leads to an exception when delisting the single resource in the new transaction.
This PR should keep all XA resources with the transaction if they are still associated with the transaction, and then delist all the resources at the end of the transaction. Otherwise resources are prematurely removed from Tx before Tx commits and delists them.
@escay, can you please try this?