-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error: Esys called in bad sequence #118
Comments
To add more information, we're using the following versions:
|
We have found the issue. Long story short: the threads are using the same context at the same time. The provider should protect the Esys operations or treat them as atomic operations. Let's take the random operations as an example. In the TSS function /* Check context, sequence correctness and set state to error for now */
if (esysContext == NULL) {
LOG_ERROR("esyscontext is NULL.");
return TSS2_ESYS_RC_BAD_REFERENCE;
}
r = iesys_check_sequence_async(esysContext);
if (r != TSS2_RC_SUCCESS)
return r;
esysContext->state = ESYS_STATE_INTERNALERROR; If the second thread starts its execution after the first one set this variable to ESYS_STATE_INTERNALERROR,
|
@Danigaralfo I implemented a mutex-based solution. Would you be able to test the https://github.com/tpm2-software/tpm2-openssl/tree/threads branch, please? |
Hello @gotthardp |
I've checked the commits and I saw that you used CRYPTO_RWLOCK mutex from OpenSSL. This would resolve the problem with threads, but with more processes created from the parent (APACHE behaves in this way with the As I said, tomorrow I will be able to test it. I'll test it using |
I've just tested the APACHE server with the patch that you provided. When talking about threads, the patch works. When talking about processes, It is still failing with the following errors: [Thu Dec 19 11:11:43.117649 2024] [ssl:info] [pid 4215:tid 4248] [client 192.168.100.163:54346] AH01964: Connection to child 257 established (server 127.0.1.1:443)
[Thu Dec 19 11:11:43.119102 2024] [ssl:debug] [pid 4215:tid 4248] ssl_engine_kernel.c(2421): [client 192.168.100.163:54346] AH02645: Server name not provided via TLS extension (using default/first virtual host)
[Thu Dec 19 11:11:43.119102 2024] [ssl:debug] [pid 4215:tid 4245] ssl_engine_kernel.c(2421): [client 192.168.100.171:56503] AH02645: Server name not provided via TLS extension (using default/first virtual host)
[Thu Dec 19 11:11:43.121404 2024] [ssl:info] [pid 4167:tid 4222] [client 192.168.100.171:56504] AH01964: Connection to child 134 established (server 127.0.1.1:443)
[Thu Dec 19 11:11:43.124046 2024] [ssl:debug] [pid 4167:tid 4222] ssl_engine_kernel.c(2421): [client 192.168.100.171:56504] AH02645: Server name not provided via TLS extension (using default/first virtual host)
DER DECODER DECODE
TSS2 DECODER DECODE 0x87
TSS2 DECODER LOAD parent: persistent 0x81000001
TSS2 DECODER DECODE 0x87
TSS2 DECODER LOAD parent: persistent 0x81000001
TSS2 DECODER DECODE found RSA
RSA LOAD
RSA GET_PARAMS [ bits security-bits max-size ]
RSA HAS 85
RSA HAS 2
RSA NEW
RSA IMPORT [ n e ]
RSA MATCH 0x6
RSA HAS 2
RSA MATCH 0x6
DER DECODER DECODE
RSA HAS 2
RSA MATCH 0x6
SIGN DIGEST_INIT rsa MD=SHA256
SIGN DIGEST_INIT rsa MD=SHA2-256
SIGN SET_CTX_PARAMS rsa [ pad-mode ]
SIGN SET_CTX_PARAMS rsa [ saltlen ]
SIGN DIGEST_SIGN estimate
SIGN DIGEST_SIGN
ERROR:tcti:src/util/io.c:114:write_all() failed to write to fd 3: Device or resource busy
ERROR:tcti:src/tss2-tcti/tcti-device.c:124:tcti_device_transmit() wrong number of bytes written. Expected 164, wrote 0.
ERROR:esys:src/tss2-esys/api/Esys_Hash.c:198:Esys_Hash_Async() Finish (Execute Async) ErrorCode (0x000a000a)
ERROR:esys:src/tss2-esys/api/Esys_Hash.c:78:Esys_Hash() Error in async function ErrorCode (0x000a000a)
[Thu Dec 19 11:11:43.221886 2024] [ssl:info] [pid 4167:tid 4222] [client 192.168.100.171:56504] AH02008: SSL library error 1 in handshake (server 127.0.1.1:443)
[Thu Dec 19 11:11:43.222001 2024] [ssl:info] [pid 4167:tid 4222] SSL Library Error: error:4000000E:tpm2::cannot hash (655370 tcti:IO failure)
[Thu Dec 19 11:11:43.222059 2024] [ssl:info] [pid 4167:tid 4222] SSL Library Error: error:0A080006:SSL routines::EVP lib
[Thu Dec 19 11:11:43.222093 2024] [ssl:info] [pid 4167:tid 4222] [client 192.168.100.171:56504] AH01998: Connection closed to child 134 with abortive shutdown (server 127.0.1.1:443)
[Thu Dec 19 11:11:43.230119 2024] [ssl:info] [pid 4167:tid 4230] [client 192.168.100.171:56505] AH01964: Connection to child 138 established (server 127.0.1.1:443)
[Thu Dec 19 11:11:43.230753 2024] [ssl:debug] [pid 4167:tid 4230] ssl_engine_kernel.c(2421): [client 192.168.100.171:56505] AH02645: Server name not provided via TLS extension (using default/first virtual host)
SIGN DIGEST_INIT rsa MD=SHA256
SIGN DIGEST_INIT rsa MD=SHA2-256
SIGN SET_CTX_PARAMS rsa [ pad-mode ]
SIGN SET_CTX_PARAMS rsa [ saltlen ]
SIGN DIGEST_SIGN estimate
SIGN DIGEST_SIGN
ERROR:esys:src/tss2-esys/esys_iutil.c:1145:iesys_check_sequence_async() Esys called in bad sequence.
ERROR:esys:src/tss2-esys/api/Esys_Hash.c:78:Esys_Hash() Error in async function ErrorCode (0x00070007)
[Thu Dec 19 11:11:43.250594 2024] [ssl:info] [pid 4167:tid 4230] [client 192.168.100.171:56505] AH02008: SSL library error 1 in handshake (server 127.0.1.1:443)
[Thu Dec 19 11:11:43.250656 2024] [ssl:info] [pid 4167:tid 4230] SSL Library Error: error:4000000E:tpm2::cannot hash (458759 esapi:Function called in the wrong order)
[Thu Dec 19 11:11:43.250704 2024] [ssl:info] [pid 4167:tid 4230] SSL Library Error: error:0A080006:SSL routines::EVP lib
[Thu Dec 19 11:11:43.250730 2024] [ssl:info] [pid 4167:tid 4230] [client 192.168.100.171:56505] AH01998: Connection closed to child 138 with abortive shutdown (server 127.0.1.1:443) The thing here is the line Maybe, instead of using mutexes, why don't use semaphores in shared memory? Allowing all (parent and forked) processes to access them. |
Hello,
We have encountered an issue in how the provider behaves when a process spawns two or more threads that access the TPM. In our case, this process was an Apache server that uses the TPM as a generator/storage for the private key. We have built the provider with debug mode enabled as well as activated the tpm2-tss traces.
Depending on the browser, this issue appears or not: with Firefox does not appear, but with Chrome/Edge does. This is because Firefox only sends one request, while Chrome/Edge send two (spawning three threads in the Apache process, one for key loading, and two for request processing).
You can see the logs in
chrome_logs.txt
. The flow would be as follows (In a general way. The order is not exactly like this always, as one thread could have started the TPM operations before the other one gets even started, but you get the point):This behaviour could also be observable in the signs operations. Seeing this, we thought that this could be a synchronization problem between the two threads accessing the TPM. This shouldn't be a problem, since the abrmd is a thing, but the daemon manages the access to the TPM at a process level (it uses the Dbus): it delegates the thread access to the developer.
In order to debug more, we tried to address the issue. We found some problems with the RAND operations order, so we ended up deactivating the rand operations in OpenSSL with the provider. We did as follows:
You can see the logs of this execution in
chrome_sign_mutex_logs.txt
. Now, the execution would go as follows (I will skip the rand generation since it is not being done in the TPM now):DIGEST_INIT
is done twice).SIGN DIGEST_SIGN estimate
is called, the mutex is freed once.SIGN DIGEST_SIGN
is done, the mutex is freed totally. Then, the other thread makes the same process.With this, the problem seems to be "solved", at least for the sign operations. We're looking into it more deeply, in order to make a better description of the problem, but some help may be useful.
Any suggestion is appreciated.
Thanks.
chrome_logs.txt
chrome_sign_mutex_logs.txt
The text was updated successfully, but these errors were encountered: