You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I ran into this issue from a long running lua script. I'm still not able to reproduce this reliably but I was able to isolate the issue after attaching a debugger to the lua vm and many hours of assembly debugging(I didn't compile luasec with -g and it had -O2 on which made things harder).
During the bad state, the call stack looks roughly like this:
sslc.:231:lsec_socket_error // really just turns into a `WSAGetLastError` call
ssl.c:196:ssl_recv // (io->recv)
buffer.c:263:buffer_get
buffer.c:190:recvraw
buffer.c:109:buffer_meth_receive
The main issue is inside ssl_recv, given the following conditions:
when
err < 0
ssl->error == SSL_ERROR_SYSCALL
ERR_peek_error() == 0 (openssl's error queue is empty).
Under these conditions, WSAGetLastError gets called from lsec_socket_error. Now for whatever reason WSAGetLastError is actually returning 0, usually because it wasn't called right away after an error or maybe openssl called it first before luasec could. This is a problem because 0 == IO_DONE, ie. there was no error.
This results in recvraw getting stuck because it thinks no error has occurred but the recv'ed bytes is always count == 0, so it never makes progress.
I'm unsure of the appropriate fix for this. WSAGetLastError returning 0 usually means it got called too late from the time when the erroring function happened. But SSL_get_error and ERR_peek_error are the only calls made before WSAGetLastError. I couldn't find evidence that those two ssl functions call WSAGetLastError but I only took a cursory glance at the openssl source.
At any rate, lsec_socket_error definitely should not be return 0 in ssl_recv for that code path since that violates invariant: SSL_ERROR_SYSCALL happened but it's returning IO_DONE. Should add an assert here, eg.
I ran into this issue from a long running lua script. I'm still not able to reproduce this reliably but I was able to isolate the issue after attaching a debugger to the lua vm and many hours of assembly debugging(I didn't compile luasec with
-g
and it had-O2
on which made things harder).During the bad state, the call stack looks roughly like this:
The main issue is inside
ssl_recv
, given the following conditions:when
err < 0
ssl->error == SSL_ERROR_SYSCALL
ERR_peek_error() == 0
(openssl's error queue is empty).Under these conditions,
WSAGetLastError
gets called fromlsec_socket_error
. Now for whatever reasonWSAGetLastError
is actually returning0
, usually because it wasn't called right away after an error or maybe openssl called it first before luasec could. This is a problem because0 == IO_DONE
, ie. there was no error.This results in
recvraw
getting stuck because it thinks no error has occurred but the recv'ed bytes is alwayscount == 0
, so it never makes progress.I'm unsure of the appropriate fix for this.
WSAGetLastError
returning0
usually means it got called too late from the time when the erroring function happened. ButSSL_get_error
andERR_peek_error
are the only calls made beforeWSAGetLastError
. I couldn't find evidence that those two ssl functions callWSAGetLastError
but I only took a cursory glance at the openssl source.At any rate,
lsec_socket_error
definitely should not be return 0 inssl_recv
for that code path since that violates invariant:SSL_ERROR_SYSCALL
happened but it's returningIO_DONE
. Should add an assert here, eg.The text was updated successfully, but these errors were encountered: