Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WSAGetLastError's return 0 can cause infinite loop in 'recv*' functions #100

Open
greatwolf opened this issue Apr 19, 2017 · 0 comments
Open

Comments

@greatwolf
Copy link
Contributor

greatwolf commented Apr 19, 2017

I ran into this issue from a long running lua script. I'm still not able to reproduce this reliably but I was able to isolate the issue after attaching a debugger to the lua vm and many hours of assembly debugging(I didn't compile luasec with -g and it had -O2 on which made things harder).

During the bad state, the call stack looks roughly like this:

sslc.:231:lsec_socket_error // really just turns into a `WSAGetLastError` call
ssl.c:196:ssl_recv // (io->recv)
buffer.c:263:buffer_get
buffer.c:190:recvraw
buffer.c:109:buffer_meth_receive

The main issue is inside ssl_recv, given the following conditions:

when

  • err < 0
  • ssl->error == SSL_ERROR_SYSCALL
  • ERR_peek_error() == 0 (openssl's error queue is empty).

Under these conditions, WSAGetLastError gets called from lsec_socket_error. Now for whatever reason WSAGetLastError is actually returning 0, usually because it wasn't called right away after an error or maybe openssl called it first before luasec could. This is a problem because 0 == IO_DONE, ie. there was no error.

This results in recvraw getting stuck because it thinks no error has occurred but the recv'ed bytes is always count == 0, so it never makes progress.

I'm unsure of the appropriate fix for this. WSAGetLastError returning 0 usually means it got called too late from the time when the erroring function happened. But SSL_get_error and ERR_peek_error are the only calls made before WSAGetLastError. I couldn't find evidence that those two ssl functions call WSAGetLastError but I only took a cursory glance at the openssl source.

At any rate, lsec_socket_error definitely should not be return 0 in ssl_recv for that code path since that violates invariant: SSL_ERROR_SYSCALL happened but it's returning IO_DONE. Should add an assert here, eg.

// ...
  if (err == 0)
    return IO_CLOSED;

  err = lsec_socket_error();
  assert (err != 0);
  return err;
default:
  return LSEC_IO_SSL;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

1 participant