You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi @otoolep! I've been doing some cluster testing in my local network using three computers running rqlited, and a single instance of Mailpit. While ingesting a constant stream of incoming emails via Mailpit (thus writing as fast as rqlite can handle) I have been randomly dropping out one of the three rqlite nodes to test fault tolerance / recovery etc. rqlited seems to handle this very elegantly, but every now and then I get a panic from the gorqlite sql driver panic: runtime error: index out of range [0] with length 0:
ERRO[2024/04/14 16:04:21] [db] error storing message: tried all peers unsuccessfully. here are the results:
peer #0: http://192.168.7.2:4001/db/execute?timings&level=weak&transaction failed due to Post "http://192.168.7.2:4001/db/execute?timings&level=weak&transaction": dial tcp 192.168.7.2:4001: connect: connection refused
peer #1: http://192.168.7.50:4001/db/execute?timings&level=weak&transaction failed, got: 503 Service Unavailable, message: leader not found
peer #2: http://192.168.7.1:4001/db/execute?timings&level=weak&transaction failed, got: 503 Service Unavailable, message: leader not found
panic: runtime error: index out of range [0] with length 0
goroutine 376180 [running]:
github.com/rqlite/gorqlite.(*Connection).WriteOneParameterizedContext(...)
/home/ralph/.cache/go/pkg/mod/github.com/rqlite/[email protected]/write.go:79
github.com/rqlite/gorqlite/stdlib.(*Stmt).ExecContext(0xc008ce1410, {0x108d1b8, 0x1745480}, {0xc000e79860, 0xa, 0x1083460?})
/home/ralph/.cache/go/pkg/mod/github.com/rqlite/[email protected]/stdlib/sql.go:96 +0x3f8
database/sql.ctxDriverStmtExec({0x108d1b8, 0x1745480}, {0x108d730, 0xc008ce1410}, {0xc000e79860, 0xa, 0x108cb80?})
/usr/lib/go-1.22/src/database/sql/ctxutil.go:65 +0xa3
database/sql.resultFromStatement({0x108d1b8, 0x1745480}, {0x108cb80, 0xc00011c518}, 0xc000cf5438, {0xc000cf58b0, 0xa, 0xa})
/usr/lib/go-1.22/src/database/sql/sql.go:2670 +0x13a
database/sql.(*DB).execDC(0x0?, {0x108d1b8, 0x1745480}, 0xc0001cc750, 0x10?, {0xc0043c1ae0, 0x99}, {0xc000cf58b0, 0xa, 0xa})
/usr/lib/go-1.22/src/database/sql/sql.go:1722 +0x42c
database/sql.(*Tx).ExecContext(0xc007d56a80, {0x108d1b8, 0x1745480}, {0xc0043c1ae0, 0x99}, {0xc000cf58b0, 0xa, 0xa})
/usr/lib/go-1.22/src/database/sql/sql.go:2506 +0xad
database/sql.(*Tx).Exec(...)
/usr/lib/go-1.22/src/database/sql/sql.go:2515
...
exit status 2
I've tried to find the cause of the panic and I can clearly see where it is happening, just not why it is happening. WriteOne(), WriteOneContext(), WriteOneParameterized() and WriteOneParameterizedContext() are all potentially affected by this too as they all return wra[0], err (eg: here). If wra[0] does not exist (ie: empty slice) you will get a panic. I just can't work out what is causing wra to be empty as your error handling appears to append the error as a result. The only thing I can think of is if the final:
else {
returnresults, nil
}
... if results has no results... Hopefully this means more to you than me?
Ensuring the wra[0] exists before trying to return it is obviously the safest solution (though maybe not so elegant), and I do not know what the consequences of that are "down the food chain" (you're way more familiar with the code than I am).
The text was updated successfully, but these errors were encountered:
What I would guess Is happening is that this client library is not handling a leader election properly. When you remove a node randomly, and that node is the Leader, a new leader will be elected. That takes only a second or so, but doing a write during that time may result in the node responding to the kind of response you see above ("Leader not found").
What do you think the client library should do? Retry the write is probably the right thing to do, since it's a transient issue.
It sounds like you may be on to it - that brief time while nodes are electing a new leader. It seems to be there are two separate (but related) things here, how to deal with queries during the brief time during a leader election, and the fact that a populated return slice (wra[0]) is seemingly not guaranteed under such (and possibly other?) conditions.
I'm probably not qualified to answer the first - but a retry would be the most elegant (as long as it does not result in an endless loop of retries). As for the panic though, that should never happen as it literally kills the application through no fault of its own.
Hi @otoolep! I've been doing some cluster testing in my local network using three computers running rqlited, and a single instance of Mailpit. While ingesting a constant stream of incoming emails via Mailpit (thus writing as fast as rqlite can handle) I have been randomly dropping out one of the three rqlite nodes to test fault tolerance / recovery etc. rqlited seems to handle this very elegantly, but every now and then I get a panic from the gorqlite sql driver
panic: runtime error: index out of range [0] with length 0
:I've tried to find the cause of the panic and I can clearly see where it is happening, just not why it is happening.
WriteOne()
,WriteOneContext()
,WriteOneParameterized()
andWriteOneParameterizedContext()
are all potentially affected by this too as they allreturn wra[0], err
(eg: here). Ifwra[0]
does not exist (ie: empty slice) you will get a panic. I just can't work out what is causingwra
to be empty as your error handling appears to append the error as a result. The only thing I can think of is if the final:... if
results
has no results... Hopefully this means more to you than me?Ensuring the
wra[0]
exists before trying to return it is obviously the safest solution (though maybe not so elegant), and I do not know what the consequences of that are "down the food chain" (you're way more familiar with the code than I am).The text was updated successfully, but these errors were encountered: