This repository has been archived by the owner on Dec 17, 2018. It is now read-only.

WireConverter fails if frame length > 1400 bytes #11

Open
allengeorge opened this issue Nov 16, 2013 · 2 comments

@allengeorge
Owner

Apparently the default maximum frame size used by the WireConverter (1400 bytes) is too low and causes the RaftAgents to fail as follows:

WARN  [2013-11-16 17:43:23,703] io.libraft.agent.rpc.FinalUpstreamHandler: SERVER_02: caught exception - closing channel to null
! org.jboss.netty.handler.codec.frame.TooLongFrameException: Adjusted frame length exceeds 1400: 1428 - discarded
! at org.jboss.netty.handler.codec.frame.LengthFieldBasedFrameDecoder.fail(LengthFieldBasedFrameDecoder.java:417) ~[netty-3.6.6.Final.jar:na]
! at org.jboss.netty.handler.codec.frame.LengthFieldBasedFrameDecoder.failIfNecessary(LengthFieldBasedFrameDecoder.java:405) ~[netty-3.6.6.Final.jar:na]
! at org.jboss.netty.handler.codec.frame.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:370) ~[netty-3.6.6.Final.jar:na]
! at io.libraft.agent.rpc.WireConverter$Decoder.decode(WireConverter.java:65) ~[libraft-agent/:na]
! at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425) ~[netty-3.6.6.Final.jar:na]
! at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303) ~[netty-3.6.6.Final.jar:na]
! at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) ~[netty-3.6.6.Final.jar:na]
! at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) ~[netty-3.6.6.Final.jar:na]
! at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) ~[netty-3.6.6.Final.jar:na]
! at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109) ~[netty-3.6.6.Final.jar:na]
! at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) ~[netty-3.6.6.Final.jar:na]
! at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90) ~[netty-3.6.6.Final.jar:na]
! at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) ~[netty-3.6.6.Final.jar:na]
! at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [na:1.6.0_65]
! at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [na:1.6.0_65]
! at java.lang.Thread.run(Thread.java:695) [na:1.6.0_65]

This stack trace describes a follower that is unable to parse a message from the leader. It's unclear to me why this happens to only one follower.
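For reference, a minimal sketch of the framing implied by the stack trace: `WireConverter.Decoder` appears to delegate to Netty's `LengthFieldBasedFrameDecoder` with a 1400-byte cap. Only the 1400-byte cap is confirmed by the log above; the 4-byte length prefix and the class layout here are assumptions.

```java
import org.jboss.netty.handler.codec.frame.LengthFieldBasedFrameDecoder;

// Hypothetical sketch, not the actual WireConverter source. A frame that
// declares 1428 bytes, as in the log above, exceeds the 1400-byte cap and
// is discarded with TooLongFrameException.
public class DecoderSketch extends LengthFieldBasedFrameDecoder {
    private static final int MAX_FRAME_LENGTH = 1400;  // the too-low default
    private static final int LENGTH_FIELD_LENGTH = 4;  // assumed prefix size

    public DecoderSketch() {
        // (maxFrameLength, lengthFieldOffset, lengthFieldLength,
        //  lengthAdjustment, initialBytesToStrip)
        super(MAX_FRAME_LENGTH, 0, LENGTH_FIELD_LENGTH, 0, LENGTH_FIELD_LENGTH);
    }
}
```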

@ghost assigned allengeorge on Nov 16, 2013
@allengeorge
Owner Author

This happened with only one server because I was doing a lot of testing with a cluster experiencing 'f' failures. When SERVER_02 rejoined the cluster, the leader attempted to catch it up. Because many, many entries had to be packed into a single message, the serialized size expanded well past the 1400-byte limit.

This points to a bigger (known) issue with RaftAlgorithm: it does not chunk AppendEntries into packet-sized pieces. This is partly because it has no idea what the serialized size of the message is going to be. I don't think it's a problem to be solved at its level: I think it's up to the network layer to chunk the message and send it out, as in the sketch below.
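A minimal sketch of what such network-layer chunking could look like: split the entries into batches whose serialized size fits a frame budget. `SizeFunction`, `AppendEntriesChunker`, and the method names are illustrative stand-ins, not actual libraft types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: batch catch-up entries so that no single frame
// exceeds the budget. An entry larger than the budget still goes out
// alone in its own chunk.
public final class AppendEntriesChunker {

    static <E> List<List<E>> chunk(List<E> entries, int frameBudget, SizeFunction<E> size) {
        List<List<E>> chunks = new ArrayList<List<E>>();
        List<E> current = new ArrayList<E>();
        int used = 0;
        for (E entry : entries) {
            int entrySize = size.sizeOf(entry);
            if (!current.isEmpty() && used + entrySize > frameBudget) {
                chunks.add(current);            // flush the full chunk
                current = new ArrayList<E>();
                used = 0;
            }
            current.add(entry);
            used += entrySize;
        }
        if (!current.isEmpty()) {
            chunks.add(current);                // flush the final partial chunk
        }
        return chunks;
    }

    interface SizeFunction<E> {
        int sizeOf(E entry); // serialized size of a single entry, in bytes
    }
}
```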

@allengeorge
Owner Author

For now I've mitigated this by raising the maximum frame length to 10MB. This is a poor solution, and it may point to flaws in the interface design of RPCSender and RPCReceiver. Moreover, this approach requires a large number of copies to transfer data from one component to another and out to the wire.
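A minimal sketch of that mitigation, assuming the same length-prefixed framing as in the earlier sketch; only the cap changes, and the constant and method names are illustrative:

```java
import org.jboss.netty.channel.ChannelPipeline;
import org.jboss.netty.handler.codec.frame.LengthFieldBasedFrameDecoder;

// Hypothetical sketch of the workaround: raise the decoder's maximum
// frame length from 1400 bytes to 10 MB.
public final class MitigationSketch {
    private static final int MAX_FRAME_LENGTH = 10 * 1024 * 1024; // 10 MB cap

    static void addFraming(ChannelPipeline pipeline) {
        pipeline.addLast("frameDecoder",
                new LengthFieldBasedFrameDecoder(MAX_FRAME_LENGTH, 0, 4, 0, 4));
    }
}
```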

@allengeorge added this to the 0.2.1 Release milestone on Mar 25, 2014