This repository has been archived by the owner on Dec 17, 2018. It is now read-only.

WireConverter fails if frame length > 1400 bytes #11

Open
allengeorge opened this issue Nov 16, 2013 · 2 comments

@allengeorge
Owner

Apparently the default maximum frame size used by the WireConverter (1400 bytes) is too low and causes the RaftAgents to fail as follows:

WARN  [2013-11-16 17:43:23,703] io.libraft.agent.rpc.FinalUpstreamHandler: SERVER_02: caught exception - closing channel to null
! org.jboss.netty.handler.codec.frame.TooLongFrameException: Adjusted frame length exceeds 1400: 1428 - discarded
! at org.jboss.netty.handler.codec.frame.LengthFieldBasedFrameDecoder.fail(LengthFieldBasedFrameDecoder.java:417) ~[netty-3.6.6.Final.jar:na]
! at org.jboss.netty.handler.codec.frame.LengthFieldBasedFrameDecoder.failIfNecessary(LengthFieldBasedFrameDecoder.java:405) ~[netty-3.6.6.Final.jar:na]
! at org.jboss.netty.handler.codec.frame.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:370) ~[netty-3.6.6.Final.jar:na]
! at io.libraft.agent.rpc.WireConverter$Decoder.decode(WireConverter.java:65) ~[libraft-agent/:na]
! at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425) ~[netty-3.6.6.Final.jar:na]
! at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303) ~[netty-3.6.6.Final.jar:na]
! at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) ~[netty-3.6.6.Final.jar:na]
! at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) ~[netty-3.6.6.Final.jar:na]
! at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) ~[netty-3.6.6.Final.jar:na]
! at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109) ~[netty-3.6.6.Final.jar:na]
! at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) ~[netty-3.6.6.Final.jar:na]
! at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90) ~[netty-3.6.6.Final.jar:na]
! at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) ~[netty-3.6.6.Final.jar:na]
! at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [na:1.6.0_65]
! at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [na:1.6.0_65]
! at java.lang.Thread.run(Thread.java:695) [na:1.6.0_65]

This stack trace describes a follower that is unable to parse a message from the leader. It's unclear to me why this happens to only one follower.
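For reference, a minimal sketch of the framing implied by the stack trace: `WireConverter.Decoder` appears to delegate to Netty's `LengthFieldBasedFrameDecoder` with a 1400-byte cap. Only the 1400-byte cap is confirmed by the log above; the 4-byte length prefix and the class layout here are assumptions.

```java
import org.jboss.netty.handler.codec.frame.LengthFieldBasedFrameDecoder;

// Hypothetical sketch, not the actual WireConverter source. A frame that
// declares 1428 bytes, as in the log above, exceeds the 1400-byte cap and
// is discarded with TooLongFrameException.
public class DecoderSketch extends LengthFieldBasedFrameDecoder {
    private static final int MAX_FRAME_LENGTH = 1400;  // the too-low default
    private static final int LENGTH_FIELD_LENGTH = 4;  // assumed prefix size

    public DecoderSketch() {
        // (maxFrameLength, lengthFieldOffset, lengthFieldLength,
        //  lengthAdjustment, initialBytesToStrip)
        super(MAX_FRAME_LENGTH, 0, LENGTH_FIELD_LENGTH, 0, LENGTH_FIELD_LENGTH);
    }
}
```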

@ghost assigned allengeorge on Nov 16, 2013
@allengeorge
Owner Author

This happened with only one server because I was doing a lot of testing with a cluster experiencing 'f' failures. When SERVER_02 rejoined the cluster, the leader attempted to catch it up. Because many, many entries had to be packed into a single message, the serialized size expanded well past the 1400-byte limit.

This points to a bigger (known) issue with RaftAlgorithm: it does not chunk AppendEntries into packet-sized pieces. This is partly because it has no idea what the serialized size of the message is going to be. I don't think it's a problem to be solved at its level: I think it's up to the network layer to chunk the message and send it out, as in the sketch below.
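A minimal sketch of what such network-layer chunking could look like: split the entries into batches whose serialized size fits a frame budget. `SizeFunction`, `AppendEntriesChunker`, and the method names are illustrative stand-ins, not actual libraft types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: batch catch-up entries so that no single frame
// exceeds the budget. An entry larger than the budget still goes out
// alone in its own chunk.
public final class AppendEntriesChunker {

    static <E> List<List<E>> chunk(List<E> entries, int frameBudget, SizeFunction<E> size) {
        List<List<E>> chunks = new ArrayList<List<E>>();
        List<E> current = new ArrayList<E>();
        int used = 0;
        for (E entry : entries) {
            int entrySize = size.sizeOf(entry);
            if (!current.isEmpty() && used + entrySize > frameBudget) {
                chunks.add(current);            // flush the full chunk
                current = new ArrayList<E>();
                used = 0;
            }
            current.add(entry);
            used += entrySize;
        }
        if (!current.isEmpty()) {
            chunks.add(current);                // flush the final partial chunk
        }
        return chunks;
    }

    interface SizeFunction<E> {
        int sizeOf(E entry); // serialized size of a single entry, in bytes
    }
}
```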

@allengeorge
Owner Author

For now I've mitigated this by raising the maximum frame length to 10MB. This is a poor solution, and it may point to flaws in the interface design of RPCSender and RPCReceiver. Moreover, this approach requires a large number of copies to transfer data from one component to another and out to the wire.
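A minimal sketch of that mitigation, assuming the same length-prefixed framing as in the earlier sketch; only the cap changes, and the constant and method names are illustrative:

```java
import org.jboss.netty.channel.ChannelPipeline;
import org.jboss.netty.handler.codec.frame.LengthFieldBasedFrameDecoder;

// Hypothetical sketch of the workaround: raise the decoder's maximum
// frame length from 1400 bytes to 10 MB.
public final class MitigationSketch {
    private static final int MAX_FRAME_LENGTH = 10 * 1024 * 1024; // 10 MB cap

    static void addFraming(ChannelPipeline pipeline) {
        pipeline.addLast("frameDecoder",
                new LengthFieldBasedFrameDecoder(MAX_FRAME_LENGTH, 0, 4, 0, 4));
    }
}
```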

@allengeorge added this to the 0.2.1 Release milestone on Mar 25, 2014