Question about tests & behavior of reference nodes

by Peter Krcmar -
Number of replies: 5

Hi all,

I'm having problems with tests that include a large number of nodes. The cause is the huge amount of traffic generated by the rumor-mongering mechanism (acking with status packets, forwarding new rumors, and so on), which considerably slows everything down. As a last resort, I used Wireshark to check how the reference nodes cope with such heavy traffic. To my surprise, the number of messages was very small and, most importantly, there seemed to be no status messages, and the nodes didn't seem to forward any new rumors. To confirm this, I tried a simple chain setup (A <-> B <-> C <-> D) where A indexes a file, and indeed B did not forward the PaxosPropose and no status was sent as an ack.

So I just wanted to know whether anyone has managed to pass the tests while strictly treating rumors with extras like any other rumor. If so, then my implementation is most likely just bad and I'm sorry to have wasted your time. But if others are in the same situation as me, then I suspect the tests were calibrated against the reference nodes (which are optimized but do not follow the specs) and may not be compatible with nodes that strictly follow what was described in the homework handouts.

Thanks in advance for your responses :)

In reply to Peter Krcmar

Re: Question about tests & behavior of reference nodes

by Peter Krcmar -
I also noticed in the above example that A sends out 2 PaxosPrepare rumors: one to itself (with rumor id 1) and a second to B (with rumor id 2). Normally, B should drop and not respond to this out-of-order packet, but instead it sends back a PaxosPromise.
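For anyone comparing against their own node, the in-order rule referred to here could be sketched roughly as below. This is only an illustration of "accept a rumor only if it is the next expected one from that origin", not the course code or the reference implementation; `Rumor`, `View`, and `Accept` are hypothetical names.

```go
package main

// Rumor is a hypothetical, simplified rumor: an origin address plus a
// per-origin sequence number that starts at 1.
type Rumor struct {
	Origin   string
	Sequence uint
}

// View records, per origin, the sequence number of the last accepted rumor.
// An origin that has never been seen maps to 0.
type View map[string]uint

// Accept reports whether r is the next expected rumor from its origin.
// If it is, the view is advanced; duplicates and out-of-order rumors
// are ignored and leave the view unchanged.
func (v View) Accept(r Rumor) bool {
	if r.Sequence != v[r.Origin]+1 {
		return false
	}
	v[r.Origin] = r.Sequence
	return true
}
```

Under this rule, a node that has seen nothing from A ignores a rumor from A with ID 2 and only processes it once ID 1 has arrived.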

Just to note, the reason for this post is not at all to complain. I think we're already very lucky to have tests available for us and I appreciate all the hard work that the TAs put into this, even taking the time to write a reference implementation! I just wanted to share my findings with a) students that didn't pass the tests and are wondering why and b) the TAs just so that they are aware of this when grading, as I believe the tests are a part of the grade.

Best,
Peter

In reply to Peter Krcmar

Re: Question about tests & behavior of reference nodes

by Francesco Intoci -

Hi Peter,

I too am struggling with some tests, mostly the ones involving BinGossipers and also test number 15 (if I remember correctly; in any case, the one with 21 nodes and several proposals from different nodes). Concerning your question: to reduce the traffic in the network, and because all the tests (I think) use a fully connected topology, I have decided not to process Paxos messages as normal rumors. I haven't yet figured out how effective this is, though, since I am still not passing test 15.

In reply to Peter Krcmar

Re: Question about tests & behavior of reference nodes

by Elie Daou -

I've also been struggling with the same thing in relation to the Bin Gossipers. They seem to send propose messages as unicast instead of broadcast. In TestBinGossiper_No_Contention_Single_Consensus_Completion, the reference node seems to send the prepare message to itself as a rumor with ID 1, and then send each of the other nodes the same prepare message with an incremented ID rather than the same one (it should be the same rumor). The other nodes all drop the rumor, since it is the first rumor they get from this node but it has an ID bigger than 1. Are we supposed to increment the ID every time we send the same rumor, or am I missing something?
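To make the expectation concrete, here is a rough sketch of what "same rumor, same ID" would look like: the sequence number is assigned once when the rumor is created, and every destination receives a copy carrying that same number. This is an assumption about the intended behavior, not the reference code; `Node`, `NewRumor`, and `Copies` are hypothetical names, and how the copies actually travel (direct send or gossip) is left out.

```go
package main

// Rumor is a hypothetical, simplified rumor with a per-origin sequence number.
type Rumor struct {
	Origin   string
	Sequence uint
	Payload  string
}

// Node keeps the sequence number of the last rumor it originated.
type Node struct {
	Addr    string
	lastSeq uint
}

// NewRumor assigns the next sequence number exactly once per rumor.
func (n *Node) NewRumor(payload string) Rumor {
	n.lastSeq++
	return Rumor{Origin: n.Addr, Sequence: n.lastSeq, Payload: payload}
}

// Copies returns one entry per destination, each holding the same rumor,
// so the sequence number does not change from one destination to the next.
func Copies(r Rumor, dests []string) map[string]Rumor {
	out := make(map[string]Rumor, len(dests))
	for _, d := range dests {
		out[d] = r
	}
	return out
}
```

With this shape, a prepare created by A reaches A, B, and C with ID 1, and only the next distinct rumor from A gets ID 2.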

In reply to Elie Daou

Re: Question about tests & behavior of reference nodes

by Pasindu Nivanthaka Tennage -

Hi Elie,

Thank you for the question. 

As I already wrote to Peter, yes, there seems to be an error in the overlay layer of our bin_gossiper. We are working on the issues right now and will send you the changes soon.

Sorry for the inconvenience.


Regards

In reply to Peter Krcmar

Re: Question about tests & behavior of reference nodes

by Pasindu Nivanthaka Tennage -

Hi Peter,

Thank you for your question. Surprisingly, we found that the bin_gossiper (the reference implementation) does have the drawback you pointed out.

We will fix these issues in the bin_gossiper and push the changes to your repos ASAP.

Regards
TAs