CS-438: Consistency of test 4; retransmission of rumors with prepares

I am finding it difficult to see how test 4 is valid. As far as I can tell, it is only valid if node A ends up mongering both of the propose messages to one of the stopped nodes, in both tries.

As an example, with made-up rumor IDs to keep track of the messages:

A prepares (rumor ID 1) => gets mongered with a stopped node (B | C | D) (# prepares sent by A = 1)
Paxos retry timeout occurs => A re-prepares (new rumor, rumor ID 2) => gets mongered with E (# prepares sent by A = 2)
E exchanges status information with A => A resends rumor ID 1 to E (# prepares sent by A = 3)

This results in 3 prepares sent by A, which differs from the expected 2.
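To make the count concrete, here is a minimal sketch that replays the three transmissions from the example and counts them two ways: per transmission (what a watcher would see if it is notified on every send) and per distinct rumor. The `send` type and the function names are made up for illustration and are not taken from the handout or the reference implementation.

```go
package main

import "fmt"

// send is one outgoing transmission by node A, as a watcher might see it.
// Hypothetical type: the real watcher interface may look different.
type send struct {
	rumorID   uint32 // made-up rumor ID from the example
	to        string // destination node
	isPrepare bool   // whether the rumor carries a Paxos prepare
}

// countPrepares counts every transmission that carries a prepare,
// including retransmissions of a rumor that was already sent once.
func countPrepares(sends []send) int {
	n := 0
	for _, s := range sends {
		if s.isPrepare {
			n++
		}
	}
	return n
}

// countDistinctPrepares counts each prepare rumor once, no matter how
// many peers it was sent to.
func countDistinctPrepares(sends []send) int {
	seen := make(map[uint32]bool)
	for _, s := range sends {
		if s.isPrepare {
			seen[s.rumorID] = true
		}
	}
	return len(seen)
}

// exampleTrace returns the three transmissions from the example.
func exampleTrace() []send {
	return []send{
		{rumorID: 1, to: "B", isPrepare: true}, // first prepare, to a stopped node
		{rumorID: 2, to: "E", isPrepare: true}, // re-prepare after the retry timeout
		{rumorID: 1, to: "E", isPrepare: true}, // resent after the status exchange
	}
}

func main() {
	trace := exampleTrace()
	fmt.Println("transmissions carrying a prepare:", countPrepares(trace))
	fmt.Println("distinct prepare rumors:", countDistinctPrepares(trace))
}
```

Counting per transmission gives 3, while counting per distinct rumor gives the 2 that the test expects.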

The above disregards the fact that even more retransmissions may occur due to retransmission of unacknowledged rumors (depending on one's acknowledgement retransmission timeout).

The assignment states that the watcher should work as before, so I assume that we are not supposed to add special conditions for only notifying of the initial injection of the prepare (which would probably solve the issue).

- Morten

Re: Consistency of test 4; retransmission of rumors with prepares

by Stratos Triantafyllou - Saturday, 28 November 2020, 14:36

Agreed.

If I'm not missing something, the rumor originating in the proposer can be sent to an arbitrary number of neighbors until the Paxos retry timeout occurs, e.g. if we get "heads" in the coin toss after exchanging status messages with E.

However, and please correct me if I'm wrong, I have the idea that the reference gossiper in the binary only sends the ExtraMessage with its first transmission of a rumor. For example, when sending the prepare message on a rumor with ID 1, if it needs to send this rumor again to another peer, there is no ExtraMessage on it (?).
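A minimal sketch of the behaviour I suspect, purely as an assumption about the binary: the sender remembers which rumors it has already transmitted and strips the extra from every later transmission. The types below are simplified stand-ins, and `sender` and `prepareForSend` are hypothetical names, not the reference implementation.

```go
package main

import "fmt"

// ExtraMessage and RumorMessage are simplified stand-ins for the
// course types; the fields here are assumptions for illustration.
type ExtraMessage struct {
	Kind string
}

type RumorMessage struct {
	Origin string
	ID     uint32
	Extra  *ExtraMessage
}

// rumorKey identifies a rumor by its origin and ID.
type rumorKey struct {
	origin string
	id     uint32
}

// sender tracks which rumors have already been transmitted at least once.
type sender struct {
	sentOnce map[rumorKey]bool
}

func newSender() *sender {
	return &sender{sentOnce: make(map[rumorKey]bool)}
}

// prepareForSend returns the copy of r that would go on the wire under
// the suspected behaviour: Extra is kept only on the first transmission.
func (s *sender) prepareForSend(r RumorMessage) RumorMessage {
	k := rumorKey{origin: r.Origin, id: r.ID}
	out := r // shallow copy, so the stored rumor keeps its Extra
	if s.sentOnce[k] {
		out.Extra = nil
	}
	s.sentOnce[k] = true
	return out
}

func main() {
	s := newSender()
	prepare := RumorMessage{Origin: "A", ID: 1, Extra: &ExtraMessage{Kind: "prepare"}}
	first := s.prepareForSend(prepare)
	second := s.prepareForSend(prepare)
	fmt.Println("first send carries Extra:", first.Extra != nil)
	fmt.Println("second send carries Extra:", second.Extra != nil)
}
```

If the binary really behaves like this, a watcher that only counts rumors carrying a prepare would see one prepare per rumor regardless of how many peers it is mongered to.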

Re: Consistency of test 4; retransmission of rumors with prepares

by Francesco Intoci - Saturday, 28 November 2020, 15:26

Moreover, assuming that node A retries after, for example, 1 second, won't we have sent numParticipant prepare messages by then instead of just 1, since we broadcast our prepare? (Assuming we don't change the behaviour of the Watcher logic.)

Re: Consistency of test 4; retransmission of rumors with prepares

by Pasindu Nivanthaka Tennage - Monday, 30 November 2020, 17:38

Dear Intoci

Thank you for your question

I’m not sure that I understand your question. The question needs some context.

Without any context, I believe that, in general, a node would send out a single prepare rumor, which is broadcast by its nature. But it’s still a single rumor. Of course, if x nodes want to name a file, we’d have x rumors going around. But each node would send a single rumor out in the simple case. So I believe watcherOut would be 1, but watcherIn could be up to numParticipants.
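As a rough sketch of this counting, under the assumption that each proposer issues exactly one prepare rumor and that every rumor eventually reaches every node, the number of distinct prepare rumors a node sends and receives can be computed as below. The function `prepareCounts` is a hypothetical helper written for this illustration; it is not part of the test code.

```go
package main

import "fmt"

// prepareCounts returns how many distinct prepare rumors the node self
// sends (out) and receives (in), assuming each node in proposers issues
// exactly one prepare rumor and all rumors reach all nodes.
func prepareCounts(self string, proposers []string) (out, in int) {
	for _, p := range proposers {
		if p == self {
			out++ // our own single prepare rumor
		} else {
			in++ // one prepare rumor from each other proposer
		}
	}
	return out, in
}

func main() {
	// Simple case: only A proposes.
	out, in := prepareCounts("A", []string{"A"})
	fmt.Println("only A proposes: out =", out, "in =", in)

	// Every one of five nodes proposes.
	out, in = prepareCounts("A", []string{"A", "B", "C", "D", "E"})
	fmt.Println("all five propose: out =", out, "in =", in)
}
```

In both cases the outgoing count stays at 1, while the incoming count grows with the number of proposers and stays bounded by the number of participants.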

Does this answer your concern?

Regards
TAs

Re: Consistency of test 4; retransmission of rumors with prepares

by Francesco Intoci - Monday, 30 November 2020, 18:58

Yes, you are right, but when broadcasting the message I will send the rumor message (that is indeed always the same) and the address (that will eventually change, since I will send it to all my neighbour nodes) to the watcher. Otherwise, which address should I pass to the watcher when wrapping the GossipPacket in the CallbackMessage?

Re: Consistency of test 4; retransmission of rumors with prepares

by Aaron Joos Lippeveldts - Saturday, 28 November 2020, 17:18

I have also noticed that the reference gossiper only seems to send extra once.

In test 3 integration, when my nodes start they already missed some of node A's rumors. When they then receive a message with extra, they drop it and request the older rumors. By the time they get to the one with extra again there is no extra anymore and nothing happens in the end.
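A minimal sketch of that scenario, assuming a receiver that only accepts rumors in order and a sender that may or may not keep the extra on retransmission. The `rumor` and `receiver` types and the `extrasSeen` helper are made up for illustration and do not mirror the reference implementation.

```go
package main

import "fmt"

// rumor is a simplified rumor: an ID and a flag for whether it
// carries an extra. Hypothetical type for illustration only.
type rumor struct {
	id       uint32
	hasExtra bool
}

// receiver accepts rumors from one origin strictly in order.
type receiver struct {
	nextID        uint32 // next rumor ID we expect
	extrasHandled int    // how many extras we actually processed
}

// receive accepts m only if it is the next expected rumor, and
// reports whether it was accepted.
func (r *receiver) receive(m rumor) bool {
	if m.id != r.nextID {
		return false // out of order: dropped, older rumors get requested
	}
	r.nextID++
	if m.hasExtra {
		r.extrasHandled++
	}
	return true
}

// extrasSeen replays the scenario: rumor 3 (with extra) arrives first
// and is dropped, then rumors 1..3 are retransmitted in order.
func extrasSeen(retransmitWithExtra bool) int {
	r := &receiver{nextID: 1}
	r.receive(rumor{id: 3, hasExtra: true}) // dropped, we are still at ID 1
	r.receive(rumor{id: 1})
	r.receive(rumor{id: 2})
	r.receive(rumor{id: 3, hasExtra: retransmitWithExtra})
	return r.extrasHandled
}

func main() {
	fmt.Println("extra only on first send:", extrasSeen(false))
	fmt.Println("extra kept on retransmission:", extrasSeen(true))
}
```

When the extra is only attached to the first transmission, the late-starting node ends up with all three rumors but never processes the extra.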

Re: Consistency of test 4; retransmission of rumors with prepares

by Pasindu Nivanthaka Tennage - Tuesday, 1 December 2020, 10:31

Hello

Yes, there seems to be an error in the reference implementation. We will fix that and push the changes to student repos today.

Regards

TAs

Re: Consistency of test 4; retransmission of rumors with prepares

by Pasindu Nivanthaka Tennage - Thursday, 3 December 2020, 18:59

Hello

Yes, there is an issue with the Rumor Processing. We are investigating that, and an updated version will be pushed to your repo soon.

Sorry about that

TAs

Re: Consistency of test 4; retransmission of rumors with prepares

by Pasindu Nivanthaka Tennage - Monday, 30 November 2020, 17:24

Thanks for pointing this out. Indeed you are correct. We modified this test and it will be pushed to your repo soon.

What we changed is: now Node E is also down, so that E will not exchange status messages.

For the other cases of unacknowledged rumor retransmissions: in handout 2 we specified a timeout of 10 seconds before re-sending messages that were not acknowledged. Therefore, it should not happen in this test, as we’re waiting a total of 3 seconds.
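The arithmetic can be sketched as follows, assuming a rumor that is never acknowledged and is re-sent once per elapsed timeout. The constants mirror the numbers above; the function `retransmissionsWithin` is a hypothetical helper, not part of the test code.

```go
package main

import (
	"fmt"
	"time"
)

const (
	// testWindow is the total time the test waits.
	testWindow = 3 * time.Second
	// handoutAckTimeout is the retransmission timeout from handout 2.
	handoutAckTimeout = 10 * time.Second
)

// retransmissionsWithin returns how many times a never-acknowledged
// rumor would be re-sent at or before the end of the window, assuming
// one retransmission each time ackTimeout elapses.
func retransmissionsWithin(window, ackTimeout time.Duration) int {
	if ackTimeout <= 0 {
		return 0 // no meaningful timeout configured
	}
	return int(window / ackTimeout) // integer division rounds down
}

func main() {
	fmt.Println("with the handout timeout:",
		retransmissionsWithin(testWindow, handoutAckTimeout))
	fmt.Println("with a 1 second timeout:",
		retransmissionsWithin(testWindow, time.Second))
}
```

With the 10 second timeout no retransmission falls inside the 3 second window, whereas an implementation using a shorter timeout, such as 1 second, would re-send several times during the test.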

Hope this helps