CS-438: A Few Problems Encountered When Testing

Tests on DSDV, like TestGossiper_Topo1_5Nodes_DSDV1, may not give enough time for anti-entropy to be triggered. This may lead to incomplete routing tables. In TestGossiper_Topo1_5Nodes_DSDV1, I needed to change `antiEntropy` to 3 to make the test pass.

TestBinGossiper_Topo1_2Nodes_Rumor assumes that the expected rumor is the first one received. But when I run the test, a node may first receive a rumor that it originated itself, as a consequence of the rumor-mongering process, which makes the test fail. I filter out these rumors to make the test pass.

Re: A Few Problems Encountered When Testing

by Cristina Basescu - Monday, 19 October 2020, 10:24

Hi,

There is no error in these tests, and you shouldn't change the tests for them to pass. My explanations are below.

"Tests on DSDV, like TestGossiper_Topo1_5Nodes_DSDV1, may not give enough time for anti-entropy to be triggered."

Some tests, like the one you mentioned, verify that you build routing tables from rumor messages only, without relying on anti-entropy. This is why the test deadline of 5 seconds is shorter than the anti-entropy parameter of 10 seconds. Except for debugging, please do not change the anti-entropy parameter, because doing so changes the behavior we intended to test.

Gossipers should not receive back their own message. The reason is, a gossiper does not add itself to its status table, which means, by definition, that it cannot receive back its own messages. Recall that, when processing status messages, gossipers send each other messages only for ids seen in the status message.

Cristina

Re: A Few Problems Encountered When Testing

by Jinyi Xian - Monday, 19 October 2020, 12:04

Gossipers receiving back their own messages is not caused by status messages.

From the homework instruction:

the peer picks a random receiver peer R (from the list of all known peers, which includes peers given at bootstrap, as well as peers this node received messages from)

For TestBinGossiper_Topo1_2Nodes_Rumor, there are two nodes only and that's the reason why nodes receive their own messages.

I wonder if I should exclude the peer that the node has just received a message from.

Re: A Few Problems Encountered When Testing

by Jinyi Xian - Monday, 19 October 2020, 12:56

Also, consider a loop A -> B -> C -> A. In this case, it is likely that A receives its own rumours from C. Therefore, I think it should be OK for gossipers to receive their own rumours.

Re: A Few Problems Encountered When Testing

by Cristina Basescu - Monday, 19 October 2020, 14:40

You're right; in the case of a network that contains loops, it's ok for A to receive its own messages back.

However, the test network that this forum post discusses does not contain loops (other than neighbors that know each other), so the node shouldn't receive back its own message.

Cristina

Re: A Few Problems Encountered When Testing

by Peter Krcmar - Monday, 19 October 2020, 16:56

Quick follow-up question on receiving one's own rumor: I'm having issues with star topologies (e.g. the TestBinGossiper_DSDV tests). The simplest version is: A is in the middle and there are 2 other nodes (H, I) connected to A, so all messages flow through A.
If H sends a rumor to A, it will be forwarded to I. I sends an ack and, since A is in sync with I, A can resend the rumor to another peer (on a successful coin flip). It chooses a peer other than I, and the only one available is H, so H gets back its own rumor.
I noticed that the reference nodes don't have this issue, so I was wondering: should we keep track of the original sender's address as well as the addresses of all peers we sent the rumor to, so that when we resend it after a coin flip we exclude both the original source and all of the peers we already sent it to, and not only the most recent one?
Thanks a lot!

Re: A Few Problems Encountered When Testing

by Cristina Basescu - Monday, 19 October 2020, 14:33

"I wonder if I should exclude the peer that the node has just received a message from."

Yes, the peer should not rumor-monger a message back to the peer that has just sent it the message. This behavior is in line with homework 0; we'll make an announcement about it.
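For illustration only, here is a minimal Go sketch of that selection rule. It is not the reference implementation: `pickPeer` and its parameters are hypothetical names, and the variadic `exclude` list is just one way to skip the peer a rumor came from (or, if desired, several peers).

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickPeer returns a random peer from peers, skipping every address listed
// in exclude. The bool is false when no eligible peer remains.
// (Hypothetical helper, not part of the homework skeleton.)
func pickPeer(peers []string, exclude ...string) (string, bool) {
	skip := make(map[string]bool, len(exclude))
	for _, e := range exclude {
		skip[e] = true
	}
	var candidates []string
	for _, p := range peers {
		if !skip[p] {
			candidates = append(candidates, p)
		}
	}
	if len(candidates) == 0 {
		return "", false
	}
	return candidates[rand.Intn(len(candidates))], true
}

func main() {
	// Two-node case: B knows only A and has just received the rumor from A,
	// so there is nobody left to monger to.
	_, ok := pickPeer([]string{"A"}, "A")
	fmt.Println("B forwards the rumor:", ok)

	// Three known peers, rumor received from C: only A or D can be chosen.
	next, _ := pickPeer([]string{"A", "C", "D"}, "C")
	fmt.Println("next hop:", next)
}
```

In the two-node topology, excluding the sender leaves no candidate, so the rumor is never sent straight back to its origin.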

Cristina

Re: A Few Problems Encountered When Testing

by Jinyi Xian - Monday, 19 October 2020, 14:49

Thank you very much

Re: A Few Problems Encountered When Testing

by Jinyi Xian - Monday, 19 October 2020, 12:26

I think the problem for TestGossiper_Topo1_5Nodes_DSDV1 is that the result is not deterministic.

We can have a situation in which A does not receive rumors from B. B may send its rumor to D, relay D's to E and relay E's to D. A may relay C's rumor back to C. In this scenario, I don't think A's routing table can contain an entry for B without anti-entropy. So it relates to the question I raised above: whether I should exclude the peer the message came from when I randomly select from the peer list.

Re: A Few Problems Encountered When Testing

by Cristina Basescu - Monday, 19 October 2020, 14:45

Hopefully, my answer above clarifies the issue. Answering here just for completeness.

If you consider that nodes never send a rumor back to the peer they've just received it from, then "A may relay C's rumor back to C" would never happen. Hence, the test is deterministic.

Re: A Few Problems Encountered When Testing

by Peter Krcmar - Monday, 19 October 2020, 12:56

Hi, just to make sure because I'm getting kinda confused: you say that a gossiper does not add itself to its status table, but the handout states:

Status message: Summarizes the set of messages that the sending peer has seen so far from other peers and the messages that the peer itself has sent

So if I understood correctly, if a gossiper has sent at least one rumor, it should include itself in all statuses it sends out, right? For example, if A sends one rumor, its next status should contain: peer A nextID 2?

Thanks !

Re: A Few Problems Encountered When Testing

by Cristina Basescu - Monday, 19 October 2020, 14:31

Hi,

I see why this is confusing and perhaps we haven't chosen the best phrasing. Thanks for pointing this out.

Indeed, the handouts contain somewhat contradictory statements about whether a peer should add itself to the status msg or not. The first one, as you pointed out, says “Status message: Summarizes the set of messages that the sending peer has seen so far from other peers and the messages that the peer itself has sent” and the second one says “The status message contains only one field: essentially a vector clock with other nodes’ IDs (origin IDs) that the peer knows about”.

Please use the second statement for the implementation, namely that a node adds to the status message the other nodes’ IDs (origin IDs) that it knows about, i.e., it does not add itself. The first statement was meant as a general description of what status messages are and less as an implementation instruction.

Sorry if this has caused issues for you.

Cristina

Re: A Few Problems Encountered When Testing

by Aaron Joos Lippeveldts - Monday, 19 October 2020, 15:46

Hi, I'm also a bit confused about this.

Why would the node not advertise its own messages in the vector clock?
That seems like a very weird decision.
Suppose a new node B joins the network and sends a message to node A.
Node A then responds with a status message for confirmation, but the status does not contain anything about A.
This way B has no way of knowing about the messages A has sent before, and might not send its status message back to A to request synchronization.

Why would this be desired behaviour? Am I not seeing it correctly?

Re: A Few Problems Encountered When Testing

by Cristina Basescu - Monday, 19 October 2020, 17:22

Hi,

The pragmatic answer is that it's not a decision that critically affects performance. Adding a node's own id to the status message would certainly not be wrong (and, in fact, might have been simpler to explain in this homework). However, ultimately, anti-entropy is the parameter that manages fault tolerance in a scalable manner and ensures message broadcast with high probability.

In your example, you're right that B would not learn about A's messages in this first exchange by looking at A's status. But as soon as B sends a status message to A (due to anti-entropy or as a rumor reply), B includes A's id in the status and, as a result, A sends B all the missing messages with origin A. Keep in mind that rumors and statuses are sent unreliably and could be lost - even if A included its own id in the status packet.
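As a rough illustration of this exchange (a sketch under assumptions, not the reference code), the Go snippet below builds a status without the sender's own ID and answers a received status only for the origins it lists. The names `rumor`, `buildStatus` and `missingFor` are hypothetical, and sequence numbers are assumed to start at 1.

```go
package main

import "fmt"

// rumor is a hypothetical, minimal rumor message.
type rumor struct {
	Origin string
	ID     uint32
	Text   string
}

// buildStatus lists the next expected ID per origin and leaves out the
// node's own entry, following the second statement in the handout.
func buildStatus(self string, next map[string]uint32) map[string]uint32 {
	status := make(map[string]uint32)
	for origin, id := range next {
		if origin != self {
			status[origin] = id
		}
	}
	return status
}

// missingFor returns the stored rumors that the peer's status shows it still
// needs. Only origins that appear in the status are considered.
func missingFor(status map[string]uint32, store map[string][]rumor) []rumor {
	var out []rumor
	for origin, want := range status {
		for _, r := range store[origin] {
			if r.ID >= want {
				out = append(out, r)
			}
		}
	}
	return out
}

func main() {
	// A has sent two rumors of its own.
	storeA := map[string][]rumor{
		"A": {{Origin: "A", ID: 1, Text: "hello"}, {Origin: "A", ID: 2, Text: "world"}},
	}
	// A's own status omits A, so B learns nothing about A from it.
	fmt.Println("A's status:", buildStatus("A", map[string]uint32{"A": 3, "B": 1}))
	// B's status names A with next expected ID 1, so A sends both rumors.
	fmt.Println("A sends B:", missingFor(map[string]uint32{"A": 1}, storeA))
}
```

Note that under this rule a status that does not mention A at all yields nothing from A's store, which is why B has to name A in its own status before A replies with the missing messages.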

Hope this helps,

Cristina

Re: A Few Problems Encountered When Testing

by Peter Krcmar - Monday, 19 October 2020, 19:20

Hi again, sorry for so many questions :/
I understand the reasoning and I adapted my code to not include the ID of the peer sending the status. The tests using only my gossipers seem to work; however, when running a couple of tests with the reference nodes, I noticed that all statuses from reference nodes contained their own IDs.
So I ran 3 variants of one of the tests:
- A: only reference nodes
- B: one of my nodes with own id in status
- C: one of my nodes but without own id in status
Results: A and B are very similar and are > 150 lines of output. C sometimes fails and generates thousands of lines, so I just wanted to check whether we have the correct binaries!

Re: A Few Problems Encountered When Testing

by Reka Inovan - Monday, 19 October 2020, 19:43

I also observed the same problem. A simple example of the problem: consider two nodes A and B.
1. Let's assume that A receives 1 client message and shares the rumor with B. Then B sends a status back to A. So far there is no problem.
2. Now, let's assume B receives 1 client message and shares the rumor with A. But the status that A sends back to B does not contain an identifier for A.
3. Because of the status it received, B will think that A doesn't have the message from node A. So it will send the rumor from A back to A.
4. A will reply to the message from step 3 with a status. But because the status of A does not contain the identifier of A, we are basically stuck in a loop.
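The four steps above can be sketched as a small Go simulation. This is only a toy model under stated assumptions: the `node` type and the `status`, `hasNewsFor` and `resends` helpers are hypothetical, sequence numbers start at 1, and B is assumed to re-send a rumor whenever an origin it knows is missing from, or behind in, the status it receives.

```go
package main

import "fmt"

// node is a hypothetical, stripped-down gossiper: it only tracks the next
// expected sequence number per origin.
type node struct {
	name string
	next map[string]int
}

// status builds the vector clock the node advertises. With includeSelf set
// to false, the node's own entry is left out.
func (n *node) status(includeSelf bool) map[string]int {
	s := make(map[string]int)
	for origin, id := range n.next {
		if origin == n.name && !includeSelf {
			continue
		}
		s[origin] = id
	}
	return s
}

// hasNewsFor reports whether the peer's status is behind on any origin this
// node knows. An origin absent from the status counts as "peer has nothing".
func (n *node) hasNewsFor(peerStatus map[string]int) bool {
	for origin, id := range n.next {
		if peerStatus[origin] < id {
			return true
		}
	}
	return false
}

// resends counts how often B re-sends a rumor to A once both nodes hold one
// message from each origin (the state after steps 1 and 2). It stops at
// limit so the simulation always terminates.
func resends(includeSelf bool, limit int) int {
	a := &node{name: "A", next: map[string]int{"A": 2, "B": 2}}
	b := &node{name: "B", next: map[string]int{"A": 2, "B": 2}}
	count := 0
	for count < limit && b.hasNewsFor(a.status(includeSelf)) {
		count++ // A already holds the rumor, so its state and status stay the same
	}
	return count
}

func main() {
	fmt.Println("resends with own ID in status:   ", resends(true, 1000))
	fmt.Println("resends without own ID in status:", resends(false, 1000))
}
```

In this model the exchange stops immediately when A advertises its own ID, while without it the loop only ends because of the artificial cap.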

This leads to the thousands of lines that we observed. As you also mentioned, the reference implementation actually includes its own identifier when replying to a rumor. This can be clearly seen in the test case TestBinGossiper_Topo1_2Nodes_Rumor.

I understand that the homework specification itself is a bit contradictory and maybe the "other node" part is just a typo. I think there is no big harm if the node also includes its own identifier in the status. The issue raised by the original poster can be solved simply by a better criterion for updating the route. Is it okay if we just continue including the node's own identifier in its status?

Re: A Few Problems Encountered When Testing

by Cristina Basescu - Monday, 19 October 2020, 20:30

Please see https://moodlearchive.epfl.ch/2020-2021/mod/forum/discuss.php?d=45114