Quiz 1: Question 5

by Alessio Cappellato

Could you explain why, when moving from the word model to the character model, the separator characters are not counted?

As an example, take a template sentence "xxxxxYxxxxxYxxxxxYxxxxx", where Y is any separator character. A 3-gram in the word model could be ("xxxxx", "xxxxx", "xxxxx"), so I thought that the associated n-gram in the character model would be ("x", "x", "x", "x", "x", "Y", "x", "x", "x", "x", "x", "Y", "x", "x", "x", "x", "x"), with n = 3*5 + (3-1) = 17, since the average word length is 5 characters. A 15-gram in the character model, on the other hand, would either contain no "Y"s (which I think is the case) or contain the "Y"s but fewer than 15 "x"s.
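To make the two cases concrete, here is a small sketch that slices the template sentence from the question: the span covering the first three words (with the two inner separators) has 17 characters, while a 15-character window that includes separators contains only 13 "x"s. The variable names are illustrative only.

```python
# Template sentence from the question: four 5-letter "words" separated by Y.
sentence = "xxxxxYxxxxxYxxxxxYxxxxx"

words = sentence.split("Y")
assert all(len(w) == 5 for w in words)

# Character span covering the first three words, including the 2 inner separators.
span = sentence[:3 * 5 + (3 - 1)]
print(span, len(span), span.count("x"))

# A 15-character window that keeps the separators holds fewer than 15 "x"s.
window15 = sentence[:15]
print(window15, len(window15), window15.count("x"))
```

The first line printed is "xxxxxYxxxxxYxxxxx 17 15" and the second is "xxxxxYxxxxxYxxx 15 13".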

Since the correct answer was n = 15, one of these two cases must be the intended one. Could you explain which one, and why?

Thanks.

In reply to Alessio Cappellato

Re: Quiz 1: Question 5

by Jean-Cédric Chappelier

This is a very good question! It is related to what counts as an "object" (we deliberately put the quotes in the question), and that is why there were several possible answers.
The simplest one is: 3 words ≃ 3 × 5 = 15 characters.
But one could indeed decide that the character-level definition of a word includes its boundaries; then 17, or even 19 (also counting the leading and trailing separators), could be accepted...
...which was actually the case.
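The three accepted answers differ only in how separators are counted. The sketch below spells out that arithmetic; the function name char_ngram_order and the convention labels are hypothetical, introduced here only for illustration.

```python
def char_ngram_order(n_words: int, avg_word_len: int, separators: str = "none") -> int:
    """Character n-gram order roughly equivalent to a word n-gram (hypothetical helper).

    separators:
      "none"     -> count only the letters of the words
      "inner"    -> also count the n_words - 1 separators between the words
      "boundary" -> inner separators plus one leading and one trailing separator
    """
    letters = n_words * avg_word_len
    if separators == "none":
        return letters
    if separators == "inner":
        return letters + (n_words - 1)
    if separators == "boundary":
        return letters + (n_words - 1) + 2
    raise ValueError(f"unknown convention: {separators!r}")


for conv in ("none", "inner", "boundary"):
    print(conv, char_ngram_order(3, 5, conv))
```

With 3 words of 5 characters each, this prints 15, 17 and 19 for the three conventions.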

But since you asked, I double-checked: you indeed got only partial credit (50%) for that answer, which was a mistake on our side when setting up the quiz in Moodle. 17 (and 19) are definitely worth full points. I fixed that.

In reply to Jean-Cédric Chappelier

Re: Quiz 1: Question 5

by Alessio Cappellato

Thank you very much for the explanation (and the points); I understand now!

In reply to Alessio Cappellato

Re: Quiz 1: Question 5

by Jean-Cédric Chappelier

Actually, the purpose of these questions was to make you realize why word models exist: what it would cost to handle all of this at the character level only. That was the main reason for questions 6 and 7: compare 100^15 (or 100^19) to 10^15 (+10^8)...
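For a feel of the orders of magnitude, the sketch below counts the possible sequences under assumed sizes: an alphabet of about 100 characters and, as our own guess at where 10^15 comes from, a vocabulary of about 10^5 words for the word 3-gram. Both sizes are assumptions, not taken from the quiz, and the "+10^8" term is not modelled here.

```python
# Hypothetical sizes, assumed for illustration only.
ALPHABET_SIZE = 100        # distinct characters (assumption)
VOCABULARY_SIZE = 10**5    # distinct words (assumption)

char_15 = ALPHABET_SIZE ** 15   # possible 15-character sequences
char_19 = ALPHABET_SIZE ** 19   # possible 19-character sequences
word_3 = VOCABULARY_SIZE ** 3   # possible 3-word sequences

print(f"char 15-grams: {char_15:.0e}")
print(f"char 19-grams: {char_19:.0e}")
print(f"word 3-grams:  {word_3:.0e}")
print(f"ratio (15-char vs 3-word): {char_15 // word_3:.0e}")
```

Under these assumptions the character-level space (10^30, or 10^38) is at least 10^15 times larger than the word-level one (10^15).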