Re: 2018 exam solutions

by Jean-Cédric Chappelier

As mentioned/sketched in the QAS, in 2018 we had two different occurrences of the course (Spring and Fall) and changed it from 6 ECTS to 4 ECTS; I assume you are referring to **Spring** 2018.

  • questions VI.6 and VI.13 are about using the **full** CYK chart to do lexical initialization of multi-token words: simply put the corresponding non-terminal in the corresponding cell (which is on a row above the first).
    For instance, for the word "satellite antennas", which is a noun (N) made of two tokens, add the non-terminal N to the **second** row (same column as "satellite"); see the small sketch after this list.
  • question VI.9 is about comparing taggings of sentences of different lengths (due to different tokenizations, more precisely to multi-token words, the tokens of which can also be (single-token) words):
    in such a case the probabilities can**not** be compared: it is not the same probability space. More precisely (back to the fundamentals of the first lecture on probabilistic tagging): P(tags | words) = P(words | tags) * P(tags) / P(words) is not divided by the same P(words) (since the "words" change with the tokenization); thus it does not make any sense to compare only the numerators (which is what an HMM computes).
  • question VI.11 draws a relationship between HMM probabilization and SCFG probabilization: basically, you could rewrite an HMM as an SCFG, since an HMM is nothing but a probabilized regular language and any regular language is also context-free (see the sketch after this list).
  • question VI.14: what is your precise question there? (maybe post your answer)
  • question VI.15 is about complexity: the complexity of the Viterbi algorithm (HMM) is linear (w.r.t. the size of the input sentence) whereas CYK is cubic (see the loop-structure sketch after this list).
  • regarding question V.5: I also need a more precise question here, but I would say: lowercasing, normalizing whitespace, URLs, and usernames makes sense in this context; removing punctuation and adding gender does not at all; and removing hash signs is unclear and has pros and cons (to be discussed). A small normalization sketch is given after this list.
  • question III.3: sure: make use of the chart, build on question 1, and proceed similarly to factorize the probabilities (no need to compute everything, just the parts that differ among the choices); furthermore, make use of the fact that one of the "bottom" probabilities (NP --> process) is very small (and thus so are all the parse trees making use of that derivation).
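
To make the multi-token lexical initialization of VI.6/VI.13 concrete, here is a minimal sketch in Python; the toy lexicon, the names `LEXICON` and `lexical_init`, and the (row = span length, column = start position) indexing convention are my own illustration, not part of the exam:

```python
from collections import defaultdict

# Hypothetical toy lexicon: surface form -> set of non-terminals
LEXICON = {
    "satellite antennas": {"N"},   # a noun made of two tokens
    "satellite": {"Adj", "N"},
    "antennas": {"N"},
}

def lexical_init(tokens, lexicon):
    """Fill the chart cell (row = span length, col = start position) for every
    lexicon entry found in the input, including multi-token ones."""
    chart = defaultdict(set)
    n = len(tokens)
    for start in range(n):
        for length in range(1, n - start + 1):
            surface = " ".join(tokens[start:start + length])
            if surface in lexicon:
                # multi-token words land on row `length`, i.e. above the first row
                chart[(length, start)] |= lexicon[surface]
    return chart

tokens = ["the", "satellite", "antennas", "work"]
chart = lexical_init(tokens, LEXICON)
print(chart[(2, 1)])  # {'N'}: "satellite antennas", second row, column of "satellite"
print(chart[(1, 1)])  # the single token "satellite", first row
```

The only point is that "satellite antennas" ends up in the cell (length 2, column of "satellite"), one row above the single-token cells.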
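For VI.11, here is a small sketch of the idea that an HMM can be rewritten as a right-linear (hence regular, hence context-free) stochastic grammar; the two-tag HMM and its probabilities are made up purely for illustration:

```python
# Emission probabilities P(word | tag) and transition probabilities P(next | tag)
emissions = {"DET": {"the": 1.0}, "N": {"cat": 0.6, "dog": 0.4}}
transitions = {"<s>": {"DET": 1.0}, "DET": {"N": 1.0}, "N": {"</s>": 1.0}}

def hmm_to_scfg(emissions, transitions):
    """Turn each HMM tag into a non-terminal X_tag; one right-linear rule per
    (emitted word, next tag) pair, with probability P(word|tag) * P(next|tag)."""
    rules = []
    for tag, words in emissions.items():
        for word, p_emit in words.items():
            for nxt, p_trans in transitions.get(tag, {}).items():
                if nxt == "</s>":
                    rules.append((f"X_{tag} -> {word}", p_emit * p_trans))
                else:
                    rules.append((f"X_{tag} -> {word} X_{nxt}", p_emit * p_trans))
    # start rules come from the initial distribution
    for tag, p in transitions["<s>"].items():
        rules.append((f"S -> X_{tag}", p))
    return rules

for rule, p in hmm_to_scfg(emissions, transitions):
    print(f"{rule}    [{p:.2f}]")
```

Each non-terminal's rule probabilities sum to 1, so the result is a well-formed SCFG that assigns the same probabilities as the original HMM.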
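For VI.15, the complexity difference comes directly from the loop structure of the two algorithms; a schematic sketch (the tag set and the grammar are of fixed size, so only the loops over the input length n matter):

```python
def viterbi_cost(n, tags):
    # one pass over the n positions, constant work (|T|^2) per position -> O(n)
    steps = 0
    for _ in range(n):            # input positions
        for _ in tags:            # current tag
            for _ in tags:        # previous tag
                steps += 1
    return steps

def cyk_cost(n):
    # all spans of length >= 2 and all split points inside each span -> O(n^3)
    steps = 0
    for length in range(2, n + 1):
        for start in range(n - length + 1):
            for _split in range(1, length):
                steps += 1
    return steps

print(viterbi_cost(20, ["DET", "N", "V"]))  # grows linearly with n
print(cyk_cost(20))                         # grows cubically with n
```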
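For V.5, a minimal normalization sketch along the lines suggested above; the regular expressions and the placeholder tokens `<url>` / `<user>` are my own choices:

```python
import re

def normalize_tweet(text, strip_hash=False):
    text = text.lower()                                # lowercasing
    text = re.sub(r"https?://\S+", "<url>", text)      # normalize URLs
    text = re.sub(r"@\w+", "<user>", text)             # normalize usernames
    if strip_hash:                                     # debatable: "#nlp" -> "nlp"
        text = re.sub(r"#(\w+)", r"\1", text)
    text = re.sub(r"\s+", " ", text).strip()           # normalize whitespace
    return text                                        # punctuation is kept on purpose

print(normalize_tweet("Check  this @EPFL course: https://example.org #NLP"))
# -> 'check this <user> course: <url> #nlp'
```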