PoS exercise part 1

Re: PoS exercise part 1

by Jean-Cédric Chappelier
Very good question indeed!
(and you're right)

Actually, the corpus contains many proper nouns, so in the most accurate processing, case should be taken into account.
However, to really cope with it, we would need at least one of the following (or even both):
  • distinct common-noun/proper-noun tags, which the "universal" tagset does not have;
  • an end-of-sentence detector, which we did not introduce.
So we decided, as a simplifying assumption, not to take the upper/lower-case difference into account.
Once this has been decided, then, you're right, we should be consistent and stick to it, exactly the way you mention.
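
To illustrate what "being consistent" means here, below is a minimal Python sketch, not the exercise's reference solution. It assumes emission probabilities are estimated by simple relative frequency; the function names and the toy corpus are hypothetical, made up for this illustration. The point is only that the same normalization is applied both when counting and when looking a word up.

```python
from collections import Counter, defaultdict


def normalize(word):
    # Simplifying assumption from the post: ignore upper/lower-case.
    return word.lower()


def count_emissions(tagged_corpus):
    # counts[tag][word] = number of times `word` was seen with `tag`,
    # with every word normalized before counting.
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[tag][normalize(word)] += 1
    return counts


def emission_prob(counts, word, tag):
    # Relative-frequency estimate of P(word | tag).
    # The query word goes through the same normalize() as the training words.
    total = sum(counts[tag].values())
    if total == 0:
        return 0.0
    return counts[tag][normalize(word)] / total


# Hypothetical toy corpus, only for illustration.
toy_corpus = [("The", "DET"), ("the", "DET"), ("Paris", "NOUN"), ("city", "NOUN")]
counts = count_emissions(toy_corpus)
```

With this, "The" and "the" are counted as the same word, and querying "THE" or "paris" gives the same result as the lowercase form. Normalizing only on one side (counting or lookup) would be the inconsistency discussed above.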

Thanks for pointing this out!