I was looking at the solution to the exercise and I have a question about part one:
The frequency distribution before lemmatization is generated like this:
fdist_before = nltk.FreqDist(word.lower() for (word, tag) in brown_tagged)
which yields 49815 distinct words in the corpus. That's fair enough.
Now, the distribution after lemmatization is generated like so:
fdist_after = nltk.FreqDist(lem.lemmatize(word, get_wordnet_pos(tag)).lower() for (word, tag) in brown_tagged)
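Here each word is lemmatized first and only lowercased afterwards. Shouldn't it be lowercased before lemmatizing instead, i.e.:
fdist_after = nltk.FreqDist(lem.lemmatize(word.lower(), get_wordnet_pos(tag)) for (word, tag) in brown_tagged)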
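To show why I think the order matters, here is a toy sketch that doesn't need NLTK or the Brown corpus. It assumes the lemmatizer only recognizes lowercase forms and returns anything it doesn't know unchanged. I believe that is how NLTK's WordNetLemmatizer behaves, but treat it as an assumption. The lemma table, `toy_lemmatize`, and the tiny tagged list are all made up for illustration. `collections.Counter` stands in for `nltk.FreqDist`, which is a `Counter` subclass.

```python
from collections import Counter

# Hypothetical stand-in for a WordNet-style lemmatizer (an assumption, not NLTK):
# it only knows lowercase surface forms and returns unknown words unchanged.
TOY_LEMMAS = {"dogs": "dog", "cats": "cat", "ran": "run"}

def toy_lemmatize(word):
    return TOY_LEMMAS.get(word, word)

# Made-up (word, tag) pairs; capitalized tokens mimic sentence-initial words.
tagged = [("Dogs", "NNS"), ("dogs", "NNS"), ("Cats", "NNS"), ("ran", "VBD"), ("run", "VB")]

# Order used in the solution: lemmatize first, then lowercase.
lemma_then_lower = Counter(toy_lemmatize(w).lower() for (w, _) in tagged)

# Alternative order: lowercase first, then lemmatize.
lower_then_lemma = Counter(toy_lemmatize(w.lower()) for (w, _) in tagged)

print(len(lemma_then_lower), sorted(lemma_then_lower))  # 4 ['cats', 'dog', 'dogs', 'run']
print(len(lower_then_lemma), sorted(lower_then_lemma))  # 3 ['cat', 'dog', 'run']
```

With the solution's order, "Dogs" and "Cats" slip past the lemmatizer and are only lowercased afterwards, so "dogs" and "dog" are counted as two distinct words. Lowercasing first merges them. If the real lemmatizer is case-sensitive in the same way, the solution's count of distinct words after lemmatization would come out higher than it should.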