Quiz 1 results

by Marie Biolková -

Good afternoon,

I noticed in the quiz review that I lost partial marks on a question that clearly specifies we should select only one answer. I don't think it is fair to deduct marks when the question was formulated this way.

[screenshot]

Similarly, I am confused about the classification of a good kappa. I answered that we can deal with kappa = 0.31 because it is not negative (which would mean disagreement). It's just a bit frustrating, since we never really drew the line between "can deal with it" and "bad" during the lectures.

Marie

In reply to Marie Biolková

Re: Quiz 1 results

by Jean-Cédric Chappelier -

Well... you didn't LOSE partial marks, you GOT partial marks.
Would you prefer to have 0 instead? (The expected answer was the first one; see the solution PDF.)

Regarding the kappa:

See slide 26/58 of the lecture on evaluation.
For corpus annotation aimed at building a reference (a gold standard), anything below 0.6 starts to be really bad (remember that the goal here is to create a gold standard, not to compare political opinions, for instance).
Notice also that a negative value means correlated disagreement, which is yet another story in the context of corpus annotation (it certainly indicates a different problem).
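
For concreteness, here is a minimal sketch (the toy annotations are invented, not course data) of how such a kappa can be computed from two annotators' labels with scikit-learn:

```python
# Minimal illustrative sketch; the toy annotations below are made up.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["pos", "neg", "pos", "pos", "neg", "neg", "pos", "neg"]
annotator_b = ["pos", "pos", "pos", "neg", "neg", "pos", "pos", "neg"]

# kappa = (p_o - p_e) / (1 - p_e): the observed agreement p_o, corrected
# by the agreement p_e expected by chance from each annotator's marginals.
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"kappa = {kappa:.2f}")  # 0.25 on these toy labels: positive, yet far too low for a gold standard
```
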
I hope this clarifies things.

In reply to Jean-Cédric Chappelier

Re: Quiz 1 results

by Ilija Gjorgjiev -
Hi, I am also confused about the question in which we are asked to qualify the quality of the annotations. You say that the correct answer is "bad", but this is quite an ambiguous answer. According to the slides,
1 is considered "perfect agreement", 0 is statistical independence, a positive value is better than chance, a negative value is worse than chance (correlated disagreement), more than 0.6 is usually considered ok, and more than 0.8 is considered good. Among the answers given in the quiz we had "very good", "good", "can deal with it" and "bad". Now these answers themselves are quite ambiguous, considering that only one of them ("good") is mentioned in the slides.

I think it is quite unfair to mark this type of question wrong if we answered "can deal with it", since the answers "bad" and "can deal with it" can be interpreted differently. My reasoning was that if Cohen's kappa is positive and below 0.6, then it is certainly better than chance agreement, as described in the slides, so I answered "can deal with it". So I think it is right that we should get full points on this question. After all, how were we supposed to know that "can deal with it" was not the correct answer?

In reply to Ilija Gjorgjiev

Re: Quiz 1 results

by Jean-Cédric Chappelier -

Don't forget why we are measuring this kappa: to create a reference, a gold standard. If kappa is too low, even if it is positive, it means the task is too ambiguous even for humans, and the "gold standard" resulting from such an annotation is really questionable.
That's the main point we'd like you to understand.

Now, strictly speaking, the boundary is written on the slide:
"more than 0.6 is usually considered ok, and more than 0.8 considered good"
This is a clear boundary: above 0.8 it's good; between 0.6 and 0.8 it's "usually considered ok" (which, to me, is "can deal with it"); and, by inference, below 0.6 is NOT ok.
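
Spelled out as code, that reading of the slide might look like the following sketch; the thresholds come from the slide quoted above, and the label strings simply mirror the quiz answers (they are not a standard taxonomy):

```python
def interpret_kappa(kappa: float) -> str:
    # Rough scale for gold-standard annotation; labels mirror the quiz answers.
    if kappa < 0:
        return "correlated disagreement (a different problem entirely)"
    if kappa < 0.6:
        return "bad: too low to build a gold standard"
    if kappa < 0.8:
        return "can deal with it (usually considered ok)"
    return "good"

print(interpret_kappa(0.31))  # -> bad: too low to build a gold standard
```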

In reply to Jean-Cédric Chappelier

Re: Quiz 1 results

by Ilija Gjorgjiev -
Yes, I understand that, but we are also discussing levels defined as numbers in the slides. In the slides, multiple levels are defined differently: the definitions are numeric ranges with corresponding levels. Also, if everything below 0.6 is defined as "bad", then:
1. is "can deal with it" the same as "ok"?
2. what is "very good" then? Is it "perfect agreement"?
3. why did we have all these levels in the slides, and why are they described with different phrases in the quiz? What was the point?

My point here is that all of this is ambiguous, and even though I understand your answer, my interpretation as described above also makes sense. If "can deal with it" had been defined as meaning "ok", then I would certainly not have chosen it; I interpreted "can deal with it" in a different way, so it was not the same as "ok" in my mind.

I think it would be nice if, in the next quiz, we had clearer answer options on questions like this, because this creates confusion. The main point should be that we are graded on our knowledge and not on our interpretation of tricky and ambiguous words/phrases, since everybody may interpret some phrases differently, especially when they are defined differently than in the lecture.

Thank you for your answer.

In reply to Ilija Gjorgjiev

Re: Quiz 1 results

by Jean-Cédric Chappelier -

I understand your point of view but do not agree. Let me answer the two main points of your message (not to argue, simply to justify).

First of all, the point of this very question is precisely to test whether you _know_ (or not) how to use the kappa measure in the context of annotating a corpus to make it a gold-standard.

If, as an intern (or worse, as a data engineer), you end up with a kappa of 0.31 in such a situation (annotating a corpus to make it a gold standard) and you tell your supervisor (or your boss) that "we can deal with it", well..., to me you should be assigned to the photocopies for the rest of your internship (or be fired). It's as simple as that.

So, to me, you did not acquire the proper knowledge with respect to this point (the usage of kappa for gold standards).
Maybe there is some ambiguity in the lesson/slides, and we teachers should improve on that, but to me there is no confusion in the quiz question itself _in this respect_.

Regarding your second point, I wouldn't call "knowledge" the capacity to read a slide and stick to it. Real life won't stick to the definitions in our poor slides.
To me, knowledge embraces a deep understanding of the *context*, of the pros and cons and their implications, etc.

Sure, it's a problem of interpretation, but it always will be [questions are written in natural language, after all], and much more broadly than in a simple quiz question. It is precisely the role of an engineer to deal with _contexts_, errors, constraints, etc.
(And, again, to us there was (and is) no confusion in that question with respect to its context and our objectives.)

In reply to Jean-Cédric Chappelier

Re: Quiz 1 results

by Jean-Cédric Chappelier -

> How do you come up with amazing punch lines like the following?
> "If as an intern (or worth as a data-engineer) you end up with a kappa of 0.31 in such a situation
> (annotating a corpus to make it a gold-standard) and you tell your supervisor (or your boss)
> that "we can deal with it", well..., to me you should be assigned to the photocopies for
> the rest of your intern (or be fired). As simple as this."
> It's truly a work of art.

  1. Now you will remember that IAA (inter-annotator agreement), kappa and all this stuff, and corpus quality, are indeed crucial for a data scientist; :-)
    this is my only concern (the fact that you remember it);
  2. Sure, it lacks a few smileys. You miss my body language (smile, eyes, etc.).
    Short written-expression media (typically "social" networks) do indeed do much more damage than they should...
  3. Anyway, even if the style is clumsy, the content remains true.