I am building a Question Classification/Answering corpus as part of my master's thesis. I want to evaluate my expected answer type taxonomy in terms of inter-rater agreement/reliability. Does anybody know of any decent (preferably free) Java API(s) that can do this?
I'm reasonably certain all I need is Fleiss' Kappa and Krippendorff's Alpha at this point.
Weka provides a kappa statistic in its evaluation package, but I think it can only evaluate a classifier, and I'm not at that stage yet (I'm still building the data set and classes).
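To make clear what I'm after, here is a minimal sketch of Fleiss' Kappa computed directly from raw annotation counts, with no classifier involved. The class name `FleissKappa`, the method `compute`, and the input layout (one row per question, one column per answer type, each cell holding the number of annotators who chose that type) are my own assumptions for illustration, not part of any existing library. It also assumes every question was labelled by the same number of annotators.

```java
// Hypothetical helper, not taken from Weka or any other library.
final class FleissKappa {

    /**
     * counts[i][j] = number of raters who assigned item i to category j.
     * Assumes every row sums to the same number of raters (at least 2).
     * Returns NaN if all ratings fall into a single category, because
     * expected agreement is then 1 and kappa is undefined.
     */
    static double compute(int[][] counts) {
        int items = counts.length;
        int categories = counts[0].length;

        // Number of raters per item, taken from the first row.
        int raters = 0;
        for (int c : counts[0]) {
            raters += c;
        }
        if (raters < 2) {
            throw new IllegalArgumentException("Need at least 2 raters per item");
        }

        double[] categoryTotals = new double[categories];
        double sumItemAgreement = 0.0;

        for (int[] row : counts) {
            int rowSum = 0;
            long sumOfSquares = 0;
            for (int j = 0; j < categories; j++) {
                rowSum += row[j];
                sumOfSquares += (long) row[j] * row[j];
                categoryTotals[j] += row[j];
            }
            if (rowSum != raters) {
                throw new IllegalArgumentException("Every item needs the same number of ratings");
            }
            // Per-item agreement: P_i = (sum_j n_ij^2 - n) / (n * (n - 1))
            sumItemAgreement += (sumOfSquares - raters) / (double) (raters * (raters - 1));
        }

        // Mean observed agreement over all items.
        double observed = sumItemAgreement / items;

        // Expected agreement: sum of squared category proportions.
        double totalRatings = (double) items * raters;
        double expected = 0.0;
        for (double total : categoryTotals) {
            double p = total / totalRatings;
            expected += p * p;
        }

        return (observed - expected) / (1.0 - expected);
    }
}
```

For example, `FleissKappa.compute(new int[][] {{3, 0}, {0, 3}})` describes two questions on which all three annotators agreed, one per answer type. Krippendorff's Alpha would need a separate implementation, since it also has to cope with missing ratings.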
Thanks.