Hi all,

I have a task that seems easy but is proving challenging. I need to build a data set of questions, which I classify into two categories:

  1. Factoid questions: "Who is the current president of France?"
  2. Free questions: "Can you rate the cameras below for me, please?"

Now I need to know the percentage of each category on Yahoo! Answers so that I can keep my data set in the same proportions, but I don't know a good way to gather these statistics. Doing it manually seems nearly impossible. Does anyone have an idea? I would be really grateful, thanks.

+1  A: 

You mean, recognize one from the other? Automatically, without any categorization from the site's end? That's probably going to be impossible.

I think the best you can do is compare some metrics. "Free" questions will probably tend to have more contributions with more text; they would be more heavily discussed if Y!Answers had a discussion system... "Factoid" questions may start with "What is..." more often ... and so on.
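
To make the idea of comparing metrics concrete, here is a rough sketch of a surface-cue heuristic. The function name `guess_category`, the list of question openers, and the 15-word cutoff are all assumptions made up for illustration, not anything Y!Answers provides, and the result will be wrong for plenty of questions (for example "What is the best camera?" looks like a factoid to it).

```python
import re

# Assumed openers that often (not always) signal a factoid question.
FACTOID_OPENERS = re.compile(
    r"^\s*(who|what|when|where|which|how (many|much|old|long|far|tall))\b",
    re.IGNORECASE,
)


def guess_category(question: str, max_factoid_words: int = 15) -> str:
    """Guess 'factoid' or 'free' from the opening word and the length only.

    max_factoid_words is an arbitrary illustrative cutoff: longer questions
    are treated as 'free' even if they start with a typical factoid opener.
    """
    word_count = len(question.split())
    if FACTOID_OPENERS.match(question) and word_count <= max_factoid_words:
        return "factoid"
    return "free"


if __name__ == "__main__":
    for q in (
        "Who is the current president of France?",
        "Can you rate the cameras below for me, please?",
    ):
        print(guess_category(q), "<-", q)
```

Something like this is only useful as a rough first pass; its guesses would still need to be checked against hand-labelled questions.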

Maybe fetch 100 random questions, do a manual check and write down the percentages.
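
A sketch of how that manual check could be turned into a percentage with an error range. It assumes you already have a list of question texts and that you label the sample by hand; `sample_questions` and `proportion_with_interval` are hypothetical helper names, and the Wilson score interval is my choice of method for the range, not something the site offers.

```python
import math
import random


def sample_questions(questions, n=100, seed=0):
    """Draw up to n questions uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(questions, min(n, len(questions)))


def proportion_with_interval(labels, category="factoid", z=1.96):
    """Return (share, low, high) for `category` among hand-assigned labels.

    low/high are the bounds of a Wilson score interval; z=1.96 corresponds
    to roughly 95% confidence.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("need at least one labelled question")
    p = sum(1 for label in labels if label == category) / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, centre - half, centre + half


if __name__ == "__main__":
    # Made-up labels purely to show the call; real ones come from manual review.
    labels = ["factoid"] * 30 + ["free"] * 70
    share, low, high = proportion_with_interval(labels)
    print(f"factoid share: {share:.0%} (about {low:.0%} to {high:.0%})")
```

With only 100 labelled questions the range is fairly wide, so a larger sample is worth the effort if the percentages need to be precise.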

Pekka
Great, I was actually thinking that questions with more text would be classified as Free questions.
Robert
@Robert yeah, but it's never going to be entirely reliable. Manually checking a sample data set is probably your best bet.
Pekka