Hello all,

I've trained an SVM-based system that, given a question, decides whether a webpage is a good one for answering that question.

The features I selected are: "term frequency in the webpage", "whether a term matches the webpage title", "number of images in the webpage", "length of the webpage", "is it a Wikipedia page?", and "the position of this webpage in the list returned by the search engine".
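
To make this concrete, here is a rough sketch of how such a feature vector might be assembled (Python; the function and variable names are illustrative, not my actual code):

```python
import re

def extract_features(question_terms, page_text, page_title,
                     num_images, rank_in_results, is_wikipedia):
    words = re.findall(r"\w+", page_text.lower())
    tf = sum(words.count(t.lower()) for t in question_terms)
    title_match = int(any(t.lower() in page_title.lower()
                          for t in question_terms))
    return [
        tf,                   # term frequency in the webpage
        title_match,          # does a term match the webpage title?
        num_images,           # number of images in the webpage
        len(words),           # length of the webpage (in words)
        int(is_wikipedia),    # is it a Wikipedia page?
        rank_in_results,      # position in the search-engine results
    ]

print(extract_features(["svm", "classifier"],
                       "An SVM is a large-margin classifier ...",
                       "SVM - Wikipedia", 3, 1, True))
```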

Currently, my system maintains a precision of around 0.4 and a recall of 1. It makes a large proportion of false-positive errors (many bad links are classified as good by my classifier).

Since the accuracy could clearly be improved, I would like to ask for help refining the features I selected for training/testing; I could remove some or add more.

Thanks in advance.

+1  A: 

Hmm...

  • How large is your training set? i.e., how many training documents are you using?
  • What is your test set composed of?
  • Since you're getting too many FPs, I would try training with more (and varied) "bad" webpages (see the sketch after this list)
  • Can you give more details about your different features, like "tf in webpage," etc.?
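
On the FP point: if gathering more and varied "bad" pages is slow, a related knob is to re-weight the classes or raise the decision threshold. A rough sketch, assuming you're using scikit-learn's SVC (adjust for whatever SVM library you actually have):

```python
import numpy as np
from sklearn.svm import SVC

# Stand-in data: 200 samples of the 6 features, labels 0 = bad, 1 = good.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Penalize mistakes on the "bad" class more heavily so the classifier
# stops labeling nearly everything "good".
clf = SVC(kernel="rbf", class_weight={0: 5.0, 1: 1.0})
clf.fit(X, y)

# Alternatively, keep the model but only accept pages whose decision
# score clears a stricter cutoff than the default of 0.0.
scores = clf.decision_function(X)
pred = (scores > 0.5).astype(int)
print(pred[:10])
```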
The Alchemist
Yes, thanks. The term frequency is the frequency of keywords appearing in the webpage. I pick those keywords manually, taking the 2 or 3 most important and decisive keywords from the original question and then calculating their frequency in the webpage.
Robert
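A minimal Python sketch of that term-frequency feature; normalizing by page length is an optional refinement, not part of Robert's description:

```python
import re

def keyword_tf(keywords, page_text):
    # Count occurrences of the hand-picked keywords in the page,
    # normalized by page length so long pages aren't favored.
    words = re.findall(r"\w+", page_text.lower())
    if not words:
        return 0.0
    hits = sum(words.count(k.lower()) for k in keywords)
    return hits / len(words)

print(keyword_tf(["doctor", "surgery"],
                 "The doctor scheduled the surgery; the doctor agreed."))
```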
Well, without many more details, I can't help out much beyond my original advice. You can probably come up with more features, such as: the number of words in the answer that also appear in the related Wikipedia entry, or the complexity of the answers (via a reading-level calculator; this will probably only work well for very technical or scientific questions). Also, if you're using phrases as the basis of recommendations, you'll probably miss synonyms: if the question is about a doctor and the answer is about a *physician*, it probably won't get caught. Somehow integrating WordNet may be worth it.
The Alchemist
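A minimal sketch of the WordNet idea using NLTK (assuming the WordNet corpus has been downloaded via nltk.download("wordnet")): expand each hand-picked keyword with its synonyms before matching, so "doctor" also matches "physician":

```python
from nltk.corpus import wordnet  # requires: nltk.download("wordnet")

def expand_with_synonyms(keyword):
    # Collect WordNet lemma names across all senses of the keyword.
    synonyms = {keyword.lower()}
    for synset in wordnet.synsets(keyword):
        for lemma in synset.lemmas():
            synonyms.add(lemma.name().replace("_", " ").lower())
    return synonyms

print(expand_with_synonyms("doctor"))  # includes 'physician', 'doc', ...
```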