document-classification

What tried and true algorithms for suggesting related articles are out there?

Hi, pretty common situation, I'd wager. You have a blog or news site with plenty of articles (or blags, or whatever you call them), and at the bottom of each one you want to suggest others that seem related. Let's assume very little metadata about each item: no tags, no categories. Treat as one big blob of text, includi...
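One common baseline for this is to turn every article into a TF-IDF vector and suggest the articles with the highest cosine similarity. The Python sketch below assumes each article is plain text with no metadata. The tokenizer, the idf formula log(N / df) and the helper names (tfidf_vectors, related) are illustrative choices, not a specific library's API.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep runs of letters/digits; stop-word removal is left out.
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs):
    """One sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: how many articles contain each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    return [
        {t: c * math.log(n / df[t]) for t, c in Counter(toks).items()}
        for toks in tokenized
    ]

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def related(docs, index, k=3):
    """Indices of the k articles most similar to docs[index], most similar first."""
    vecs = tfidf_vectors(docs)
    scores = sorted(
        ((cosine(vecs[index], v), i) for i, v in enumerate(vecs) if i != index),
        reverse=True,
    )
    return [i for _, i in scores[:k]]

articles = [
    "Python tips for writing faster loops",
    "Baking sourdough bread at home",
    "Speeding up Python loops with list comprehensions",
    "Home bread baking: a sourdough starter guide",
]
print(related(articles, 0, k=1))  # [2]
```

Because log(N / df) is zero for a word that appears in every article, shared boilerplate does not count as evidence of relatedness. On a large site the similarities would normally be precomputed rather than recalculated on every page view.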

Understanding Bayes' Theorem

I'm working on an implementation of a Naive Bayes classifier. Programming Collective Intelligence introduces this subject by describing Bayes' Theorem as: Pr(A | B) = Pr(B | A) × Pr(A) / Pr(B), as well as a specific example relevant to document classification: Pr(Category | Document) = Pr(Document | Category) × Pr(Category) / Pr(Document...
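Assuming the categories are mutually exclusive and exhaustive, the denominator Pr(Document) can be computed with the law of total probability as the sum of Pr(Document | Category) × Pr(Category) over all categories. The short Python sketch below applies the formula to made-up numbers (the category names and probabilities are hypothetical) to show that the posteriors sum to 1 and that Pr(Document) only rescales them.

```python
def posteriors(likelihoods, priors):
    """Bayes' theorem for every category at once.

    likelihoods[c] = Pr(Document | c), priors[c] = Pr(c).
    """
    # Numerator of Bayes' theorem for each category.
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    # Pr(Document) by the law of total probability.
    evidence = sum(joint.values())
    return {c: j / evidence for c, j in joint.items()}

# Hypothetical numbers for illustration only.
priors = {"sport": 0.6, "politics": 0.4}
likelihoods = {"sport": 0.002, "politics": 0.008}
post = posteriors(likelihoods, priors)
print(post)  # sport ≈ 0.273, politics ≈ 0.727
```

Since Pr(Document) is the same for every category, a classifier that only needs the most probable category can compare the numerators Pr(Document | Category) × Pr(Category) and skip the division.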

SVM Multiclass text classification

Hi, I want to classify a news dataset whose training data are labeled with IPTC subject codes (hierarchical classification). My project has to use an SVM. I have done all the feature extraction, stemming, stop-word removal... I almost have the file format required for SVM multiclass, which looks like: category feature:value feature:...
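A minimal Python sketch of producing that format, assuming the documents are already tokenized, stemmed and stripped of stop words. Each IPTC subject code is mapped to an integer class id starting at 1, each term to a feature index starting at 1, and feature:value pairs are written in increasing index order, which SVM-multiclass and libsvm both expect. The function name to_svm_lines and the codes in the example are hypothetical placeholders, and the sketch flattens the hierarchy: every code becomes an independent class.

```python
from collections import Counter

def to_svm_lines(docs, labels):
    """Turn tokenized documents into 'class feature:value ...' lines.

    docs: list of token lists (already stemmed, stop words removed).
    labels: category code of each document (e.g. an IPTC subject code string).
    Returns the lines plus the term and label mappings for reuse on test data.
    """
    # Class ids 1..k, one per distinct category code.
    label_ids = {code: i + 1 for i, code in enumerate(sorted(set(labels)))}
    # Feature indices start at 1.
    terms = sorted({t for doc in docs for t in doc})
    vocab = {t: i + 1 for i, t in enumerate(terms)}
    lines = []
    for tokens, code in zip(docs, labels):
        counts = Counter(vocab[t] for t in tokens)
        # feature:value pairs sorted by increasing feature index; value = term count.
        feats = " ".join(f"{idx}:{cnt}" for idx, cnt in sorted(counts.items()))
        lines.append(f"{label_ids[code]} {feats}")
    return lines, vocab, label_ids

# Hypothetical toy data; the codes are placeholders, not real assignments.
docs = [["match", "goal", "goal"], ["market", "share", "price"], ["goal", "price"]]
labels = ["15000000", "04000000", "15000000"]
lines, vocab, label_ids = to_svm_lines(docs, labels)
print("\n".join(lines))
# 2 1:2 3:1
# 1 2:1 4:1 5:1
# 2 1:1 4:1
```

The vocab and label_ids mappings are returned so that test documents can be converted with exactly the same indices as the training set.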

Help with representing textual data in a format suitable for SVMs, more specifically libsvm

Hi, the problem at hand is that I need to be able to distinguish agricultural web pages from non-agricultural ones. This is aimed at building a focused crawler that only crawls and indexes mostly agricultural pages. I need advice from anyone who is experienced with SVMs. Would considering the SVM classifier be appr...
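For a binary task like agricultural vs. non-agricultural pages, each page becomes one libsvm line of the form label index:value ..., with +1 and -1 as labels. The Python sketch below is one possible encoding under stated assumptions: the LibsvmEncoder class is hypothetical (not part of libsvm), the vocabulary and IDF weights are fixed on the training pages, words never seen in training are dropped when encoding new pages, and each vector is scaled to unit length because SVMs tend to behave better when features are on a comparable scale.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

class LibsvmEncoder:
    """Hypothetical helper: learn vocabulary and IDF on training pages, then
    encode any page as one libsvm line with an L2-normalized TF-IDF vector."""

    def fit(self, pages):
        n = len(pages)
        df = Counter(t for p in pages for t in set(tokenize(p)))
        terms = sorted(df)
        self.index = {t: i + 1 for i, t in enumerate(terms)}  # libsvm indices start at 1
        # Smoothed IDF, so terms found in every training page still get weight.
        self.idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in terms}
        return self

    def encode(self, page, label):
        # Words not seen during fit() have no feature index and are dropped.
        tf = Counter(t for t in tokenize(page) if t in self.index)
        weights = {self.index[t]: c * self.idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        feats = " ".join(f"{i}:{w / norm:.4f}" for i, w in sorted(weights.items()))
        return f"{label:+d} {feats}".rstrip()

train = ["wheat and maize crop yields", "soil irrigation for maize", "football match report"]
labels = [1, 1, -1]  # +1 = agricultural, -1 = not agricultural
enc = LibsvmEncoder().fit(train)
lines = [enc.encode(p, y) for p, y in zip(train, labels)]
new_line = enc.encode("irrigation schedules for wheat", 1)
print(new_line)  # +1 4:0.5774 5:0.5774 10:0.5774
```

Fitting on the training pages only and reusing the same encoder for crawled pages keeps feature indices consistent with the trained model.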

Text classification/categorization algorithm

My objective is to [semi]automatically assign texts to different categories. There's a set of user-defined categories and a set of texts for each category. The ideal algorithm should be able to learn from a human-defined classification and then classify new texts automatically. Can anybody suggest such an algorithm and perhaps .NET libr...
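A standard starting point for learning from human-labeled examples is multinomial naive Bayes. The sketch below is in Python rather than .NET and does not correspond to any particular library; it assumes simple lowercase word tokens, uses add-one (Laplace) smoothing, works in log space to avoid underflow, and ignores words never seen during training. The NaiveBayes class and the example texts are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def train(self, examples):
        # examples: (text, category) pairs taken from the human-made classification
        self.doc_counts = Counter()
        self.word_counts = defaultdict(Counter)
        for text, category in examples:
            self.doc_counts[category] += 1
            self.word_counts[category].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = sum(self.doc_counts.values())
        return self

    def classify(self, text):
        # Words never seen in training carry no information, so skip them.
        tokens = [w for w in tokenize(text) if w in self.vocab]
        best, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            denom = sum(counts.values()) + len(self.vocab)
            # log Pr(category) + sum of log Pr(word | category)
            score = math.log(n_docs / self.total_docs)
            score += sum(math.log((counts[w] + 1) / denom) for w in tokens)
            if score > best_score:
                best, best_score = category, score
        return best

nb = NaiveBayes().train([
    ("the striker scored a late goal", "sport"),
    ("the team won the cup final", "sport"),
    ("parliament passed the budget bill", "politics"),
    ("the minister announced a new bill", "politics"),
])
print(nb.classify("a goal in the final"))   # sport
print(nb.classify("the new budget bill"))   # politics
```

Pr(Document) is left out of the score because it is identical for every category and cannot change which one ranks highest. Retraining is just another call to train() with the updated human-labeled set, which suits a semi-automatic workflow.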

Looking for an open source text-classification implementation

There are several algorithms used for text classification, such as Bayes, kNN, SVM, etc., and I am looking for implementations. Any suggestions? ...
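To show how one of the listed algorithms works, here is a from-scratch kNN sketch in Python, meant as an illustration rather than a recommended implementation: each text becomes a term-count vector, the k training examples with the highest cosine similarity to the query are found, and their labels vote. The function names and example data are hypothetical. For real use, established libraries such as scikit-learn already ship naive Bayes (MultinomialNB), kNN (KNeighborsClassifier) and linear SVM (LinearSVC) classifiers.

```python
import math
import re
from collections import Counter

def vectorize(text):
    # Bag of words: term -> count.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(c * b[t] for t, c in a.items() if t in b)
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(train, text, k=3):
    """train: list of (text, label). Majority vote among the k most similar examples."""
    query = vectorize(text)
    scored = sorted(((cosine(query, vectorize(t)), label) for t, label in train), reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

train = [
    ("cheap flights and hotel deals", "travel"),
    ("beach resort holiday packages", "travel"),
    ("hotel booking for your holiday", "travel"),
    ("python list comprehension tutorial", "programming"),
    ("debugging python code", "programming"),
]
print(knn_classify(train, "last minute hotel and flights"))  # travel
print(knn_classify(train, "python code tutorial"))           # programming
```

kNN needs no training step, but every query is compared against all stored examples, so it gets slower as the labeled set grows; naive Bayes and linear SVMs summarize the training data into a fixed model instead.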