bayesian

Bayesian filter to mark duplicate items

Hi, I collect news for certain topics and then run a Bayesian classifier on them to mark them as interesting or non-interesting. I see that some news items are different articles but essentially the same news, e.g. "Ben Kingsley visits Taj Mahal with wife" and "Kingsley romances wife in Taj's lawns". How do I teach the system to mark all ...

Kim and Pearl's Message passing algorithm in Bayesian Network

Hello! Can you give me a good link/resource where I could find a good implementation of a Bayesian network? I'm especially interested in conditional probability table generation and how to pass messages/update nodes. Thanks! ...

Persistence on Java CI-Bayes object

Has anyone ever persisted a training set for CI-Bayes? I have sample code from this site: http://www.theserverside.com/news/thread.tss?thread%5Fid=49773 Here is the code: FisherClassifier fc=new FisherClassifierImpl(); fc.train("The quick brown fox jumps over the lazy dog's tail","good"); fc.train("Make money fast!", "bad"); String c...

Understanding Bayes' Theorem

I'm working on an implementation of a Naive Bayes classifier. Programming Collective Intelligence introduces this subject by describing Bayes' Theorem as: Pr(A | B) = Pr(B | A) x Pr(A)/Pr(B) as well as a specific example relevant to document classification: Pr(Category | Document) = Pr(Document | Category) x Pr(Category) / Pr(Document...
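
The formula quoted above can be sketched in a few lines of Python. This is only an illustration: the two categories and all of the probabilities are made-up numbers, not values from the book or from any real training set, and `posterior` is a hypothetical helper name.

```python
def posterior(likelihood, prior, evidence):
    """Bayes' Theorem: Pr(A | B) = Pr(B | A) * Pr(A) / Pr(B)."""
    return likelihood * prior / evidence

# Hypothetical values, chosen only to make the arithmetic visible.
priors = {"good": 0.6, "bad": 0.4}          # Pr(Category)
likelihoods = {"good": 0.01, "bad": 0.05}   # Pr(Document | Category)

# Pr(Document), obtained by summing over every category.
evidence = sum(likelihoods[c] * priors[c] for c in priors)

# Pr(Category | Document) for each category.
posteriors = {c: posterior(likelihoods[c], priors[c], evidence) for c in priors}
print(posteriors)
```

Because Pr(Document) is the same for every category, a classifier that only needs the most likely category can skip the division and compare Pr(Document | Category) x Pr(Category) directly.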

Calculating Mutual Information For Selecting a Training Set in Java

Scenario: I am attempting to implement supervised learning over a data set within a Java GUI application. The user will be given a list of items or 'reports' to inspect and will label them based on a set of available labels. Once the supervised learning is complete, the labelled instances will then be given to a learning algorithm. Thi...

Building a NetHack bot: is Bayesian Analysis a good strategy?

A friend of mine is beginning to build a NetHack bot (a bot that plays the Roguelike game: NetHack). There is a very good working bot for the similar game Angband, but it works partially because of the ease in going back to the town and always being able to scum low levels to gain items. In NetHack, the problem is much more difficult, ...

Designing Bayesian networks

I have a basic question about Bayesian networks. Let's assume we have an engine that can stop working with probability 1/3. I'll call this variable ENGINE. If it stops working, then your car doesn't work. If the engine is working, then your car will work 99% of the time. I'll call this one CAR. Now, if your car is old (OLD), instead of...

blindly classifying new trends in incoming data

How do news outlets like Google News automatically classify and rank documents about emerging topics, like "Obama's 2011 budget"? I've got a pile of articles tagged with baseball data like player names and relevance to the article (thanks, OpenCalais), and would love to create a Google News-style interface that ranks and displays new po...

Bayesian filtering for forum posts

Has anyone used a Bayesian filter to let forum members classify posts and so over time only display interesting posts? A Bayesian filter seems to work well for detecting email spam. Is this a viable approach to filter forum posts for users? ...

Loopy Belief Propagation code example

Does anybody know of a working code example of the sum-product algorithm for (loopy) belief propagation in Bayesian networks? I have scoured the earth for a couple of days but haven't had much luck. I'm indifferent to which language it is in. All the documents I have found on the topic are full of arcane and absurdly ambiguous mathspeak. It doesn't s...

Detecting unknown class in a Bayes classifier

If you have a Bayes classifier trained for a set of classes, how do you detect whether the output is significant enough to choose a class? It would be useful for detecting samples which can't be assigned to a class. I have tried testing if the class probability is above mean+2*stddev of the probabilities of all the classes, but I don't think it will...

ClassNotFoundException error in implementing Bayesian algorithm in Apache Mahout on Hadoop

Hi, I have a problem executing the Bayesian algorithm in Mahout. I built it with Maven and the job file is in the target directory. When I run it from the terminal using hadoop, I get a ClassNotFoundException. What should be done? $HADOOP_HOME/bin/hadoop jar mahout-core-0.3-SNAPSHOT.job org.apache.mahout.classifier.bayes.mapre...

Weighted Average and Ratings

Maths isn't my strong point and I'm at a loss here. Basically, all I need is a simple formula that will give a weighted rating on a scale of 1 to 5. If there are very few votes, they carry less influence and the rating is pulled more towards the average (in this case I want it to be 3, not the average of all other ratings). I've tried a...
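
One common way to get the behaviour described above is a Bayesian average, which mixes the real votes with a fixed number of imaginary votes at the prior value. The sketch below assumes a prior of 3 (as in the question) and a `prior_weight` of 5 imaginary votes; that weight is a tuning knob picked for illustration, not something the question specifies.

```python
def weighted_rating(votes, prior_mean=3.0, prior_weight=5):
    """Bayesian average of 1-5 ratings.

    prior_weight acts like that many imaginary votes cast at prior_mean,
    so items with few real votes stay close to prior_mean.
    """
    n = len(votes)
    return (prior_weight * prior_mean + sum(votes)) / (prior_weight + n)

print(weighted_rating([]))         # no votes: exactly the prior
print(weighted_rating([5]))        # one vote: still close to 3
print(weighted_rating([5] * 100))  # many votes: close to the real mean
```

A larger `prior_weight` makes the rating more conservative; a smaller one lets a handful of votes move it quickly.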

Any Naive Bayesian Classifier in python?

Dear everyone, I have tried the Orange framework for Naive Bayesian classification. The methods are extremely unintuitive, and the documentation is extremely unorganized. Does anyone here have another framework to recommend? I use mostly NaiveBayesian for now. I was thinking of using nltk's NaiveClassification but then they don't think t...

How do I solve this conditional probabilities problem with MATLAB?

If P( cj | xi ) are already known, where i=1,2,...n; j=1,2,...k; how do I calculate/estimate: P( cj | xl , xm , xn ), where j=1,2,...k; l,m,n ∈ {1,2,...n} ? ...

Naive Bayesian spam filter question

Hi guys, I am planning to implement a spam filter using the Naive Bayesian classification model. Online I see a lot of info on Naive Bayesian classification, but the problem is that it's a lot of mathematical stuff rather than a clear statement of how it's done. And the problem is I am more of a programmer than a mathematician (yes I had learnt Probability a...

Ticket Bayesian(or something else) Categorization

Hi. I am searching for a solution for a ticket management system. Do you know of any commercial offerings? For now I only have my own dev projects using the dspam library. Maybe I am using it wrongly, but it shows bad results. My idea was to divide all pre-rated tickets into 2 groups: spam (my category) and the rest (ham: everything that does not belong to this category). After...

Naive Bayesian for Topic detection using "Bag of Words" approach

I am trying to implement a naive Bayesian approach to find the topic of a given document or stream of words. Is there a naive Bayesian approach that I might be able to look up for this? Also, I am trying to improve my dictionary as I go along. Initially, I have a bunch of words that map to topics (hard-coded). Depending on the occ...

Naive Bayesian classification (spam filtering) - Doubt in one calculation? Which one is right? Plz clarify

Hi guys, I am implementing a Naive Bayesian classifier for spam filtering. I have a doubt about one calculation. Please clarify what to do. Here is my question. In this method, you have to calculate P(S|W) -> Probability that a message is spam given that word W occurs in it. P(W|S) -> Probability that word W occurs in a spam message. P(W...
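
For reference, the single-word quantity P(S|W) can be sketched with Bayes' rule as below. The excerpt is cut off, so this is not necessarily the calculation the question goes on to ask about: `p_w_given_h` (the probability that W occurs in a non-spam message) and the default prior `p_s=0.5` are assumptions, and the example numbers are made up.

```python
def spam_given_word(p_w_given_s, p_w_given_h, p_s=0.5):
    """P(S|W) = P(W|S) P(S) / (P(W|S) P(S) + P(W|H) P(H)).

    p_w_given_h and p_s are assumed inputs; H ("ham") means not spam,
    so P(H) = 1 - P(S).
    """
    p_h = 1.0 - p_s
    numerator = p_w_given_s * p_s
    return numerator / (numerator + p_w_given_h * p_h)

# Hypothetical word seen in 80% of spam and 20% of non-spam messages.
print(spam_given_word(0.8, 0.2))
```

With equal priors the result depends only on the ratio of the two word frequencies; setting `p_s` from the observed share of spam in the training data changes it accordingly.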

Calculate posterior distribution of unknown mis-classification with PRTools in MATLAB

I'm using the PRTools MATLAB library to train some classifiers, generate test data and test the classifiers. I have the following details: N: total # of test examples; k: # of misclassifications for each classifier and class. I want to: calculate and plot Bayesian posterior distributions of the unknown probabilities of mis...
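
One standard way to model this, sketched here in plain Python rather than with PRTools, is to treat each test example as an independent Bernoulli trial and put a Beta prior on the unknown error probability; the posterior after k errors in N trials is then Beta(prior_a + k, prior_b + N - k). The uniform Beta(1, 1) prior, the helper names, and the example counts k=7, N=100 are all assumptions made for illustration.

```python
import math

def error_posterior(k, n, prior_a=1.0, prior_b=1.0):
    """Beta posterior parameters after k errors in n test examples."""
    return prior_a + k, prior_b + (n - k)

def beta_pdf(p, a, b):
    """Density of Beta(a, b) at p, for 0 < p < 1 only."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))

# Hypothetical counts: 7 misclassifications out of 100 test examples.
a, b = error_posterior(k=7, n=100)
posterior_mean = a / (a + b)

# Evaluate the density on a grid of interior points; these pairs
# can be handed to any plotting tool.
grid = [i / 200 for i in range(1, 200)]
density = [beta_pdf(p, a, b) for p in grid]
print(posterior_mean)
```

Repeating the call for each classifier and class gives one posterior curve per (classifier, class) pair.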