Bayesian filtering for spam
I was wondering if there is a good, clean OO implementation of Bayesian filtering for spam and text classification? For learning purposes....
I'm looking for an R package which can be used to train a Dirichlet prior from count data. I'm asking for a colleague who's using R, and don't use it myself, so I'm not too sure how to look for packages. It's a bit hard to search for, because "R" is such a nonspecific search string. There doesn't seem to be anything on CRAN, but ar...
How effective is naive Bayesian filtering for filtering spam? I heard that spammers easily bypass them by stuffing extra non-spam-related words. What programming techniques can you use with Bayesian filters to prevent that? ...
I've got a classification problem on my hands, which I'd like to address with a machine learning algorithm (Bayes or Markovian, probably; the question is independent of the classifier to be used). Given a number of training instances, I'm looking for a way to measure the performance of an implemented classifier, while taking data overf...
This is a good one because it's so counter-intuitive: Imagine an urn filled with balls, two-thirds of which are of one color and one-third of which are of another. One individual has drawn 5 balls from the urn and found that 4 are red and 1 is white. Another individual has drawn 20 balls and found that 12 are red and 8 are white. Whi...
I just had a clever idea (I think). Suppose you wanted to estimate the size of the userbase of a site which does not publicize this information. Different usernames are likely to have been claimed with different probabilities. For instance, if the username 'nick' doesn't exist on the system, it's likely to have an extremely small...
In other answers on Stack Overflow it's been suggested that Weka is good, but there are others (Classifier4j, jBNC, Naiban). Does anyone have actual experience with these? ...
Is there a Bayesian filter library for .NET? I would like to set up a group of folders and have emails automatically moved to those folders based on what has been previously moved to each folder. If you are familiar with FogBugz auto-sort, that's exactly what I would like to do. ...
I have a large (~2.5M records) database of image metadata. Each record represents an image and has a unique ID, a description field, a comma-separated list of keywords (say 20-30 keywords per image), and some other fields. There's no real database schema, and I have no way of knowing which keywords exist in the database without iterati...
I am looking for a Python library which does Bayesian Spam Filtering. I looked at SpamBayes and OpenBayes, but both seem to be unmaintained (I might be wrong). Can anyone suggest a good Python (or Clojure, Common Lisp, even Ruby) library which implements Bayesian Spam Filtering? Thanks in advance. Clarification: I am actually looking ...
A quick Google search reveals that there are a good number of Bayesian classifiers implemented as Python modules. If I want wrapped, high-level functionality similar to dbacl, which of those modules is right for me? Training: % dbacl -l one sample1.txt % dbacl -l two sample2.txt Classification: % dbacl -c one -c two sample3.txt -v one...
I recently wrote a Bayesian spam filter, using Paul Graham's article "A Plan for Spam" and a C# implementation of it that I found on CodeProject as references. I just noticed that the implementation on CodeProject uses the total number of unique tokens in calculating the probability of a token being spam (e.g. if the...
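For comparison with the CodeProject approach mentioned in this excerpt, here is a minimal Python sketch of the per-token probability as Graham's article describes it. It assumes that `ngood` and `nbad` are the numbers of good and spam messages in the corpus (not counts of unique tokens). The function name, the dictionaries, and the sample counts are hypothetical and only serve the illustration.

```python
def token_spam_probability(token, good_counts, bad_counts, ngood, nbad):
    """Per-token spam probability in the style of Graham's article.

    good_counts / bad_counts map token -> number of occurrences in the
    good and spam corpora. ngood / nbad are message counts (assumption
    stated above), and both must be greater than zero.
    """
    g = 2 * good_counts.get(token, 0)  # good occurrences are doubled to reduce false positives
    b = bad_counts.get(token, 0)
    if g + b < 5:
        return None  # token seen too rarely to be scored
    good_ratio = min(1.0, g / ngood)
    bad_ratio = min(1.0, b / nbad)
    p = bad_ratio / (good_ratio + bad_ratio)
    return max(0.01, min(0.99, p))  # clamp away from 0 and 1


# Hypothetical counts, for illustration only.
good_counts = {"meeting": 10}
bad_counts = {"viagra": 20, "meeting": 1}
ngood, nbad = 100, 100

for word in ("viagra", "meeting", "unseen"):
    print(word, token_spam_probability(word, good_counts, bad_counts, ngood, nbad))
```

Dividing by message counts rather than by the number of unique tokens changes the scale of both ratios, which is why the two implementations can disagree on individual token scores.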
I want to use naive Bayes to classify documents into a relatively large number of classes. I'm looking to confirm whether a mention of an entity name in an article really is that entity, on the basis of whether that article is similar to articles where that entity has been correctly verified. Say we find the text "General Motors" in a...
After asking this question, I've decided to try and implement some Naive Bayes Classifiers using SQL Server Analysis Services. Can anyone point me to a decent book, website or any other resource on how to implement Naive Bayes Classifiers in SSAS? Similarly, I would be interested in learning about Decision Trees. ...
I would like to implement a simple Bayesian classification system to do rudimentary sentiment analysis on short messages. Practical suggestions for implementing in Ruby would be welcome. Suggestions for other approaches besides Bayes would also be welcome. ...
It appears that the simplest, naivest way to do basic sentiment analysis is with a Bayesian classifier (confirmed by what I'm finding here on SO). Any counter-arguments or other suggestions? ...
How does Stack Overflow's homepage filtering work? I believe the questions that appear on the homepage are specifically related to your interests, which are indicated by the tags that you look at, question, and answer. Does anyone know the name of the algorithm/technique or have some basic details (nothing that violates their IP) about h...
I have some kind of object model and I need to filter and sort its nodes by some kind of property. What kinds of automated systems exist to generate and select properties of the object model that correlate to what I want? (I'm intentionally being abstract and non-specific) I'm thinking of a system that works kind of like spam filters ...
Could anybody explain to me why: simulatedCase <- rbinom(100,1,0.5); simDf <- data.frame(CASE = simulatedCase); posterior_m0 <<- MCMClogit(CASE ~ 1, data = simDf, b0 = 0, B0 = 1) always results in an MCMC acceptance ratio of 0? Any explanation would be greatly appreciated! ...
I'm thinking of writing an app to classify movies in an HTPC based on what the family members like. I don't know statistics or AI, but the stuff here looks very juicy. I wouldn't know where to start. Here's what I want to accomplish: Compose a set of samples from each user's likes, rating each sample attribute separately. For examp...