spam-filtering

Spam filtering for ASP.NET

Has anybody come across an elegant and efficient .NET spam filtering component that can be used in ASP.NET applications? I want something that I can pass a string to and that returns a percentage indicating how likely it is to be spam. Does such a thing exist? ...

Naive Bayesian spam filtering effectiveness

How effective is naive Bayesian filtering for filtering spam? I heard that spammers easily bypass them by stuffing extra non-spam-related words. What programming techniques can you use with Bayesian filters to prevent that? ...

What's the most straightforward way to delete emails marked as spam by SpamAssassin?

I'm on Ubuntu Intrepid, using Postfix and SpamAssassin. I've seen approaches using procmail (like the one suggested @ Apache), but I'm looking for a solution that does not use procmail. This is a programming question because the correct answer will be some form of code that accomplishes the task at hand (my response to the negative vote...

Bayesian Network for Spam Filtering

Hello, I want to use a Bayesian network for spam filtering. What do you think a proper topology for the network should look like? What about the naive Bayes model? (The naive Bayes model is sometimes called a Bayesian classifier.) ...

Calculating the probability of a token being spam in a Bayesian spam filter

I recently wrote a Bayesian spam filter, using Paul Graham's article "A Plan for Spam" and a C# implementation of it that I found on CodeProject as references. I just noticed that the implementation on CodeProject uses the total number of unique tokens in calculating the probability of a token being spam (e.g. if the...
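For comparison with the question above, here is a minimal sketch of the per-token probability as Graham's article describes it, where the denominators are the number of messages in each corpus rather than the number of unique tokens. The function and parameter names are hypothetical, and both corpora are assumed to be non-empty.

```python
def token_spam_probability(bad_count, good_count, n_bad_msgs, n_good_msgs):
    """Per-token spam probability in the style of Paul Graham's "A Plan for Spam".

    bad_count / good_count: occurrences of the token in the spam / non-spam corpus.
    n_bad_msgs / n_good_msgs: number of MESSAGES in each corpus (assumed > 0).
    Returns None when the token has too little evidence to be scored.
    """
    g = 2 * good_count  # good counts are doubled to bias against false positives
    b = bad_count
    if g + b < 5:       # token seen too rarely: do not score it
        return None
    bad_ratio = min(1.0, b / n_bad_msgs)
    good_ratio = min(1.0, g / n_good_msgs)
    p = bad_ratio / (good_ratio + bad_ratio)
    return max(0.01, min(0.99, p))  # clamp so no single token is fully decisive
```

Swapping the message counts for unique-token counts changes both ratios, so the two approaches generally give different probabilities for the same token.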

Neural networks for email spam detection

Let's say you have access to an email account with the history of received emails from the last years (~10k emails) classified into 2 groups: genuine email and spam. How would you approach the task of creating a neural network solution that could be used for spam detection - basically classifying any email either as spam or not spam? Let'...

In a system for moderating offensive user content, how do you decide on a threshold for automatic removal?

I'm writing a forum application and want to automatically moderate offensive posts using a flagging system similar to the one on StackOverflow in which users are given the ability to report problematic posts as falling into one of three categories:
* Abusive speech
* Off topic
* Spam
If a particular post receives a certain number of f...

Virus / Spam Scanner Programmatically for .NET?

I am writing a piece of messaging software that will send and receive text, voice, fax etc. via SMTP (email). I need the ability to programmatically scan incoming and outgoing emails for viruses, spam etc. QUESTION: Can anyone offer a suggestion on a product to use for this? I tend to stay away from the consumer level software (...

Blacklist of words on content to filter message.

Hi, For a website that takes input from kids we need to filter any naughty / bad words that they use when they enter their comments in the website (running PHP). The comments are a free field and users can enter whatever comments they want. The solution I can think of is to have a words list like BLACKLIST: bad,bad,word,woord,craap,cr...

Out of the box spam filtering?

I work on a social media monitoring system. We don't crawl the web ourselves, we get feeds from aggregators like Spinn3r. In most cases, the "blogs" that are nothing but pages of links to porn sites are filtered, but we'd like something in-house that we can train on a quicker time frame than waiting for upstream providers to make changes...

Disallowing proxies from POSTing

Hi there! I want to disallow proxies and spambots from posting to my website. What is the best way to do so? I've downloaded a blacklist and my first idea was to block each of the IPs in my .htaccess file, but after downloading the list, I found out that it contained almost 9 million entries. My other idea was to split each IP in 4 par...

Email Spam Filtering at the Code Level in Java

I'm writing code to download email from various servers, some of which are outside of my control. I'd like to be able to filter out spam at the code level since I can't always rely on the servers to do it effectively. What resources in Java are available to help with this? What is a good approach to take in order to minimize the amount o...

How does MSN filter spam?

I am trying to create a newsletter for our business. The last few days have been spent testing, and one of the things I have noticed is that MSN seemingly randomly filters out some of my test messages. This is super-frustrating. I like the PEAR Mail MIME-package, and have been using that. I may send one email from one of our servers, resu...

Anti-spam/scam filter for instant messaging

Hi, I am interested in any kind of anti-spam/scam filter for instant messages (even commercial solutions), in my case for a Jabber server in the dating area. I found the open source project dscam, but cannot be sure that it is the best solution. Any help is welcome. Thanks. ...

Best solution to anti-spam in PHP?

How do I distinguish robots from normal users? How does SO do this job? Currently I'm dealing with a robot which posts once every hour... ...

Naive Bayesian spam filter question

Hi guys, I am planning to implement a spam filter using the naive Bayesian classification model. Online I see a lot of info on naive Bayesian classification, but the problem is that it's mostly mathematical material rather than a clear statement of how it's done. And the problem is I am more of a programmer than a mathematician (yes I had learnt Probability a...

Problem with floating-point precision in C

Hi guys, for one of my course projects I started implementing a "naive Bayesian classifier" in C. My project is to implement a document classifier application (especially for spam) using huge training data. Now I have a problem implementing the algorithm because of the limitations of C's data types. (The algorithm I am using is given here, htt...
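The excerpt above is cut off, so the exact failure is not shown. Assuming the problem is underflow from multiplying many small probabilities, one common workaround is to sum logarithms instead. The sketch below illustrates this in Python rather than C (the same idea carries over using `log` and `exp` from `<math.h>`). The function names and the input layout, a list of `(P(w|spam), P(w|ham))` pairs with no zero entries, are assumptions made for illustration.

```python
import math


def log_posterior_odds(token_probs, prior_spam=0.5):
    """Return log(P(spam | tokens) / P(ham | tokens)).

    token_probs: iterable of (p_w_given_spam, p_w_given_ham) pairs, all > 0
    (e.g. after smoothing). Summing logs avoids the underflow that a
    running product of thousands of small probabilities would hit.
    """
    log_odds = math.log(prior_spam) - math.log(1.0 - prior_spam)
    for p_w_spam, p_w_ham in token_probs:
        log_odds += math.log(p_w_spam) - math.log(p_w_ham)
    return log_odds


def spam_probability(log_odds):
    """Map log-odds back to a probability without overflowing exp()."""
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)
```

For classification alone, comparing the log-odds against 0 (or another threshold) is enough, so the conversion back to a probability is optional.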

What would be a good language to implement a naive bayes classifier from scratch?

I would like to implement a naive Bayes classifier for spam filtering from scratch as a learning exercise. What would be the best language of the following to try this out in: Java, Ruby, C++, C, or something else? Please give reasons (it would help greatly!) ...

Naive Bayesian classification (spam filtering) - doubt about one calculation: which one is right? Please clarify

Hi guys, I am implementing a naive Bayesian classifier for spam filtering. I am unsure about one of the calculations. Please clarify what I should do. Here is my question. In this method, you have to calculate:
P(S|W) -> Probability that a message is spam given that word W occurs in it.
P(W|S) -> Probability that word W occurs in a spam message.
P(W...
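The excerpt is cut off before its remaining definitions. For reference, the two quantities it does define are related by Bayes' theorem as follows, where H (a non-spam, or "ham", message) is notation assumed here rather than taken from the question:

```latex
P(S \mid W) = \frac{P(W \mid S)\, P(S)}{P(W \mid S)\, P(S) + P(W \mid H)\, P(H)}
```

With equal priors, P(S) = P(H) = 0.5, this reduces to P(W|S) / (P(W|S) + P(W|H)).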

List of bank domains to stop phish

I'm looking for a full list of bank domain names to include in an anti-phishing routine I'm planning. Does anyone have a list/URL? ...