Hi,
I'm working on a project where I need to create a spam database and accept submissions from users. Accepting the submissions is easy, but I was trying to figure out how to weight these submissions.
Let's say the database consists of words, and I get the following submissions:
* 137x "banana"
* 22x "apple"
* 1x "exploding mouse"
Now, there's a fairly good chance that "banana" is a spam word. "Apple" might be, but should probably go on a grey list, while "exploding mouse" is probably just a prank.
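To make it concrete, here is a rough sketch of the naive version I have in mind. The cut-offs (100 and 10) and the names classify, SPAM_THRESHOLD and GREY_THRESHOLD are placeholders I made up for illustration, not tuned values.

```python
from collections import Counter

# Hypothetical cut-offs; placeholder numbers, not tuned values.
SPAM_THRESHOLD = 100
GREY_THRESHOLD = 10

def classify(submissions):
    """Label each submitted term 'spam', 'grey' or 'ignore' by raw count."""
    counts = Counter(submissions)
    labels = {}
    for term, n in counts.items():
        if n >= SPAM_THRESHOLD:
            labels[term] = "spam"
        elif n >= GREY_THRESHOLD:
            labels[term] = "grey"
        else:
            labels[term] = "ignore"
    return labels

# The example submissions from above.
submissions = ["banana"] * 137 + ["apple"] * 22 + ["exploding mouse"]
labels = classify(submissions)
```

This only looks at raw counts with fixed cut-offs, which is exactly the part I'm unsure about.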
Anyone got any good ideas?
Cheers!