I have recently been inspired to write spam filters in JavaScript, Greasemonkey-style, for several websites I use that are prone to spam (especially in comments). When considering how to go about this, I realized I have several options, each with pros and cons. My goal for this question is to expand on the list I have created and, hopefully, determine the best approach to client-side spam filtering with JavaScript.
As for what makes a spam filter the "best", I would say these are the criteria:
- Most accurate
 - Least vulnerable to attacks
 - Fastest
 - Most transparent
 
Also, please note that I am trying to filter content that already exists on websites that aren't mine, using Greasemonkey Userscripts. In other words, I can't prevent spam; I can only filter it.
Here is my attempt, so far, to compile a list of the various methods along with their shortcomings and benefits:
Rule-based filters:
What it does: "Grades" a message by assigning a point value to different criteria (i.e. all uppercase, all non-alphanumeric, etc.) Depending on the score, the message is discarded or kept.
Benefits:
- Easy to implement
 - Mostly transparent
 
Shortcomings:
- Transparent: it's usually easy to reverse-engineer the code, discover the rules, and thereby craft messages that won't be picked up
 - Hard to balance the point values, which leads to false positives
 - Can be slow; multiple rules have to be run against each message, often using regular expressions
 - In a client-side environment, server interaction or user interaction is required to update the rules
 
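To make this concrete, here is a minimal sketch of the kind of scorer I have in mind; the rules, weights, and threshold are just placeholders I made up:

```javascript
// Minimal rule-based scorer; the rules and weights below are placeholders.
var rules = [
    { test: function (msg) { return msg === msg.toUpperCase(); }, score: 3 }, // all caps
    { test: function (msg) { return /https?:\/\//i.test(msg); },  score: 2 }, // contains a link
    { test: function (msg) { return /(.)\1{5,}/.test(msg); },     score: 2 }  // long character runs
];

function isSpam(message, threshold) {
    var total = 0;
    for (var i = 0; i < rules.length; i++) {
        if (rules[i].test(message)) {
            total += rules[i].score;
        }
    }
    return total >= threshold;
}

// Example: hide a comment element that scores 4 or more.
// if (isSpam(commentNode.textContent, 4)) { commentNode.style.display = "none"; }
```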
Bayesian filtering:
What it does: Analyzes word frequencies (or trigram frequencies) in a message and compares them against the data the filter has been trained with. (See the sketch after this list.)
Benefits:
- No need to craft rules
 - Fast (relatively)
 - Tougher to reverse engineer
 
Shortcomings:
- Requires training to be effective
 - Training data must still be accessible to the JavaScript, usually as human-readable JSON, XML, or a flat file
 - Data set can get pretty large
 - Poorly designed filters are easily fooled by padding the message with common words to lower its spamacity rating
 - Words that haven't been seen before can't be accurately classified, which sometimes results in the entire message being misclassified
 - In a client-side environment, server interaction or user interaction is required to update the rules
 
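For comparison, this is roughly the kind of naive Bayes scoring I am picturing. The shape of `trainingData` is only my assumption about how the trained counts might be shipped (e.g. as JSON):

```javascript
// Rough naive Bayes scorer. Assumed shape of the trained data (e.g. shipped as JSON):
// { spamTotal: 1000, hamTotal: 1000, words: { "viagra": { spam: 50, ham: 1 }, ... } }
function spamProbability(message, trainingData) {
    var words = message.toLowerCase().match(/[a-z']+/g) || [];
    var logSpam = 0, logHam = 0;           // work in log space to avoid underflow
    for (var i = 0; i < words.length; i++) {
        var counts = trainingData.words[words[i]] || { spam: 0, ham: 0 };
        // Laplace smoothing so unseen words don't zero out the whole estimate
        logSpam += Math.log((counts.spam + 1) / (trainingData.spamTotal + 2));
        logHam  += Math.log((counts.ham  + 1) / (trainingData.hamTotal  + 2));
    }
    // Equal priors assumed; convert the log-likelihood difference back to a probability
    return 1 / (1 + Math.exp(logHam - logSpam));
}

// Example: treat anything above 0.9 as spam.
// if (spamProbability(commentNode.textContent, trainingData) > 0.9) { ... }
```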
Bayesian filtering, server-side:
What it does: Applies Bayesian filtering server-side by submitting each message to a remote server for analysis. (A sketch of the client end of this follows below.)
Benefits:
- All the benefits of regular Bayesian filtering
 - Training data is not revealed to users/reverse engineers
 
Shortcomings:
- Heavy traffic; every message requires a round trip to the server
 - Still vulnerable to uncommon words
 - Still vulnerable to adding common words to decrease spamacity
 - The service itself may be abused
 - To train the classifier, it may be desirable to let users submit spam samples; attackers may abuse this as well
 
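On the client end, the userscript would only need to POST each message and act on the verdict, something like the sketch below; the endpoint URL and the response format are entirely hypothetical:

```javascript
// Hypothetical endpoint; assumes the server replies with JSON such as {"spam": true}.
function checkMessageRemotely(message, callback) {
    GM_xmlhttpRequest({
        method: "POST",
        url: "https://example.com/classify",               // placeholder URL
        headers: { "Content-Type": "application/x-www-form-urlencoded" },
        data: "message=" + encodeURIComponent(message),
        onload: function (response) {
            var result = JSON.parse(response.responseText);
            callback(result.spam);
        }
    });
}

// Example usage: hide the comment once the server answers.
// checkMessageRemotely(commentNode.textContent, function (spam) {
//     if (spam) { commentNode.style.display = "none"; }
// });
```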
Blacklisting:
What it does: Applies a set of criteria to a message or to some attribute of it. If one or more criteria (or a specific number of them) match, the message is rejected. This is a lot like rule-based filtering, so see that description for details. (A short example follows below.)
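A bare-bones version might look like this (the patterns are placeholders):

```javascript
// Simple blacklist: reject a message if any blacklisted pattern matches.
var blacklist = [/free\s+money/i, /cheap\s+meds/i, /bit\.ly\//i]; // placeholder patterns

function isBlacklisted(message) {
    return blacklist.some(function (pattern) {
        return pattern.test(message);
    });
}
```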
CAPTCHAs, and the like:
Not feasible for this type of application. I am trying to apply these methods to sites that already exist, using Greasemonkey; I can't start requiring CAPTCHAs in places where they didn't exist before someone installed my script.
Can anyone help me fill in the blanks? Thank you!