I have to implement a "bad words" filter on my classifieds website.

I have a big list of "bad words" but don't know which method is best for comparing user input against it.

In my case, a textarea inside a form needs to be checked for "bad words".

   <form name="test" action="test.php" method="post">

Inside test.php I fetch the textarea contents and need to compare them...

My question is: would you compare them to an external text file of bad words, or to an array of bad words?

I think the array is better, since I wouldn't need any external functions etc., but I need to be sure...

What do you think?

Thanks

+1  A: 

An array/list would be quicker overall if you are checking many words. You only have to read the file once and then each check will be against the list.

However, in your application (assuming you want to go ahead despite the pitfalls) it might be better to read the file only when you need to. That way the file could be updated while the application is still running and you wouldn't have to stop and restart the application or call some admin function to reparse the file.

The delay in submission probably won't be noticed by the user anyway, though caching the list and re-reading the file only when it has changed would minimise it.
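As a rough sketch of the "re-read only when the file has changed" idea, the helper below compares the file's modification time with the one seen at the last load. The function name `load_bad_words` and the one-word-per-line file format are assumptions for illustration. Note that a `static` variable in PHP only lives for one request, so this mainly helps a long-running script; across requests a shared cache would be needed.

```php
<?php
// Hypothetical helper: returns the word list, re-reading the file only
// when its modification time differs from the one seen at the last load.
// Assumes one word per line in the file.
function load_bad_words(string $path): array
{
    static $cache = [];          // path => ['mtime' => int, 'words' => array]

    clearstatcache(true, $path); // avoid a stale filemtime() result
    $mtime = filemtime($path);
    if ($mtime === false) {
        return [];               // file missing or unreadable
    }

    if (!isset($cache[$path]) || $cache[$path]['mtime'] !== $mtime) {
        $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
        $words = array_map('strtolower', array_map('trim', $lines ?: []));
        $cache[$path] = ['mtime' => $mtime, 'words' => $words];
    }

    return $cache[$path]['words'];
}
```

Editing the word file is then picked up on the next call without restarting anything, at the cost of one `filemtime()` check per call.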

ChrisF
A: 

Independent of the programming language you are using, I think in-memory arrays would always be a good and efficient solution for the comparison, considering that it's a list of bad words and wouldn't grow really huge.

Gopi
Also, PHP has such powerful array functions that it's a crime not to use them.
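To illustrate the array-function approach, here is a minimal sketch that splits the submitted text into words and intersects them with the list. The name `find_bad_words` is hypothetical, and the code assumes ASCII text and a list of lowercase words; it only catches whole-word matches, not deliberately misspelled ones.

```php
<?php
// Hypothetical helper: returns the distinct bad words found in $text.
// Assumes $badWords is a plain array of lowercase ASCII words.
function find_bad_words(string $text, array $badWords): array
{
    // Lowercase the text and split on anything that is not a letter,
    // digit, or apostrophe.
    $tokens = preg_split("/[^a-z0-9']+/", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // array_intersect keeps the tokens that also appear in the list;
    // array_unique and array_values tidy up duplicates and keys.
    return array_values(array_unique(array_intersect($tokens, $badWords)));
}

$found = find_bad_words('This darn thing is DARN broken', ['darn', 'heck']);
// $found is ['darn']
```

In test.php the first argument would be the posted textarea value, and a non-empty result means the submission should be rejected or flagged.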
Iznogood
A: 

Doing it in an array will definitely be faster as you are not reading from disk. What many users do is store the bad words in a database or a file and read them into a cache (such as memcache or APC): they look in the cache first and, if the words are not there, read them from the file and put them into the cache. This is a good approach that is flexible and fast.
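The cache-first lookup described above might look roughly like this. The sketch assumes the APCu extension (the successor to APC's user cache) and falls back to reading the file whenever APCu is unavailable; the function name `cached_bad_words`, the cache key, and the 300-second TTL are all illustrative choices, not anything from the answer.

```php
<?php
// Hypothetical helper: cache-first lookup of the word list.
// Assumes the APCu extension; reads straight from the file if it is
// missing or disabled. Assumes one word per line in the file.
function cached_bad_words(string $path, int $ttl = 300): array
{
    $key = 'bad_words:' . $path;
    $useApcu = function_exists('apcu_enabled') && apcu_enabled();

    if ($useApcu) {
        $words = apcu_fetch($key, $hit);
        if ($hit) {
            return $words;              // served from the cache
        }
    }

    // Cache miss (or no cache available): read from disk.
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $words = $lines === false ? [] : array_map('strtolower', $lines);

    if ($useApcu) {
        apcu_store($key, $words, $ttl); // entry expires after $ttl seconds
    }

    return $words;
}
```

With a TTL, edits to the word file take effect once the cached entry expires, so no restart or manual flush is needed.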

Matt Williamson
A: 

Given an infinite amount of RAM and an infinite amount of disk space, obviously RAM will still have an advantage over disk space in terms of performance. But the reality is that not all Web hosts will give you infinite storage, be it RAM or disk space. I guess the choice is obvious.

stillstanding
This is not guaranteed. I could build a machine with slow RAM and an SSD, and it would be faster to read from disk. The general trend is that RAM is faster than disk, but this is not "obviously" or, more importantly, "theoretically" so.
Stefan Kendall
Also, consider a VPS where disk access might be raw and RAM access could be virtual. You really have to consider your environment.
Stefan Kendall