I am working on a new captcha script, and it is almost complete. The last piece is a word list: say I have 300 five-letter words that I would like to use as the text in the captcha image.

What would be the best way, performance-wise, to handle this list on a high-traffic site?

Read the words from a text file on every load
Store them in an array
Something else?

+2  A: 

If you just want 300 words to choose from, I'd put them all in an array in plain PHP code and pull one out at random. That would give the best performance.
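As a minimal sketch of that idea: the word list below is a made-up placeholder (a real one would hold all 300 entries), and `pickCaptchaWord()` is a hypothetical helper name, not something from the original script.

```php
<?php
// Hypothetical word list; a real one would hold all 300 entries.
$captchaWords = ['apple', 'brick', 'cloud', 'drift', 'eagle'];

/**
 * Return one random word from a non-empty list.
 */
function pickCaptchaWord(array $words): string
{
    // array_rand() returns a random key of the array.
    return $words[array_rand($words)];
}

echo pickCaptchaWord($captchaWords), "\n";
```

Because the array lives in the PHP source itself, no extra file has to be read at request time.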

jayrdub
or serialized object could help as well.
dusoft
A: 

Instead of 300 words, you could simply generate a random number and display that. No need for a list, or loading a list, or managing the list, ....

Ira Baxter
Already done. I am extending the features of this script, and I am looking for the best way to take a list of any length and select one random word from it.
jasondavis
I don't think you understood my answer. Instead of a list of words from which you choose randomly, you simply choose a random number and display the number itself. No words needed. No list needed. No list management needed. It's pretty brainless.
Ira Baxter
That would be the weakest captcha I have seen. There are only 10 digits, you know that, right?
jasondavis
You've *got* to be kidding. Random number generators in almost every language I know can produce 56 bits of random value using a 64-bit floating point "random" function. Even brain-dead PHP *has* a random number generator (which, true to PHP form, only guarantees 15 bits), but it has a range far greater than "10 numbers". http://www.php.net/manual/en/function.rand.php
Ira Baxter
I'll admit I don't know a lot about that, but are you saying the digits 0-9 are more secure than using the whole alphabet? Like I said, I don't understand all the fancy math; it just seems like 26 letters in combinations of four gives somewhere around 456,976 possible captchas.
jasondavis
An n-bit number has 2 raised to the power of n possible values; PHP's "rand" function produces at least 15 bits, or 2^15 combinations ==> 32768 different possible numbers. If it is reasonably implemented, it will produce 56 bits, or 2^56 combinations ~~ 7.2 × 10^16, which is far more than a *trillion* possible random numbers. Try running it; see the link above. You're suggesting some 450,000 possible combinations. You need to *talk* to another programmer for 30 minutes if you don't understand this.
Ira Baxter
PS: if you want to be a serious computer programmer, work *hard* on understanding the fancy math.
Ira Baxter
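To make the range argument above concrete, here is a small sketch that asks PHP how large `rand()` can go on the current build and derives a rough bit count from it. The six-digit code at the end is only an illustration of a numeric captcha string; the length of six is an assumption, not something the thread specifies.

```php
<?php
// Largest value rand() can return on this build. The manual notes it may be
// as small as 32767 (2^15 - 1) on some platforms.
$max = getrandmax();

// Number of distinct values rand() can produce when called without arguments.
$distinct = $max + 1;

// Roughly how many bits that corresponds to (log base 2).
$bits = log($distinct, 2);

printf("rand() range: 0..%d (about %.0f bits)\n", $max, $bits);

// Illustrative six-digit numeric code, zero-padded on the left.
// The length of six is an assumption made for this sketch.
$code = str_pad((string) rand(0, 999999), 6, '0', STR_PAD_LEFT);
echo $code, "\n";
```

A six-digit code has 10^6 = 1,000,000 possible values, which is already more than the 456,976 four-letter combinations discussed in this thread.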
A: 

Just how many logons per second do you need to handle? This doesn't seem like the right place to spend time in optimization. Just about any way you find the random word should be fine, especially if your word list is only 300 words.

I'd start with a simple text file, one word per line, and just do something simple like

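// Read one word per line, dropping newlines and blank lines.
$words = file("wordlist.txt", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
return $words[rand(0, count($words) - 1)];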

and only if it really proved to be a bottleneck would I change it to do a random fseek() or some other "high performance" trick.
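For reference, the random fseek() idea could look roughly like the sketch below. It only works under a strong assumption that the answer does not state: every line in the file is exactly the same width (here five letters plus a single "\n"), so the byte offset of any word can be computed directly. The function name `randomWordFromFile()` is hypothetical.

```php
<?php
/**
 * Pick one word from a file of fixed-width lines without loading the whole file.
 * Assumes every line is exactly $wordLength bytes followed by a single "\n".
 */
function randomWordFromFile(string $path, int $wordLength = 5): string
{
    $recordSize = $wordLength + 1; // the word plus its newline
    $count = intdiv((int) filesize($path), $recordSize);
    if ($count < 1) {
        throw new RuntimeException("No words found in $path");
    }

    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    // Jump straight to the start of a randomly chosen record.
    fseek($handle, rand(0, $count - 1) * $recordSize);
    $word = fread($handle, $wordLength);
    fclose($handle);

    return $word;
}
```

For a 300-word list the difference from reading the whole file is likely to be negligible, which is the point the answer makes about measuring first.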

Lucky
+2  A: 

Using a fixed list of words could make your Captcha weak since it restricts the number of variations to just n! / (n - k)! options. With n = 300 words and k=2 different words per captcha it would be just 89700 options no matter how long the words are.

If you would use a sequence of four random letters (a-z) you would get more options (exactly n^k = 26^4 = 456976).
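Both counts are easy to check in code. The sketch below uses two hypothetical helper names, `permutations()` for n! / (n - k)! and `sequences()` for n^k, and also prints the totals for five and six letters.

```php
<?php
// Ordered selections of k distinct items out of n: n! / (n - k)!
function permutations(int $n, int $k): int
{
    $total = 1;
    for ($i = 0; $i < $k; $i++) {
        $total *= $n - $i;
    }
    return $total;
}

// Strings of length k over an alphabet of n symbols, repeats allowed: n^k
function sequences(int $n, int $k): int
{
    return $n ** $k;
}

echo permutations(300, 2), "\n"; // 89700
echo sequences(26, 4), "\n";     // 456976
echo sequences(26, 5), "\n";     // 11881376
echo sequences(26, 6), "\n";     // 308915776
```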

Gumbo
Wow, that's some crazy math; I didn't even know you could work that out. Anyway, I did end up dropping the word idea and now use 4 characters. I wrote my function so I can easily change it to use more. I wonder how many combinations 5 or 6 characters would have, lol.
jasondavis
You can calculate that with the formula *n*^*k*, where *n* is the number of characters in your character set and *k* is the number of characters you want to use.
Gumbo
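A function with an adjustable length, as described in the comments above, might look like the following sketch. The name `randomCode()` and the lowercase a-z default character set are assumptions for illustration; the asker's actual function is not shown in the thread.

```php
<?php
/**
 * Build a random string of $length characters drawn from $charset.
 * With n = strlen($charset) and k = $length there are n^k possible results.
 */
function randomCode(int $length, string $charset = 'abcdefghijklmnopqrstuvwxyz'): string
{
    $code = '';
    $last = strlen($charset) - 1;
    for ($i = 0; $i < $length; $i++) {
        // Pick one character at a random position in the character set.
        $code .= $charset[rand(0, $last)];
    }
    return $code;
}

echo randomCode(4), "\n";
echo randomCode(6), "\n";
```

Adding digits or uppercase letters to `$charset` raises *n*, and raising `$length` raises *k*; both increase the number of possible codes.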
It is a science called discrete mathematics.
Alfred
+1  A: 

Best option for performance

  1. It would be best to keep the word list in memory (APC or Memcache => search Google/Stack Overflow for APC or Memcache) to get the best performance, because disk I/O is what will make your site slow most of the time. For this you need a box with enough memory (>= 128MB) on which you can install software (APC/Memcache). If you want good performance on a high-traffic site, you should be willing to pay for it!

  2. If you are on a shared hosting provider (where you won't get the best performance anyway), then it would be best to put the words in an array in the same file, because every require statement will fetch the file from disk.
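The first option could be sketched roughly as follows. This assumes the APCu extension (the user-cache successor to APC's variable store) and falls back to reading the file when it is not loaded. The function name `loadCaptchaWords()`, the cache key, and the one-hour TTL are all hypothetical choices for this example.

```php
<?php
/**
 * Load the word list, keeping it in shared memory between requests when
 * APCu is available. Falls back to reading the file on every call otherwise.
 */
function loadCaptchaWords(string $path): array
{
    $key = 'captcha_words';                    // hypothetical cache key
    $useCache = function_exists('apcu_fetch'); // is the APCu extension loaded?

    if ($useCache) {
        $cached = apcu_fetch($key, $hit);
        if ($hit) {
            return $cached;
        }
    }

    // One word per line; drop newlines and blank lines.
    $words = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($words === false) {
        throw new RuntimeException("Cannot read $path");
    }

    if ($useCache) {
        apcu_store($key, $words, 3600); // keep for one hour (assumed TTL)
    }

    return $words;
}
```

With an opcode cache enabled, a plain PHP array in the source file (the second option) is also served from memory, so it is worth measuring before adding a separate cache layer.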

return random word

Like Lucky said, you can fetch a random word with a simple rand() function call:

return $words[rand(0, count($words) - 1)];

Where $words is the array with all the words.

VPS hosting

These are some cheap VPS hosts I found using Google, but I think you should do some more research to find the best VPS hosting for your high-performance site.

Alfred
Thanks for the info. I'm running on 2 servers now and am always looking for the best performance everywhere. I just started using Memcache yesterday and would like to try out APC, though I thought APC was disk based? Anyway, I ended up not doing the random word list. Instead I did a random font list: I build my characters from an array of fonts, and instead of using the array_rand function I thought the regular rand function might be faster, so I used that to pick a number corresponding to an index in my array.
jasondavis
No, APC is memory based. It is primarily an opcode cache, which keeps compiled bytecode in memory and will speed up your PHP a lot. If possible you should really install it, because it will speed up your site a lot. The other thing APC can do is store variables in memory (memory is way faster than disk I/O). http://en.wikipedia.org/wiki/Alternative_PHP_Cache#Alternative_PHP_Cache
Alfred