Firstly, I am working on a windows xp 64 machine with 4gb ram and 2.29 ghz x4
I am indexing 220,000 lines of text that are more or less the same length. These are divided into 15 equally sized files. File 1/15 takes 1 minute to index. As the script indexes more files, it seems to take much longer with file 15/15 taking 40 minutes.
My understanding is that the more I put in memory, the faster the script is. The dictionary is indexed in a hash, so fetch operations should be O(1). I am not sure where the script would be hanging the CPU.
I have the script here.