I have a project on benchmarking String Matching Algorithms and I would like to know if there is a standard for every algorithm so that I would be able to get fair results with my experimentation. I am planning to use java's system.nanotime in getting the running time of every algorithm. Any comment or reactions regarding my problem is very much appreciated. Thanks!
I am not entirely sure what you're asking. However, I am guessing you are asking how to get the most realistic results. You need to run your algorithm hundreds, or even thousands of iterations to get an average. It is also very important to turn off any caching that your language may do, and don't reuse objects, unless it is part of your algorithm.
I am not entirely sure what you're asking. However, another interpretation of what you are asking can be answered by trying to work out how a given algorithm performs as you increase the size of the problem. Using raw time to compare algorithms at a given string size does not necessarily allow for accurate comparison. Instead, you could try each algorithm with different string sizes and see how the algorithm behaves as string size varies.
And Mark's advice is good too. So you are running repeated trials for many different string lengths to get a picture of how one algorithm works, then repeating that for the next algorithm.
Again, it's not clear what you're asking, but here's another thought in addition to what Tony and Mark said:
Be very careful about testing only "real" input or only "random" input. Some algorithms are tuned to do well on typical input (searching for a word in English text), while others are tuned for working well on pathologically hard cases. You'll need a huge mix of possible inputs of all different types and sizes to do a truly good benchmark.
If you need to implement some algorithms, or to get an idea about analyzing some approaches, you may want to look at these lecture slides:
http://www.macs.hw.ac.uk/~alison/alg/lectures/l3.pdf
Once you have an idea as to what the algorithms you are testing are best for, then, as mentioned, you will want to have a large test suite of strings that will test the worst-case of each algorithm and the best case, but then you will need to determine how to analyze the results, as it won't just be a pass-fail.
How often will the best and worst case show up? For example, if the worst case is random characters, how likely is someone to be searching on that?
You may want to have a web page, perhaps of an article, and then use that as a test and have the sentences or paragraphs be a real-world test, and see how the algorithms do on that.
Ultimately, until you have analyzed the algorithms you will be testing it will be hard to come up with ideas on how to benchmark, in terms of what types of words or strings should be used.