views:

187

answers:

6

There have been numerous posts on string algorithms:

However, no general literature was mentioned.

Could anyone recommend a book(s) that would thoroughly explore various string algorithms? The topic which is of special interest is approximate string matching [things like google-offered corrected search string variants :) ].

Thanks a lot for advice.

A: 

CLR has some string processing algorithms, but it's not specific to them.

Including:

Stephen
A: 

Donald E. Knuth is famous for his Art of Computer Programming series which covers plenty of algorithms. But I doubt that googles latest stuff is included.

stacker
+1  A: 

TRE is an open-source library that implements approximate matching. The About page has some interesting hints about how it works, although I'm not sure it provides the sort of in-depth analysis you're looking for. The source code is probably more enlightening from that perspective.

Hank Gay
thanks, i'll take a look at it.
Max
+2  A: 

This is not a book recommendation, but this library and site is a library that offers plenty of efficient string matching algorithm implementations:

http://www.dcs.shef.ac.uk/~sam/simmetrics.html

It also provides links to further learning for each and where each is best applicable.

Mikos
Thanks a lot for this one. It seems to contain whole bunch of terms and consice introductory to the topic.
Max
+4  A: 

I'm surprised no-one has mentioned Dan Gusfield's excellent book Algorithms on Strings, Trees and Sequences which covers string algorithms in more detail than anyone would probably need. It served me very well for a project on protein sequencing that I was working on a few years ago. After reading this book you will learn:

  • Naive String Matching
  • Preprocessor Based algorithms (Boyer Moore, Knuth-Morris-Pratt)
  • Regex matching algorithms
  • Karp-Rabin and similar methods
  • Suffix tree methods (Ukkonen's method, etc)
  • Sequence alignment (Levenshtein distance and string similarity, and multiple sequence alignment)
  • Applications to DNA sequencing, gene prediction and other areas.
Il-Bhima
+1 It's called "Algorithms on Strings Trees and Sequences" and is an awesome reference
Hightechrider
And it's "relatively" new (compared to other text mentioned in other answers), because it incorporated many recent academic works.
monn
+1  A: 

Jewels of Stringology

Christopher Barber