views:

60

answers:

2

If so, please explain how.

Re: what is distance -- "The distance between two strings is defined as the minimal number of edits required to convert one into the other."

For example, xyz to XYZ would take 3 edits, so the string xYZ is closer to XYZ and xyz.

If the pattern is [0-9]{3} or for instance 123, then a23 would be closer to the pattern than ab3.

How can you find the shortest distance between a regexp and a non-matching string?

The above is the Damerau–Levenshtein distance algorithm.

+2  A: 

If you mean the string with the smallest levenshtein distance between the closest matched string and a sample, then I'm pretty sure it can be done, but you'd have to convert the Regex to a DFA yourself, then try to match and whenever something fails, non-deterministically continue as if it had passed and keep track of the number differences. you could use A* search or something similar for this, it would be quite inefficient though (O(2^n) worst case)

tobyodavies
**@tobyodavies:** Thanks!
blunders
+2  A: 

You can use Finite State Machines to do this efficiently (that is, linear in time). If you use a transducer, you can even write the specification of the transformation fairly compactly and do far more nuanced transformations than simply inserts or deletes - see wikipedia for Finite State Transducer as a starting point, and software such as the FSA toolkit or FSA6 (which has a not entirely stable web-demo) too. There are lots of libraries for FSA manipulation; I don't want to suggest the previous two are your only or best options, just two I've heard of.

If, however, you merely want the efficient, approximate searching, a less flexibly but already-implemented-for-you option exists: TRE, which has an approximate matching function that returns the cost of the match - i.e., the distance to the match, from your perspective.

Eamon Nerbonne
**@Eamon Nerbonne:** Thanks Eamon, I was going to ask you on my other questions, but figured I'd just work my way to answer... this was a huge help, and TRE looks great! Cheers! (You rock!)
blunders
**@Eamon Nerbonne:** +1 For being a regex master, having a great answer, and editing my question... :-)
blunders
Wow, you learn something new every day +1
tobyodavies