tags:

views:

98

answers:

4

I am trying to search a large string w/o spaces for a smaller string of characters. using regex I can easily find perfect matches but I can't figure out how to find partial matches. by partial matches i mean one or two extra characters in the string or one or two characters that have been changed, or one of each. the first and last characters will always match though. this would be similar to a spell checker but there are no spaces and the strings dont contain actual words, just random hex digits.

i figured a way to find the string if there are no extra characters using indexOf(string.charAt(0)) and indexOf(charAt(string.length()-1) and looping through the characters between the two indexes. but this can be problematic when dealing with randomized characters because of the possibility of finding the first and last characters at the correct spacing but none of the middle characters matching.

i've been scratching my head for hours on this issue. any ideas?

+1  A: 

Here's an article I found that shows how a simple spell checker would work. I know you're not making a spell check but the ideas would be similar.

This reminded me a little of the nearest neighbor algorithm. I've used the nearest neighbor algorithm to do do gesture recognition. But the gestures were really just an array of 2d points, and I would use the nearest-neighbor to determine which gesture seems closest to that gesture, even if the points weren't exactly the same. It seems to me that you may be able to do something along those same lines with what you're trying to do.

Joel
ya know, i searched all over for a sample spell checker program and couldn't find anything pertinent. unfortunately i am not familiar with python, but i know C and Java so i can probably figure it out. thank you. ill post back if it solves my problem.
mike
just found a link at the bottom for the java source but it is a little more advanced than i expected and focuses mainly on replacing the misspelled word. honestly, i cant make heads or tails of it. ill look into the nearest neighbor algorithm.
mike
A: 

Assuming that your search string is, say, 6 characters long, and the first and last characters are "A" and "Z", then

A.{4}Z

would match any substring in the larger string where the first and last characters match in the correct spacing.

Is that what you need?

Tim Pietzcker
A: 

What you are trying to do is a lot like the kind of string-matching that bioinformaticians do matching DNA sequences and the like. This goes under the term sequence alignment.

High Performance Mark
A: 

I didnt make an account last night so i can not comment on your answers, sorry. High Performance Mark, that is exactly what i am trying to do, except not with dna. i will look into sequence alignment.

and Tim Pietzker, ill try and use the method you suggested in parallel with the sequence alignment. thank you for the help.

Mike
use the comments and edit your original question, don't discuss in the answer section it is bad form
fuzzy lollipop