I know I can get whether 2 strings are equal in content, but I need to be able to get the number of characters that differ in the result of comparing 2 string values.

For instance:

"aaaBaaaCaaaDaaaEaaa"
"aaaXaaaYaaaZaaaEaaa"

so the answer is 3 for this case.

Is there an easy way to do this, using regex, linq or any other way?

EDIT: Also the strings are VERY long. Say 10k+ characters.

A: 

I would simply loop over the character arrays, adding up a counter for each difference.

This will not account for strings with different lengths, however.
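A minimal sketch of that loop, assuming both strings have the same length (the class and method names here are made up for illustration, and throwing on a length mismatch is just one possible choice):

```csharp
using System;

static class StringDiff
{
    // Counts the positions at which the two strings hold different chars.
    // Assumption: the strings must have equal length; otherwise we throw.
    public static int CountDifferences(string a, string b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Strings must have the same length.");

        int count = 0;
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i])
                count++;
        }
        return count;
    }
}
```

Calling `StringDiff.CountDifferences("aaaBaaaCaaaDaaaEaaa", "aaaXaaaYaaaZaaaEaaa")` gives 3 for the example in the question. It is a single pass, so 10k+ characters should not be a problem.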

Oded
Thanks but the strings are very long, I guess 10k characters or more.
Joan Venge
It is theoretically impossible to be any faster than that. Looping through 10K characters and comparing them shouldn't take long.
SLaks
Speed is not an issue in this case :O
Joan Venge
A: 

If both strings have the same length and do not have complicated Unicode characters like surrogates, you can loop through each character and increment a counter if the characters at that index in each string are different.

It is theoretically impossible to do it any faster, because you need to check every single character.
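If the strings might contain surrogate pairs, one possible variation is to compare text elements instead of individual `char` values. This is only a sketch: the class name is hypothetical, and counting each leftover element of the longer string as one difference is an assumption, not something the question specifies.

```csharp
using System;
using System.Globalization;

static class TextElementDiff
{
    // Compares the strings text element by text element, so a surrogate
    // pair counts as one unit rather than two chars.
    // Assumption: leftover elements in the longer string each count as one difference.
    public static int CountDifferences(string a, string b)
    {
        TextElementEnumerator ea = StringInfo.GetTextElementEnumerator(a);
        TextElementEnumerator eb = StringInfo.GetTextElementEnumerator(b);

        int count = 0;
        while (true)
        {
            bool hasA = ea.MoveNext();
            bool hasB = eb.MoveNext();
            if (!hasA && !hasB)
                break;

            if (hasA && hasB)
            {
                // Ordinal comparison: elements differ if their UTF-16 code units differ.
                if (!string.Equals(ea.GetTextElement(), eb.GetTextElement(), StringComparison.Ordinal))
                    count++;
            }
            else
            {
                // One string ran out of elements before the other.
                count++;
            }
        }
        return count;
    }
}
```

For plain ASCII input such as the strings in the question, this gives the same count as a simple char loop; it is slower, so it is only worth using when such characters can actually occur.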

SLaks
+1  A: 

You can use LINQ:

string a = "aaaBaaaCaaaDaaaEaaa";
string b = "aaaXaaaYaaaZaaaEaaa";

int result = a.Zip(b, (x, y) => x == y).Count(z => !z)
           + Math.Abs(a.Length - b.Length);

A solution with a loop is probably more efficient though.

dtb
Thanks, would this work if the strings have a different number of chars?
Joan Venge
@Joan Venge: Then you need to add the length difference to the result.
dtb
Thanks, I see what you mean. But in that case, this doesn't count insertions, etc. properly, right?
Joan Venge
@Joan Venge: Right, if you need the Levenshtein distance and not just the number of places where the strings differ (which is what you originally asked for), then my answer doesn't help.
dtb
+4  A: 

In case there are inserts and deletes: Levenshtein distance

and here's the C# implementation
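For reference, a minimal sketch of the standard dynamic-programming approach to Levenshtein distance follows. It is not the implementation linked above; the class and method names are hypothetical, and it compares raw `char` values.

```csharp
using System;

static class EditDistance
{
    // Levenshtein distance: the minimum number of single-char insertions,
    // deletions and substitutions needed to turn s into t.
    // Keeps only two rows of the DP table, so memory is O(t.Length).
    public static int Levenshtein(string s, string t)
    {
        int[] prev = new int[t.Length + 1];
        int[] curr = new int[t.Length + 1];

        // Distance from the empty prefix of s to each prefix of t.
        for (int j = 0; j <= t.Length; j++)
            prev[j] = j;

        for (int i = 1; i <= s.Length; i++)
        {
            curr[0] = i; // delete all i chars of s to reach the empty prefix of t
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                curr[j] = Math.Min(
                    Math.Min(prev[j] + 1,      // deletion
                             curr[j - 1] + 1), // insertion
                    prev[j - 1] + cost);       // substitution or match
            }

            // Swap rows for the next iteration.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.Length];
    }
}
```

Running time is proportional to `s.Length * t.Length`, so two 10k-character strings mean on the order of 100 million cell updates: much slower than a single pass, but usually still workable.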

Max
+1  A: 

Hey, look at this: http://en.wikipedia.org/wiki/Hamming_distance

It will help you if you only need to count replacements between strings of equal length. It does not count deletions and insertions; for those you need the Levenshtein distance.

Daniel Mošmondor