ansaurus

Question

C# compare two strings for matching words.

Answer 1

+11 A:

A LINQ solution

"elza7ma wa2fa fel matab".Split()
                         .Intersect("2ana ba7eb el za7ma 2awy 2awy".Split())
                         .Any();

// as a string extension method
public static class StringExtensions
{
    public static bool OneWordMatches(this string theString, string otherString)
    {
        return theString.Split().Intersect(otherString.Split()).Any();
    }
}

// returns true
"elza7ma wa2fa fel matab 2ana".OneWordMatches("2ana ba7eb el za7ma 2awy 2awy");

Russ Cam 2010-09-24 07:43:21

No overload of `Split` takes a single `char`. Probably best to also specify `RemoveEmptyEntries`

JaredPar 2010-09-24 07:45:56

You can use the `Split()` without any parameters. In this case it will use spaces, tabs and new lines as separator.

Oliver 2010-09-24 07:50:51

Is this really faster, or does Intersect() also loops through both arrays?

Sjoerd 2010-09-24 07:52:01

Not sure about performance, will need to benchmark them. I suspect a hashset implementation like JaredPar's would be faster

Russ Cam 2010-09-24 07:58:47

This one is implementation is excellent. I will be comparing hundreds of strings so performance is a must. Do you know whether this is faster or the hashset? how can I know?

Marwan 2010-09-24 08:47:03

@Marwan: You can know by *trying it both ways and measuring the results*. How else could you possibly know?

Eric Lippert 2010-09-24 16:01:24

@Russ: FYI Intersect already uses a Set<T> class internally to implement intersection.

Eric Lippert 2010-09-24 16:03:49

@Eric - Thanks - I couldn't quite see what was going on in Reflector :)

Russ Cam 2010-09-24 18:42:35

Answer 2

+4 A:

I think the easiest way is to break up the strings into words and use a set structure like HashSet<string> to check for duplicates. For example

public bool HasMatchingWord(string left, string right) { 
  var hashSet = new HashSet<string>(
    left.Split(" ", StringSplitOptions.RemoveEmptyEntries)); 
  return right
    .Split(" ", StringSplitOptions.RemoveEmptyEntries)
    .Any(x => hashSet.Contains(x));
}

JaredPar 2010-09-24 07:44:49

Might want to add an equality check as well to handle collisions (if there are any).

Jeff M 2010-09-24 08:18:19

Is anyone sure if this is the best method performance wise?

Marwan 2010-09-24 08:49:40

Answer 3

+1 A:

You could split the two strings by word and build two hashtables/dictionaries. Then go through both and add keys incrementing an int in a third dictionary (Dictionary<string, int>). If any key in the third dictionary have a count of more than one, that word is in both original strings.

I would think that any algorithm to solve this problem would be 'slow' - especially for large input strings/many words.

mbanzon 2010-09-24 07:45:07

Adding all words to the same HashSet and checking the return value of Add() is simpler.

Sjoerd 2010-09-24 07:53:40

I re-read the original question - yes it would be much simpler. He is just asking if there are any words that are represented in both strings - not which and not how many occurences.

mbanzon 2010-09-24 08:08:40

Answer 4

A:

I'd probably take the initial performance hit and split the string and then order alphabetically and by word length. If you just have to find out if one word matches, break as soon as you find one. Once you have the split string arrays ordered alphabetically and by length, that limits the numbers of compares you would have to do.

Matt 2010-09-24 07:45:37

Answer 5

A:

The easiest way would be to compare all words to any other word. This is an easy solution, but slow.
Another way is to sort both lists, and then compare the top two entries. Like mergesort, but with the goal of finding equal words.
Another way is to compile the list of words into a tree, and match the words against that tree. A regex can do this, or you can do this yourself. In your example, the first letter should be 2, b, e or z. This way, each word is only inspected once and the least number of characters are inspected.

Sjoerd 2010-09-24 07:46:41

ansaurus

tags:

views:

answers:

C# compare two strings for matching words.

related questions