Hi!

I'm trying to figure out whether there is an easy way to count the number of words that appear in both small paragraph (#1) and small paragraph (#2).

Generally, I'm determining how much overlap there is between these paragraphs on a word-by-word basis. So if (#1) contains the word "happy" and (#2) also contains the word "happy", that would count as +1.

I know that I could call String.contains() on (#2) for each word in (#1), but I was wondering if there is something more efficient that I could use.

+7  A: 

You can create two sets, s1 and s2, containing all the words from the first and second paragraphs respectively, and intersect them: s1.retainAll(s2). Sounds easy enough.

update
Works for me

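    // Requires java.util.Arrays, java.util.HashSet and java.util.Set.
    Set<String> s1 = new HashSet<String>(Arrays.asList("abc xyz 123".split("\\s")));
    Set<String> s2 = new HashSet<String>(Arrays.asList("xyz 000 111".split("\\s")));
    // retainAll modifies s1 in place, keeping only the words also present in s2.
    s1.retainAll(s2);
    System.out.println(s1.size());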

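Don't forget to remove the empty string from both sets: split("\\s") produces empty tokens when the text starts with whitespace or has two whitespace characters in a row.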
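To make the empty-word cleanup concrete, here is a minimal sketch of the same set-intersection idea wrapped in two helper methods. The names WordOverlap, words and overlap are made up for illustration, and lower-casing plus splitting on anything that is not a letter or digit are assumptions about what should count as a "word"; they are not part of the original answer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class WordOverlap {

    // Splits a paragraph into a set of distinct, lower-cased words.
    // Assumption: a "word" is a run of letters and/or digits, so
    // punctuation is dropped and "don't" becomes "don" and "t".
    static Set<String> words(String paragraph) {
        String[] tokens = paragraph.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        Set<String> result = new HashSet<>(Arrays.asList(tokens));
        // split() can still return an empty token (leading separator or empty input).
        result.remove("");
        return result;
    }

    // Counts the distinct words that occur in both paragraphs.
    static int overlap(String first, String second) {
        Set<String> common = words(first); // a fresh set on every call
        common.retainAll(words(second));   // in-place intersection
        return common.size();
    }

    public static void main(String[] args) {
        // Only "happy" is shared, so this prints 1.
        System.out.println(overlap("I am happy today.", "Happy people, happy life"));
    }
}
```

Because both paragraphs become sets, a word that is repeated within one paragraph still counts only once, which matches the "+1 per shared word" description in the question.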

Nikita Rybak
I was just typing out a complicated algorithm but this is much cleaner lol. +1 for knowing the Java API better than me.
Mike
Sounds good, will try and let you know if it worked!
rockit
I keep getting 0 with this method. I'm testing it with a hashset of 3 words vs. another hashset of 3 words, and every time the result is zero. The sets are unordered, and one word is common between the two.
rockit
Got it! I was re-using a hashmap. Whenever I re-declared the variable, I had to create a new hashmap instead of just assigning to it. Thanks for the answer!
rockit
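One way a reused collection can lead to a result of 0 is that retainAll modifies the set it is called on. The sketch below illustrates that effect; it is an assumption about the kind of problem described in the comment above, not a reconstruction of the actual code, and the names RetainAllPitfall and countCommon are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class RetainAllPitfall {

    // Counts common elements without modifying either argument.
    static int countCommon(Set<String> a, Set<String> b) {
        Set<String> copy = new HashSet<>(a); // work on a copy so 'a' is left intact
        copy.retainAll(b);
        return copy.size();
    }

    public static void main(String[] args) {
        Set<String> first = new HashSet<>(Arrays.asList("abc", "xyz", "123"));
        Set<String> second = new HashSet<>(Arrays.asList("xyz", "000", "111"));
        Set<String> third = new HashSet<>(Arrays.asList("abc", "222", "333"));

        // In place: 'first' shrinks to just "xyz" ...
        first.retainAll(second);
        // ... so reusing it for another comparison has already lost "abc".
        first.retainAll(third);
        System.out.println(first.size()); // prints 0

        // Copying first keeps the original set usable for every comparison.
        Set<String> fresh = new HashSet<>(Arrays.asList("abc", "xyz", "123"));
        System.out.println(countCommon(fresh, second)); // prints 1
        System.out.println(countCommon(fresh, third));  // prints 1
    }
}
```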