ansaurus

Question

Answer 1

+1 A:

Instead of manually searching for common words, why not put each sentence's words into a Set and then compute the intersection of both sets using retainAll()?

This tutorial on the Set Interface may help.

I assume this is homework... have you learned about algorithmic complexity, aka Big-O notation? If so, consider the complexity of your posted code vs. using a TreeSet vs. using a HashSet.
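
A minimal sketch of the two Set-based options mentioned above, using the example sentences from Answer 2. The class and method names are hypothetical. With a HashSet, add and contains take expected constant time, so building both sets and calling retainAll is expected O(N). With a TreeSet, each operation is O(log N), so the whole intersection costs O(N log N), but the result iterates in sorted (alphabetical) order. Nested loops that compare every pair of words, by contrast, are O(N^2).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper comparing HashSet- and TreeSet-based intersections.
class SetIntersection {
    // Expected O(N): hashing gives expected O(1) add/contains.
    static Set<String> commonWordsHash(String text1, String text2) {
        Set<String> common = new HashSet<String>(Arrays.asList(text1.split(" ")));
        // retainAll keeps only the elements also contained in the argument.
        common.retainAll(new HashSet<String>(Arrays.asList(text2.split(" "))));
        return common;
    }

    // O(N log N): tree operations are O(log N), but iteration is sorted.
    static Set<String> commonWordsTree(String text1, String text2) {
        Set<String> common = new TreeSet<String>(Arrays.asList(text1.split(" ")));
        common.retainAll(new TreeSet<String>(Arrays.asList(text2.split(" "))));
        return common;
    }

    public static void main(String[] args) {
        String text1 = "saqartvelo gabrwyindeba da gadzlierdeba aucileblad";
        String text2 = "saqartvelo gamtliandeba da gadzlierdeba aucileblad";
        System.out.println(commonWordsHash(text1, text2)); // iteration order unspecified
        System.out.println(commonWordsTree(text1, text2));
        // prints "[aucileblad, da, gadzlierdeba, saqartvelo]"
    }
}
```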

Dan 2010-05-03 11:10:12

Algorithmically, dynamic-programming LCS (longest common substring) is the way to go: http://en.wikipedia.org/wiki/Longest_common_substring

polygenelubricants 2010-05-03 11:26:27
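
For reference, here is a minimal sketch of the standard dynamic-programming longest common substring algorithm over characters. The class and method names are hypothetical. Table entry len[i][j] holds the length of the longest common suffix of the first i characters of one string and the first j of the other. The answer is the largest entry, found in O(|a| * |b|) time and space. Applied to the two example sentences from Answer 2, it returns a run of characters that starts in the middle of a word.

```java
// Hypothetical sketch of the classic DP longest common substring.
class LongestCommonSubstring {
    // len[i][j] = length of the longest common suffix of a[0..i) and b[0..j).
    static String longest(String a, String b) {
        int[][] len = new int[a.length() + 1][b.length() + 1];
        int bestLen = 0;
        int bestEnd = 0; // exclusive end index of the best match in a
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    // extend the common suffix ending one character earlier
                    len[i][j] = len[i - 1][j - 1] + 1;
                    if (len[i][j] > bestLen) {
                        bestLen = len[i][j];
                        bestEnd = i;
                    }
                }
                // on a mismatch len[i][j] stays 0: no common suffix ends here
            }
        }
        return a.substring(bestEnd - bestLen, bestEnd);
    }

    public static void main(String[] args) {
        String text1 = "saqartvelo gabrwyindeba da gadzlierdeba aucileblad";
        String text2 = "saqartvelo gamtliandeba da gadzlierdeba aucileblad";
        System.out.println(longest(text1, text2));
        // prints "ndeba da gadzlierdeba aucileblad"
    }
}
```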

Maybe I misunderstood the OP, but to me this doesn't sound like LCS at all. I thought the goal was to find the longest word common to both sentences. Wouldn't LCS return "ndeba da gadzlierdeba aucileblad", which is a partial word followed by three full words?

Dan 2010-05-03 11:45:20
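
If the goal is only the single longest word common to both sentences, no sort is needed. A minimal sketch, with hypothetical names, puts the words of the first text into a HashSet and keeps the longest match while scanning the second text once. This takes expected O(N) time.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: longest word appearing in both texts, in one pass.
class LongestCommonWord {
    // Returns null if the texts share no word; ties go to the word seen first in text2.
    static String longestCommonWord(String text1, String text2) {
        Set<String> words1 = new HashSet<String>(Arrays.asList(text1.split(" ")));
        String best = null;
        for (String w : text2.split(" ")) {
            // expected O(1) membership test, then keep the longer candidate
            if (words1.contains(w) && (best == null || w.length() > best.length())) {
                best = w;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String text1 = "saqartvelo gabrwyindeba da gadzlierdeba aucileblad";
        String text2 = "saqartvelo gamtliandeba da gadzlierdeba aucileblad";
        System.out.println(longestCommonWord(text1, text2)); // prints "gadzlierdeba"
    }
}
```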

It is not homework. Thank you, polygenelubricants, and thanks very much, everybody!

2010-05-03 13:23:31

Answer 2

+2 A:

The following snippet should be instructive:

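    import java.util.*;

    public class CommonWords {
        public static void main(String[] args) {
            String text1 = "saqartvelo gabrwyindeba da gadzlierdeba aucileblad";
            String text2 = "saqartvelo gamtliandeba da gadzlierdeba aucileblad";

            // Split each text once, instead of re-splitting text2
            // on every iteration of the outer loop.
            String[] words1 = text1.split(" ");
            String[] words2 = text2.split(" ");

            // Compare every word of text1 with every word of text2.
            List<String> common = new ArrayList<String>();
            for (String s1 : words1) {
                for (String s2 : words2) {
                    if (s1.equals(s2)) {
                        common.add(s1);
                    }
                }
            }

            // Sort longest first; Collections.sort is stable, so
            // equal-length words keep their original relative order.
            Collections.sort(common, new Comparator<String>() {
                @Override public int compare(String s1, String s2) {
                    return s2.length() - s1.length();
                }
            });

            System.out.println(common);
            // prints "[gadzlierdeba, saqartvelo, aucileblad, da]"
        }
    }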

Key ideas:

- Prefer List over arrays
  - Especially handy if you don't know in advance how many elements there will be
- Prefer the foreach loop
- StringTokenizer is a legacy class; prefer String.split
- Use a custom Comparator and Collections.sort to sort a List
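
On Java 8 or later, which postdates this answer, the same longest-first ordering can be written with Comparator.comparingInt(...).reversed() and List.sort. Splitting the trimmed text on the regex "\\s+" instead of " " also tolerates runs of spaces, where split(" ") would produce empty strings. A minimal sketch, with a hypothetical class and method name:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper; requires Java 8+ for Comparator.comparingInt and List.sort.
class SortByLength {
    static List<String> wordsLongestFirst(String text) {
        // trim() drops leading/trailing whitespace; "\\s+" treats any run of
        // whitespace as a single separator, so no empty words are produced.
        List<String> words = new ArrayList<>(Arrays.asList(text.trim().split("\\s+")));
        // Longest first; List.sort is stable, so equal-length words keep their order.
        words.sort(Comparator.comparingInt(String::length).reversed());
        return words;
    }

    public static void main(String[] args) {
        System.out.println(wordsLongestFirst("saqartvelo  da   gadzlierdeba aucileblad"));
        // prints "[gadzlierdeba, saqartvelo, aucileblad, da]"
    }
}
```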

An alternative solution

Note that the above solution is O(N^2), since it checks every pair of words for equality, so it doesn't scale well when the two texts have many words. Using a Set such as a HashSet, you can do this in expected O(N) time, using Set.retainAll to compute the intersection of the two sets.

    static Set<String> wordSet(String text) {
        return new HashSet<String>(Arrays.asList(text.split(" ")));
    }
    //...

    String text1 = ...;
    String text2 = ...;

    Set<String> commonSet = wordSet(text1);
    commonSet.retainAll(wordSet(text2));

    List<String> common = new ArrayList<String>(commonSet);
    System.out.println(common);
    // prints "[da, aucileblad, saqartvelo, gadzlierdeba]"
    // in no particular order

    // sort by string length using Comparator as above
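
For a version that compiles and runs on its own, here is a minimal sketch combining the wordSet/retainAll intersection with the length Comparator from the first snippet, applied to the same two example sentences. The class name and the commonWordsLongestFirst method are hypothetical. The intersection is expected O(N). Sorting the K common words then costs O(K log K). Because the list is built from a HashSet, words of equal length (here saqartvelo and aucileblad) may come out in either order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Self-contained sketch of the Set-based approach; names are hypothetical.
class CommonWordsBySet {
    static Set<String> wordSet(String text) {
        return new HashSet<String>(Arrays.asList(text.split(" ")));
    }

    // Common words, longest first. Ties keep the (unspecified) HashSet order.
    static List<String> commonWordsLongestFirst(String text1, String text2) {
        Set<String> commonSet = wordSet(text1);
        commonSet.retainAll(wordSet(text2)); // set intersection
        List<String> common = new ArrayList<String>(commonSet);
        Collections.sort(common, new Comparator<String>() {
            @Override public int compare(String s1, String s2) {
                return s2.length() - s1.length(); // longer strings first
            }
        });
        return common;
    }

    public static void main(String[] args) {
        String text1 = "saqartvelo gabrwyindeba da gadzlierdeba aucileblad";
        String text2 = "saqartvelo gamtliandeba da gadzlierdeba aucileblad";
        // gadzlierdeba comes first and da last; the two 10-letter words may swap.
        System.out.println(commonWordsLongestFirst(text1, text2));
    }
}
```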

polygenelubricants 2010-05-03 11:11:46


longest string in texts
