Let's say I have a book title and I search for it in a database. The database produces matches, some of which are full matches and some of which are partial matches.

A full match is when every word in the search result also appears in the search terms. (The reverse does not have to hold: the search terms may contain words that are not in the result.)

I am only concerned with finding the full matches.

So if I search for "Ernest Hemingway - The Old Man and the Sea" and the database returns the following results:

Charles Nordhoff - Men Against The Sea
Rodman Philbrick - The Young Man and the Sea
Ernest Hemingway - The Old Man and the Sea
Ernest Hemingway - The Sun Also Rises
Ernest Hemingway - A Farewell to Arms
Ernest Hemingway - For Whom the Bell Tolls
Ernest Hemingway - A Moveable Feast
Ernest Hemingway - True at First Light
Men Against The Sea
The Old Man and the Sea
The Old Man and the Sea Dog

According to the definition above, there are two full matches in this list:

Ernest Hemingway - The Old Man and the Sea 
The Old Man and the Sea 

To do this in Java, assume I have two variables:

String searchTerms;
List<String> searchResults;

searchTerms in the example above represents what I typed in: Ernest Hemingway - The Old Man and the Sea

searchResults represents the list of strings I got back from the database above.

for (String result : searchResults) {
  // How to check for a full match? 
  // (each word in `result` is found in `searchTerms`)
}

My question is: in this for-loop, how do I check whether every word in the result String has a corresponding word in the searchTerms String?

+1  A: 

Assuming your database results are accurate, split each result into tokens (words) using String.split(String regex) and check whether each token is found in searchTerms. If searchTerms.indexOf(word) == -1, the word is missing and the result is not a full match.

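for (String result : searchResults) {
    boolean fullMatch = true;
    // Split the result into words on runs of whitespace
    for (String word : result.split("\\s+")) {
        if (searchTerms.indexOf(word) == -1) {
            // This word does not occur in searchTerms: not a full match
            fullMatch = false;
            break;
        }
    }

    if (fullMatch) {
        // result is a full match
    }
}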
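
One caveat with indexOf: it looks for substrings, not whole words, so a result word like "Hem" would be found inside "Hemingway". If that matters for your data, a minimal sketch of a whole-word check could look like this. The class name WholeWordCheck and the method isFullMatch are made up for illustration, and the comparison is assumed to be case-sensitive and whitespace-delimited.

```java
import java.util.Arrays;
import java.util.List;

public class WholeWordCheck {

    // Hypothetical helper: returns true only if every whitespace-separated
    // word of result appears as a whole word in searchTerms (case-sensitive).
    static boolean isFullMatch(String searchTerms, String result) {
        List<String> searchWords = Arrays.asList(searchTerms.trim().split("\\s+"));
        for (String word : result.trim().split("\\s+")) {
            if (!searchWords.contains(word)) {
                return false; // one missing word is enough to reject the result
            }
        }
        return true;
    }
}
```

With the search terms from the question, searchTerms.indexOf("Hem") is not -1, but isFullMatch(searchTerms, "Ernest Hem") returns false.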
Christian Mann
+3  A: 

To find a full match as you have defined it, you want to test that one set of tokens is a subset of another. You can do this easily with a Set, which you get for free in the Java Collections Framework. One way to do it (the expense of regexes aside):

   Set<String> searchTerms = new HashSet<String>();
   Set<String> resultTokens = new HashSet<String>();

   // searchString is the raw search text that was typed in
   searchTerms.addAll( Arrays.asList( searchString.split( "\\s+" ) ) );

   for ( String result : searchResults )
   {
      resultTokens.clear();
      resultTokens.addAll( Arrays.asList( result.split( "\\s+" ) ) );
      // Full match: every token of the result is among the search terms
      if ( searchTerms.containsAll( resultTokens ) )
      {
         // Perform match code
      }
   }

Alternatively, if you wanted to be stricter about it, you could test for set equality using resultTokens.equals( searchTerms ). In your example, this would narrow the result set to just "Ernest Hemingway - The Old Man and the Sea".
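
For reference, here is a self-contained sketch of the subset test as a small class you could compile and try. The names FullMatchFilter, tokenize and fullMatches are hypothetical, and it assumes case-sensitive, whitespace-delimited tokens, so the "-" separator counts as a word like any other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FullMatchFilter {

    // Hypothetical helper: splits text on runs of whitespace into a set of tokens.
    static Set<String> tokenize(String text) {
        return new HashSet<String>(Arrays.asList(text.trim().split("\\s+")));
    }

    // Returns the results whose tokens are all contained in the search terms,
    // in the order they appeared in searchResults.
    static List<String> fullMatches(String searchTerms, List<String> searchResults) {
        Set<String> searchTokens = tokenize(searchTerms);
        List<String> matches = new ArrayList<String>();
        for (String result : searchResults) {
            if (searchTokens.containsAll(tokenize(result))) {
                matches.add(result);
            }
        }
        return matches;
    }
}
```

Called with the search terms and the eleven results from the question, fullMatches should return the two titles listed there as full matches. If you need case-insensitive matching, lower-casing the tokens inside tokenize would be one way to get it.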

charstar