I need a comparator in Java which has the same semantics as the SQL 'like' operator. For example:

myComparator.like("digital","%ital%");
myComparator.like("digital","%gi?a%");
myComparator.like("digital","digi%");

should evaluate to true, and

myComparator.like("digital","%cam%");
myComparator.like("digital","tal%");

should evaluate to false. Any ideas how to implement such a comparator or does anyone know an implementation with the same semantics? Can this be done using a regular expression?

+8  A: 

.* will match any characters in regular expressions

I think the java syntax would be

"digital".matches(".*ital.*");

And for the single character match just use a single dot.

"digital".matches(".*gi.a.*");

And to match an actual dot, escape it with a backslash (written as "\\." inside a Java string literal):

\.
Bob
Yeah, thanks! But what if the pattern isn't as simple as "%dig%" and the string needs some escaping? Is there anything already existing? What about the '?'?
Chris
I edited my answer for the question mark operator. I am a little confused by the rest of your comment though. Are you saying the string is coming to you in SQL syntax and you want to evaluate it as is? If that is the case, I think you will need to convert the SQL syntax to regex syntax manually.
Bob
What if the string used as a search pattern contains grouping characters like '(' or ')'? Do I escape them too? How many other characters need escaping?
Chris
I think that will depend on how many options you are allowing.
Bob
Just beware that .* is greedy (.*? might be more appropriate). I don't think .* in regex has exactly the same semantics as % in SQL.
GreenieMeanie
That is a good point, see this question for an explanation: http://stackoverflow.com/questions/255815/how-can-i-fix-my-regex-to-not-match-too-much-with-a-greedy-quantifier
Bob
+1  A: 

Java strings have .startsWith() and .contains() methods which will get you most of the way. For anything more complicated you'd have to use regex or write your own method.

job
+1  A: 

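You could turn '%string%' to contains(), 'string%' to startsWith() and '%string' to endsWith().

You should also run toLowerCase() on both the string and the pattern if you want case-insensitive matching, as LIKE is case-insensitive in many database setups (this depends on the database and collation).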

Not sure how you'd handle '%string%other%' except with a Regular Expression though.
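For what it's worth, a pattern like '%string%other%' can be handled without a regular expression by splitting on '%' and searching for each piece in order. The following is only a minimal sketch under stated assumptions: the class name LikeSegmentsSketch is hypothetical, only the '%' wildcard is supported (no single-character wildcard, no escaping), and case is folded with toLowerCase() while ignoring locale.

```java
public class LikeSegmentsSketch {
    // Hypothetical helper: supports only the '%' wildcard, case-insensitively.
    public static boolean like(String str, String expr) {
        String s = str.toLowerCase();
        String p = expr.toLowerCase();
        // The -1 limit keeps trailing empty strings, so "abc%" yields ["abc", ""].
        String[] parts = p.split("%", -1);
        if (parts.length == 1) {
            return s.equals(p); // no wildcard at all: plain equality
        }
        String first = parts[0];
        String last = parts[parts.length - 1];
        // The text before the first '%' must be a prefix,
        // and the text after the last '%' must be a suffix.
        if (!s.startsWith(first) || !s.endsWith(last)) {
            return false;
        }
        int pos = first.length();             // where the search may start
        int end = s.length() - last.length(); // where the suffix begins
        if (end < pos) {
            return false; // prefix and suffix would overlap
        }
        // Each middle piece must occur, in order, between prefix and suffix.
        for (int i = 1; i < parts.length - 1; i++) {
            int idx = s.indexOf(parts[i], pos);
            if (idx < 0 || idx + parts[i].length() > end) {
                return false;
            }
            pos = idx + parts[i].length();
        }
        return true;
    }
}
```

Taking the leftmost occurrence of each piece is enough here, because an earlier match always leaves at least as much room for the remaining pieces.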

If you're using Regular Expressions:

Dave Webb
What about "%this%string%"? Split on the '%' sign, iterate over the array and then check every entry? I think this could be done better ...
Chris
+3  A: 

Yes, this could be done with a regular expression. Keep in mind that Java's regular expressions have different syntax from SQL's "like". Instead of "%", you would have ".*", and instead of "?", you would have ".".

What makes it somewhat tricky is that you would also have to escape any characters that Java treats as special. Since you're trying to make this analogous to SQL, I'm guessing that ^$[]{}\ shouldn't appear in the regex string. But you will have to replace "." with "\\." before doing any other replacements. (Edit: Pattern.quote(String) escapes everything by surrounding the string with "\Q" and "\E", which will cause everything in the expression to be treated as a literal (no wildcards at all). So you definitely don't want to use it.)

Furthermore, as Dave Webb says, you also need to ignore case.

With that in mind, here's a sample of what it might look like:

public static boolean like(String str, String expr) {
    expr = expr.toLowerCase(); // ignoring locale for now
    expr = expr.replace(".", "\\."); // "\\" is escaped to "\" (thanks, Alan M)
    // ... escape any other potentially problematic characters here
    expr = expr.replace("?", ".");
    expr = expr.replace("%", ".*");
    str = str.toLowerCase();
    return str.matches(expr);
}
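One possible way around the Pattern.quote problem described in the edit above is to quote only the literal runs between wildcards, so the wildcards themselves are never escaped. This is a sketch, not a tested library routine: the class name LikeQuoteSketch is hypothetical, and it assumes '%' and '?' are the only wildcards, as in the question.

```java
import java.util.regex.Pattern;

public class LikeQuoteSketch {
    // Hypothetical helper: '%' matches any run of characters, '?' exactly one.
    public static boolean like(String str, String expr) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < expr.length(); i++) {
            char c = expr.charAt(i);
            if (c == '%' || c == '?') {
                // Flush the pending literal text, quoted so that regex
                // metacharacters in it are matched literally.
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '%' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        // DOTALL lets the wildcards match line terminators as well.
        Pattern p = Pattern.compile(regex.toString(),
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        return p.matcher(str).matches();
    }
}
```

With this approach, like("a.b", "a.b") matches but like("axb", "a.b") does not, because the dot is quoted as a literal.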
Michael Myers
Is there a method that escapes every character with a special meaning in Java regex?
Chris
Yes, Pattern.quote (http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html#quote%28java.lang.String%29 ) will do it. For some reason, I thought that might cause a problem, but now I don't know why I didn't include it in the answer.
Michael Myers
Oh yes, now I remember. It's because ? is a special regex character, so it would be escaped before we could replace it. I suppose we could instead use Pattern.quote and then expr = expr.replace("\\?", ".");
Michael Myers
That third line should read `replace(".", "\\.");`
Alan Moore
You are right. I should have tested it on dots before posting it.
Michael Myers
+1  A: 

I don't know exactly about the greedy issue, but try this and see if it works for you:

public boolean like(final String str, String expr)
  {
    final String[] parts = expr.split("%");
    final boolean traillingOp = expr.endsWith("%");
    expr = "";
    for (int i = 0, l = parts.length; i < l; ++i)
    {
      final String[] p = parts[i].split("\\\\\\?");
      if (p.length > 1)
      {
        for (int y = 0, l2 = p.length; y < l2; ++y)
        {
          expr += p[y];
          if (y + 1 < l2) expr += "."; // compare the inner index y, not the outer index i
        }
      }
      else
      {
        expr += parts[i];
      }
      if (i + 1 < l) expr += "%";
    }
    if (traillingOp) expr += "%";
    expr = expr.replace("?", ".");
    expr = expr.replace("%", ".*");
    return str.matches(expr);
}
tommyL
Your inner split() and loop replaces any \? sequence with a dot--I don't get that. Why single out that sequence, only to replace it with a dot just like a lone question mark?
Alan Moore
It replaces the '?' with a '.' because '?' is a placeholder for a single arbitrary character. I know '\\\\\\?' looks strange, but I tested it and for my tests it seems to work.
tommyL
+1  A: 

Regular expressions are the most versatile. However, some LIKE functions can be formed without regular expressions. e.g.

String text = "digital";
text.startsWith("dig"); // like "dig%"
text.endsWith("tal"); // like "%tal"
text.contains("gita"); // like "%gita%"
Peter Lawrey
+2  A: 

Every SQL reference I can find says the "any single character" wildcard is the underscore (_), not the question mark (?). That simplifies things a bit, since the underscore is not a regex metacharacter. However, you still can't use Pattern.quote() for the reason given by mmyers. I've got another method here for escaping regexes when I might want to edit them afterward. With that out of the way, the like() method becomes pretty simple:

public static boolean like(final String str, final String expr)
{
  String regex = quotemeta(expr);
  regex = regex.replace("_", ".").replace("%", ".*?");
  Pattern p = Pattern.compile(regex,
      Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
  return p.matcher(str).matches();
}

public static String quotemeta(String s)
{
  if (s == null)
  {
    throw new IllegalArgumentException("String cannot be null");
  }

  int len = s.length();
  if (len == 0)
  {
    return "";
  }

  StringBuilder sb = new StringBuilder(len * 2);
  for (int i = 0; i < len; i++)
  {
    char c = s.charAt(i);
    if ("[](){}.*+?$^|#\\".indexOf(c) != -1)
    {
      sb.append("\\");
    }
    sb.append(c);
  }
  return sb.toString();
}

If you really want to use ? for the wildcard, your best bet would be to remove it from the list of metacharacters in the quotemeta() method. Replacing its escaped form -- replace("\\?", ".") -- wouldn't be safe because there might be backslashes in the original expression.

And that brings us to the real problems: most SQL flavors seem to support character classes in the forms [a-z] and [^j-m] or [!j-m], and they all provide a way to escape wildcard characters. The latter is usually done by means of an ESCAPE keyword, which lets you define a different escape character every time. As you can imagine, this complicates things quite a bit. Converting to a regex is probably still the best option, but parsing the original expression will be much harder--in fact, the first thing you would have to do is formalize the syntax of the LIKE-like expressions themselves.
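As a rough illustration of the escape-character part only, a character-by-character translation might look like the sketch below. It is an assumption-laden example rather than a full LIKE parser: the class name LikeEscapeSketch and the extra escape parameter are hypothetical, the wildcards are taken to be '%' and '_', and character classes such as [a-z] are not handled at all (brackets are matched literally).

```java
import java.util.regex.Pattern;

public class LikeEscapeSketch {
    // Hypothetical helper: converts a LIKE expression to a regex string.
    // 'escape' plays the role of the character named in an ESCAPE clause.
    public static String toRegex(String expr, char escape) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < expr.length(); i++) {
            char c = expr.charAt(i);
            if (c == escape) {
                if (i + 1 >= expr.length()) {
                    throw new IllegalArgumentException("dangling escape character");
                }
                // The character after the escape is always taken literally.
                regex.append(Pattern.quote(String.valueOf(expr.charAt(++i))));
            } else if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return regex.toString();
    }

    public static boolean like(String str, String expr, char escape) {
        Pattern p = Pattern.compile(toRegex(expr, escape),
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        return p.matcher(str).matches();
    }
}
```

For example, like("50% off", "50!% off", '!') treats the percent sign as a literal, so it would not match "50x off". Quoting one character at a time produces a verbose regex; it is done here only to keep the sketch short.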

Alan Moore
Yes, you are right. I like your solution better than mine.
tommyL
A: 

The Apache Cayenne ORM has an "in-memory evaluation" feature.

It may not work for unmapped objects, but it looks promising:

Expression exp = ExpressionFactory.likeExp("artistName", "A%");   
List startWithA = exp.filterObjects(artists);
OscarRyz
Do you know if Hibernate supports this feature? I mean, filtering objects currently in memory using such an expression?
tommyL
A: 

The Comparator and Comparable interfaces are likely inapplicable here. They deal with sorting, and return integers of either sign, or 0. Your operation is about finding matches, and returning true/false. That's different.

John O
You are welcome to suggest a better name for the operator. I don't like criticism without suggestions for improvement, btw.
Chris