Is there a standard (preferably Apache Commons or similarly non-viral) library for doing "glob" type matches in Java? When I had to do something similar in Perl once, I just changed all the "." to "\.", the "*" to ".*", the "?" to "." and that sort of thing, but I'm wondering if somebody has already done the work for me.

Similar question: http://stackoverflow.com/questions/445910/create-regex-from-glob-expression

+1  A: 

I don't know about a "standard" implementation, but I know of a SourceForge project released under the BSD license that implements glob matching for files. It's implemented in one file, so maybe you can adapt it to your requirements.

Greg Mattes
+2  A: 

GlobCompiler/GlobEngine, from Jakarta ORO, looks promising. It's available under the Apache License.

strout
+4  A: 

There's nothing built-in, but it's pretty simple to convert something glob-like to a regex:

public static String createRegexFromGlob(String glob)
{
    String out = "^";
    for(int i = 0; i < glob.length(); ++i)
    {
        final char c = glob.charAt(i);
        switch(c)
        {
        case '*': out += ".*"; break;
        case '?': out += '.'; break;
        case '.': out += "\\."; break;
        case '\\': out += "\\\\"; break;
        default: out += c;
        }
    }
    out += '$';
    return out;
}

This works for me, but I'm not sure if it covers the glob "standard", if there is one :)

Update by Paul Tomblin: I found a Perl program that does glob conversion, and adapting it to Java I ended up with:

private String convertGlobToRegEx(String line)
{
    LOG.info("got line [" + line + "]");
    line = line.trim();
    int strLen = line.length();
    StringBuilder sb = new StringBuilder(strLen);
    // Remove beginning and ending * globs because they're useless
    if (line.startsWith("*"))
    {
        line = line.substring(1);
        strLen--;
    }
    if (line.endsWith("*"))
    {
        line = line.substring(0, strLen-1);
        strLen--;
    }
    boolean escaping = false;
    int inCurlies = 0;
    for (char currentChar : line.toCharArray())
    {
        switch (currentChar)
        {
        case '*':
            if (escaping)
                sb.append("\\*");
            else
                sb.append(".*");
            escaping = false;
            break;
        case '?':
            if (escaping)
                sb.append("\\?");
            else
                sb.append('.');
            escaping = false;
            break;
        case '.':
        case '(':
        case ')':
        case '+':
        case '|':
        case '^':
        case '$':
        case '@':
        case '%':
            sb.append('\\');
            sb.append(currentChar);
            escaping = false;
            break;
        case '\\':
            if (escaping)
            {
                sb.append("\\\\");
                escaping = false;
            }
            else
                escaping = true;
            break;
        case '{':
            if (escaping)
            {
                sb.append("\\{");
            }
            else
            {
                sb.append('(');
                inCurlies++;
            }
            escaping = false;
            break;
        case '}':
            if (inCurlies > 0 && !escaping)
            {
                sb.append(')');
                inCurlies--;
            }
            else if (escaping)
                sb.append("\\}");
            else
                sb.append("}");
            escaping = false;
            break;
        case ',':
            if (inCurlies > 0 && !escaping)
            {
                sb.append('|');
            }
            else if (escaping)
                sb.append("\\,");
            else
                sb.append(",");
            // Reset the flag here too, as every other case does
            escaping = false;
            break;
        default:
            escaping = false;
            sb.append(currentChar);
        }
    }
    return sb.toString();
}

I'm editing into this answer rather than making my own because this answer put me on the right track.

Dave Ray
Yeah, that's pretty much the solution I came up with the last time I had to do this (in Perl) but I was wondering if there was something more elegant. I think I'm going to do it your way.
Paul Tomblin
Actually, I found a better implementation in Perl that I can adapt into Java at http://kobesearch.cpan.org/htdocs/Text-Glob/Text/Glob.pm.html
Paul Tomblin
Couldn't you use a regex replace to turn a glob into a regex?
Tim Sylvester
The lines at the top that strip out the leading and trailing '*' need to be removed for Java, since String.matches only matches against the whole string.
KitsuneYMG
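To illustrate the point about whole-string matching, here is a small sketch. The class and helper names (`MatchesVsFind`, `wholeMatch`, `partialMatch`) and the example URL are made up for the illustration. It assumes a regex such as `\.html`, of the kind that is left once a leading `*` has been stripped from a glob:

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any answer above.
final class MatchesVsFind {
    // String.matches succeeds only if the regex covers the entire input.
    static boolean wholeMatch(String regex, String input) {
        return input.matches(regex);
    }

    // Matcher.find succeeds if the regex matches anywhere in the input.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String url = "http://example.com/index.html";
        System.out.println(wholeMatch("\\.html", url));    // false
        System.out.println(partialMatch("\\.html", url));  // true
        System.out.println(wholeMatch(".*\\.html", url));  // true
    }
}
```

So a converter that drops the outer `*` probably needs to be paired with `find()`, while one that keeps them can be used with `matches()`.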
FYI: The standard for 'globbing' is the POSIX Shell language - http://www.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_13_01
Stephen C
Stephen C: Thanks for the tip.
Dave Ray
I think the first snippet of code has a problem if it is passed a glob with mismatched parentheses, e.g. `(*`. I believe `(` is non-special in a glob, and it will get converted to `(.*`, which is not a valid regex.
Simon Nickerson
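One possible way around the `(*` problem is to backslash-escape every character that is special in a Java regex, not just `.` and `\`. The following is only a sketch: the class name `SafeGlobRegex`, the method `toRegex` and the exact character list are assumptions for the illustration, and it still handles only `*` and `?` as wildcards.

```java
import java.util.regex.Pattern;

// Hypothetical variant of the first snippet; not the original author's code.
final class SafeGlobRegex {
    // Characters that are special in java.util.regex and are literal in this glob dialect
    private static final String REGEX_META = "\\.[]{}()+^$|";

    static String toRegex(String glob) {
        StringBuilder out = new StringBuilder("^");
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                out.append(".*");               // any run of characters
            } else if (c == '?') {
                out.append('.');                // exactly one character
            } else if (REGEX_META.indexOf(c) >= 0) {
                out.append('\\').append(c);     // literal, so escape it
            } else {
                out.append(c);
            }
        }
        return out.append('$').toString();
    }

    public static void main(String[] args) {
        String regex = toRegex("(*.html");
        System.out.println(regex);
        System.out.println(Pattern.matches(regex, "(index.html"));
    }
}
```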
+1  A: 

Globbing is also planned for Java 7.

See http://java.sun.com/docs/books/tutorial/essential/io/find.html

finnw
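For reference, Java 7 did ship glob support in `java.nio.file`, via `FileSystem.getPathMatcher` with a `glob:` prefix. A minimal sketch follows; the wrapper class `PathGlobDemo` and its `matchesGlob` helper are made up for the example. Note that it matches `Path` objects, so it may not be a good fit for globs over URLs or other arbitrary strings.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical wrapper around the JDK's built-in glob matcher.
final class PathGlobDemo {
    static boolean matchesGlob(String glob, String path) {
        // The "glob:" prefix selects glob syntax; "regex:" is the other built-in option
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        System.out.println(matchesGlob("*.html", "index.html"));       // true
        System.out.println(matchesGlob("*.html", "docs/index.html"));  // false: * stays within one name
        System.out.println(matchesGlob("*.{html,htm}", "index.htm"));  // true
    }
}
```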
A: 

By the way, it seems as if you did it the hard way in Perl

This does the trick in Perl:

my @files = glob("*.html");
# Or, if you prefer:
my @files = <*.html>;
That only works if the glob is for matching files. In the Perl case, the globs actually came from a list of IP addresses that was written using globs for reasons I won't go into, and in my current case the globs are for matching URLs.
Paul Tomblin
+2  A: 

This is a simple Glob implementation which handles * and ? in the pattern

public class GlobMatch {
    private String text;
    private String pattern;

    public boolean match(String text, String pattern) {
        this.text = text;
        this.pattern = pattern;

        return matchCharacter(0, 0);
    }

    private boolean matchCharacter(int patternIndex, int textIndex) {
        // Pattern exhausted: this is a match only if the text is exhausted too
        if (patternIndex >= pattern.length()) {
            return textIndex >= text.length();
        }

        switch (pattern.charAt(patternIndex)) {
            case '?':
                // Match any single character
                if (textIndex >= text.length()) {
                    return false;
                }
                break;

            case '*':
                // * at the end of the pattern will match anything
                if (patternIndex + 1 >= pattern.length()) {
                    return true;
                }

                // Probe forward to see if we can get a match; <= also tries the
                // empty remainder, so "a*b" no longer matches "a" but "a**" does
                while (textIndex <= text.length()) {
                    if (matchCharacter(patternIndex + 1, textIndex)) {
                        return true;
                    }
                    textIndex++;
                }

                return false;

            default:
                if (textIndex >= text.length()) {
                    return false;
                }

                String textChar = text.substring(textIndex, textIndex + 1);
                String patternChar = pattern.substring(patternIndex, patternIndex + 1);

                // Note the match is case insensitive
                if (textChar.compareToIgnoreCase(patternChar) != 0) {
                    return false;
                }
        }

        // Go on to match the next character in the pattern
        return matchCharacter(patternIndex + 1, textIndex + 1);
    }
}
Tony Edgecombe
A: 

Long ago I was doing massive glob-driven text filtering, so I wrote a small piece of code (15 lines, no dependencies beyond the JDK). It handles only '*' (which was sufficient for me), but can easily be extended for '?'. It is several times faster than a pre-compiled regexp and does not require any pre-compilation (essentially it is a string-vs-string comparison every time the pattern is matched).

Copy/paste from here

bobah
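The linked code is not reproduced here. As a rough idea of what a regex-free, `*`-only matcher can look like, here is a sketch of a common iterative approach that remembers the last `*` and backtracks to it on a mismatch. This is an assumption about the general technique, not bobah's actual implementation, and the names `StarMatcher` and `matches` are made up.

```java
// Hypothetical star-only matcher; not the code referred to above.
final class StarMatcher {
    static boolean matches(String text, String glob) {
        int t = 0;        // current position in text
        int g = 0;        // current position in glob
        int starG = -1;   // glob index just after the most recent '*', or -1 if none seen
        int starT = 0;    // text index where the part after that '*' is currently tried
        while (t < text.length()) {
            if (g < glob.length() && glob.charAt(g) == '*') {
                starG = ++g;          // remember the star, first let it match nothing
                starT = t;
            } else if (g < glob.length() && glob.charAt(g) == text.charAt(t)) {
                g++;                  // literal characters agree, advance both
                t++;
            } else if (starG >= 0) {
                g = starG;            // mismatch: let the last star absorb one more char
                t = ++starT;
            } else {
                return false;         // mismatch with no star to fall back on
            }
        }
        // Text is consumed; only trailing stars may remain in the glob
        while (g < glob.length() && glob.charAt(g) == '*') {
            g++;
        }
        return g == glob.length();
    }
}
```

Supporting `?` would presumably mean treating it as equal to any character in the second branch.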
A: 

I recently had to do it and used \Q and \E to escape the glob pattern:

private static Pattern getPatternFromGlob(String glob)
{
    return Pattern.compile(
        "^\\Q" 
        + glob.replace("*", "\\E.*\\Q")
              .replace("?", "\\E.\\Q") 
        + "\\E$");
}
Vincent Robert
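One caveat with the `\Q`...`\E` approach is that a glob which itself contains `\E` would end the quoted section early. A possible variant, sketched below, quotes each literal run with `Pattern.quote`, which copes with an embedded `\E`. The class name `QuotedGlob` and the method `fromGlob` are hypothetical, and the sketch relies on `Matcher.matches()` for anchoring instead of `^` and `$`.

```java
import java.util.regex.Pattern;

// Hypothetical variant of the quoting approach; supports only * and ?.
final class QuotedGlob {
    static Pattern fromGlob(String glob) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*' || c == '?') {
                flush(literal, regex);
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);    // collect literal text between wildcards
            }
        }
        flush(literal, regex);
        return Pattern.compile(regex.toString());
    }

    // Appends the pending literal run, quoted, and clears the buffer
    private static void flush(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(fromGlob("*.html").matcher("index.html").matches()); // true
        System.out.println(fromGlob("a.c").matcher("abc").matches());           // false
    }
}
```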
A: 

Similar to Tony Edgecombe's answer, here is a short and simple globber that supports * and ? without using regex, if anybody needs one.

public static boolean matches(String text, String glob) {
    String rest = null;
    int pos = glob.indexOf('*');
    if (pos != -1) {
        rest = glob.substring(pos + 1);
        glob = glob.substring(0, pos);
    }

    if (glob.length() > text.length())
        return false;

    // handle the part up to the first *
    for (int i = 0; i < glob.length(); i++)
        if (glob.charAt(i) != '?' 
                && !glob.substring(i, i + 1).equalsIgnoreCase(text.substring(i, i + 1)))
            return false;

    // recurse for the part after the first *, if any
    if (rest == null) {
        return glob.length() == text.length();
    } else {
        for (int i = glob.length(); i <= text.length(); i++) {
            if (matches(text.substring(i), rest))
                return true;
        }
        return false;
    }
}
mihi
A: 

There are a couple of libraries that do glob-like pattern matching and are more modern than the ones listed:

There's Ant's DirectoryScanner and Spring's AntPathMatcher.

I recommend both over the other solutions since Ant Style Globbing has pretty much become the standard glob syntax in the Java world (Hudson, Spring, Ant and I think Maven).

Adam Gent