I'm in the process of moving an application from PHP to Java, and the code makes heavy use of regular expressions. I've run across something in PHP that doesn't seem to have a Java equivalent:

preg_replace_callback()

For every match of the regex, it calls a function that is passed the match (the full matched text plus any captured groups) as a parameter, and the function's return value replaces the match. As an example usage:

$articleText = preg_replace_callback("/\[thumb(\d+)\]/",'thumbReplace', $articleText);
# ...
function thumbReplace($matches) {
   global $photos;
   return "<img src=\"thumbs/" . $photos[$matches[1]] . "\">";
}

What would be the ideal way to do this in Java?

+10  A: 

I don't know of anything similar that's built into Java. You could roll your own without too much difficulty, using the Matcher class:

import java.util.regex.*;

public class CallbackMatcher
{
    public static interface Callback
    {
        public String foundMatch(MatchResult matchResult);
    }

    private final Pattern pattern;

    public CallbackMatcher(String regex)
    {
        this.pattern = Pattern.compile(regex);
    }

    public String replaceMatches(String string, Callback callback)
    {
        final Matcher matcher = this.pattern.matcher(string);
        while(matcher.find())
        {
            final MatchResult matchResult = matcher.toMatchResult();
            final String replacement = callback.foundMatch(matchResult);
            string = string.substring(0, matchResult.start()) +
                     replacement + string.substring(matchResult.end());
            matcher.reset(string);  // restart matching on the modified string
        }
        return string;
    }
}

Then call:

final CallbackMatcher.Callback callback = new CallbackMatcher.Callback() {
    public String foundMatch(MatchResult matchResult)
    {
        return "<img src=\"thumbs/" + matchResult.group(1) + "\"/>";
    }
};

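// Java regexes take no PHP-style delimiters; backslashes are doubled in string literals.
final CallbackMatcher callbackMatcher = new CallbackMatcher("\\[thumb(\\d+)\\]");
articleText = callbackMatcher.replaceMatches(articleText, callback);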

Note that you can get the entire matched string by calling matchResult.group() or matchResult.group(0), so it's not necessary to pass the callback the current string state.

EDIT: Made it look more like the exact functionality of the PHP function.

Here's the original, since the asker liked it:

public class CallbackMatcher
{
    public static interface Callback
    {
        public void foundMatch(MatchResult matchResult);
    }

    private final Pattern pattern;

    public CallbackMatcher(String regex)
    {
        this.pattern = Pattern.compile(regex);
    }

    public void findMatches(String string, Callback callback)
    {
        final Matcher matcher = this.pattern.matcher(string);
        while(matcher.find())
        {
            callback.foundMatch(matcher.toMatchResult());
        }
    }
}

For this particular use case, it might be best to simply queue each match in the callback, then afterwards run through them backwards. This will prevent having to remap indexes as the string is modified.
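
A minimal sketch of that queue-then-replace-backwards idea, for illustration only. The class name `ReverseReplace`, the method `replaceAll`, and the use of `java.util.function.Function` (Java 8+) in place of the `Callback` interface are assumptions, not part of the code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects every match first, then applies the
// replacements from the last match to the first so earlier indexes stay valid.
class ReverseReplace {
    static String replaceAll(String regex, String input,
                             Function<MatchResult, String> callback) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<MatchResult> matches = new ArrayList<>();
        List<String> replacements = new ArrayList<>();
        while (matcher.find()) {
            MatchResult m = matcher.toMatchResult();
            matches.add(m);
            replacements.add(callback.apply(m));
        }
        StringBuilder sb = new StringBuilder(input);
        // Walk backwards: replacing a later range never shifts an earlier one.
        for (int i = matches.size() - 1; i >= 0; i--) {
            MatchResult m = matches.get(i);
            sb.replace(m.start(), m.end(), replacements.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String out = replaceAll("\\[thumb(\\d+)\\]", "a [thumb1] b [thumb22] c",
                m -> "<" + m.group(1) + ">");
        System.out.println(out); // prints "a <1> b <22> c"
    }
}
```

Because matching finishes before any replacement is applied, a replacement that itself matches the pattern is never rescanned.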

jdmichal
I actually like your original answer better: queuing the returned strings and indexes, then applying them in reverse. The edited version is simpler, but seems to do more work, since it has to rescan the entire string for each match. Thanks for the suggestion!
Mike
I added the original suggestion back in. The expected input size would make the difference as to whether rescanning or queueing then replacing would be more effective. I suppose one could also have the replace method queue them, along with the replacement string...
jdmichal
Errr... Misspoke. Obviously queueing is always more effective in regards to CPU time. The difference would be whether it's a big enough problem to worry about.
jdmichal
This has a bug in that you're calling matcher.reset() at the end of each loop iteration. If the replacement string matches the pattern, you'll get into an infinite loop. Using appendReplacement() and appendTail() with a StringBuffer would be safer.
Kip
Good catch, Kip. I think the only way to implement this correctly using these interfaces is to queue the matches and replace them after all the match operations are complete. I am confused, though, as to why you think using StringBuffer would help, unless you simply meant that it would perform better than the + operator. The real crux is that you cannot replace matches at a lower index without corrupting the indexes of matches at a higher index, hence the need to queue them and work through them backwards, or to reset the matcher after each replacement.
jdmichal
A: 

Here is the final result of what I did with your suggestion. I thought it would be nice to have out here in case someone has the same problem. The resulting calling code looks like:

content = ReplaceCallback.find(content, regex, new ReplaceCallback.Callback() {
    public String matches(MatchResult match) {
        // Do something special not normally allowed in regexes...
        return "newstring";
    }
});

The entire class listing follows:

import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.util.Stack;

/**
 * <p>
 * Class that provides a method for doing regular expression string replacement by passing the matched string to
 * a function that operates on the string.  The result of the operation is then used to replace the original match.
 * </p>
 * <p>Example:</p>
 * <pre>
 * ReplaceCallback.find("string to search on", "regular(expression)", new ReplaceCallback.Callback() {
 *   public String matches(MatchResult match) {
 *    // query db or whatever...
 *    return match.group().replaceAll("2nd level replacement", "blah blah");
 *   }
 * });
 * </pre>
 * <p>
 * This, in effect, allows for a second level of string regex processing.
 * </p>
 *
 */
public class ReplaceCallback {
    public static interface Callback {
        public String matches(MatchResult match);
    }

    private final Pattern pattern;
    private Callback callback;

    // One queued replacement: the matched range and the text to put there.
    private static class Result {
        int start;
        int end;
        String replace;
    }

    /**
     * You probably don't need this.  See {@link #find(String, String, Callback)}.
     * @param regex  The string regex to use
     * @param callback An instance of Callback to execute on matches
     */
    public ReplaceCallback(String regex, final Callback callback) {
        this.pattern = Pattern.compile(regex);
        this.callback = callback;
    }

    public String execute(String string) {
        final Matcher matcher = this.pattern.matcher(string);
        Stack<Result> results = new Stack<Result>();
        while (matcher.find()) {
            final MatchResult matchResult = matcher.toMatchResult();
            Result r = new Result();
            r.replace = callback.matches(matchResult);
            // A null return leaves this match untouched.
            if (r.replace == null)
                continue;
            r.start = matchResult.start();
            r.end = matchResult.end();
            results.push(r);
        }
        // Apply replacements last-to-first so earlier indexes stay valid.
        // Improve this with a StringBuilder...
        while (!results.empty()) {
            Result r = results.pop();
            string = string.substring(0, r.start) + r.replace + string.substring(r.end);
        }
        return string;
    }

    /**
     * If you wish to reuse the regex multiple times with different callbacks or search strings, you can create a
     * ReplaceCallback directly and use this method to perform the search and replace.
     *
     * @param string The string we are searching through
     * @param callback A callback instance that will be applied to the regex match results.
     * @return The modified search string.
     */
    public String execute(String string, final Callback callback) {
        this.callback = callback;
        return execute(string);
    }

    /**
     * Use this static method to perform your regex search.
     * @param search The string we are searching through
     * @param regex  The regex to apply to the string
     * @param callback A callback instance that will be applied to the regex match results.
     * @return The modified search string.
     */
    public static String find(String search, String regex, Callback callback) {
        ReplaceCallback rc = new ReplaceCallback(regex, callback);
        return rc.execute(search);
    }
}
Mike
I would not use an instance variable to store the callback, but rather pass it as a parameter. Storing it as an instance variable makes your class have unexpected behaviour when called from separate threads at the same time. (The second callback will get matches from the first and second).
jdmichal
+6  A: 

Trying to emulate PHP's callback feature seems like an awful lot of work when you could just use appendReplacement() and appendTail() in a loop:

StringBuffer resultString = new StringBuffer();
Pattern regex = Pattern.compile("regex");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
  // You can vary the replacement text for each match on-the-fly
  regexMatcher.appendReplacement(resultString, "replacement");
}
regexMatcher.appendTail(resultString);
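
Applied to the question's thumbnail tags, that loop might look like the sketch below. The class name `ThumbReplacer`, the `photos` map, and the file names are hypothetical, and leaving unknown ids untouched is an assumption about the desired behaviour. `Matcher.quoteReplacement()` is added so that any `$` or `\` in the replacement is inserted literally rather than read as a group reference.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThumbReplacer {
    private static final Pattern THUMB = Pattern.compile("\\[thumb(\\d+)\\]");

    // photos maps the digits captured by group 1 to a file name (hypothetical data).
    static String replaceThumbs(String articleText, Map<String, String> photos) {
        StringBuffer result = new StringBuffer();
        Matcher matcher = THUMB.matcher(articleText);
        while (matcher.find()) {
            String file = photos.get(matcher.group(1));
            // Assumption: a tag with no matching photo is left as it was.
            String replacement = (file == null)
                    ? matcher.group()
                    : "<img src=\"thumbs/" + file + "\">";
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        Map<String, String> photos = new HashMap<>();
        photos.put("1", "cat.jpg");
        System.out.println(replaceThumbs("See [thumb1] and [thumb2]", photos));
        // prints: See <img src="thumbs/cat.jpg"> and [thumb2]
    }
}
```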
Jan Goyvaerts
A: 

I found that jdmichal's answer loops forever if the returned string can itself be matched by the pattern. Below is a modification that prevents this by never rescanning replaced text.

public String replaceMatches(String string, Callback callback) {
    String result = "";
    final Matcher matcher = this.pattern.matcher(string);
    int lastMatch = 0;
    while(matcher.find())
    {
        final MatchResult matchResult = matcher.toMatchResult();
        final String replacement = callback.foundMatch(matchResult);
        result += string.substring(lastMatch, matchResult.start()) +
            replacement;
        lastMatch = matchResult.end();
    }
    if (lastMatch < string.length())
        result += string.substring(lastMatch);
    return result;
}
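
Since `result +=` copies the accumulated string on every match, the same forward pass can also be written with a `StringBuilder`. A sketch, assuming Java 8+ and using `java.util.function.Function` in place of the `Callback` interface; `ForwardReplace` is a hypothetical name.

```java
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ForwardReplace {
    static String replaceMatches(Pattern pattern, String input,
                                 Function<MatchResult, String> callback) {
        StringBuilder result = new StringBuilder();
        Matcher matcher = pattern.matcher(input);
        int lastMatch = 0;
        while (matcher.find()) {
            // Copy the untouched text between the previous match and this one.
            result.append(input, lastMatch, matcher.start());
            result.append(callback.apply(matcher.toMatchResult()));
            lastMatch = matcher.end();
        }
        // Copy whatever follows the final match (possibly nothing).
        result.append(input, lastMatch, input.length());
        return result.toString();
    }

    public static void main(String[] args) {
        // The replacement "aa" matches the pattern, yet the loop still terminates
        // because the matcher only ever scans the original input.
        System.out.println(replaceMatches(Pattern.compile("a"), "banana", m -> "aa"));
        // prints "baanaanaa"
    }
}
```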
jevon
A: 

I wasn't quite satisfied with any of the solutions here. I wanted a stateless solution. And I didn't want to end up in an infinite loop if my replacement string happened to match the pattern. While I was at it I added support for a limit parameter and a returned count parameter. (I used an AtomicInteger to simulate passing an integer by reference.) I moved the callback parameter to the end of the parameter list, to make it easier to define an anonymous class.

Here is an example of usage:

final Map<String,String> props = new HashMap<String,String>();
props.put("MY_NAME", "Kip");
props.put("DEPT", "R&D");
props.put("BOSS", "Dave");

String subjectString = "Hi my name is ${MY_NAME} and I work in ${DEPT} for ${BOSS}";
String sRegex = "\\$\\{([A-Za-z0-9_]+)\\}";

String replacement = ReplaceCallback.replace(sRegex, subjectString, new ReplaceCallback.Callback() {
  public String matchFound(MatchResult match) {
    String group1 = match.group(1);
    if(group1 != null && props.containsKey(group1))
      return props.get(group1);
    return match.group();
  }
});

System.out.println("replacement: " + replacement);

And here is my version of ReplaceCallback class:

import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.*;

public class ReplaceCallback
{
  public static interface Callback {
    /**
     * This function is called when a match is made. The string which was matched
     * can be obtained via match.group(), and the individual groupings via
     * match.group(n).
     */
    public String matchFound(MatchResult match);
  }

  /**
   * Replaces with callback, with no limit to the number of replacements.
   * Probably what you want most of the time.
   */
  public static String replace(String pattern, String subject, Callback callback)
  {
    return replace(pattern, subject, -1, null, callback);
  }

  public static String replace(String pattern, String subject, int limit, Callback callback)
  {
    return replace(pattern, subject, limit, null, callback);
  }

  /**
   * @param regex    The regular expression pattern to search on.
   * @param subject  The string to be replaced.
   * @param limit    The maximum number of replacements to make. A negative value
   *                 indicates replace all.
   * @param count    If this is not null, it will be set to the number of
   *                 replacements made.
   * @param callback Callback function
   */
  public static String replace(String regex, String subject, int limit,
          AtomicInteger count, Callback callback)
  {
    StringBuffer sb = new StringBuffer();
    Matcher matcher = Pattern.compile(regex).matcher(subject);
    int i;
    for(i = 0; (limit < 0 || i < limit) && matcher.find(); i++)
    {
      String replacement = callback.matchFound(matcher.toMatchResult());
      replacement = Matcher.quoteReplacement(replacement); //probably what you want...
      matcher.appendReplacement(sb, replacement);
    }
    matcher.appendTail(sb);

    if(count != null)
      count.set(i);
    return sb.toString();
  }
}
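
On Java 9 and later, `Matcher.replaceAll(Function<MatchResult, String>)` is built in and covers the no-limit case without a helper class. Below is a sketch of the same `${...}` substitution; the class name `PropsExample` and the method `expand` are hypothetical. As with `appendReplacement()`, the returned string is interpreted as a replacement string, so it goes through `Matcher.quoteReplacement()` first.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PropsExample {
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([A-Za-z0-9_]+)\\}");

    static String expand(String subject, Map<String, String> props) {
        // Requires Java 9+: replaceAll accepts a Function<MatchResult, String>.
        return PLACEHOLDER.matcher(subject).replaceAll(match -> {
            String value = props.get(match.group(1));
            // Unknown names keep their original "${NAME}" text.
            return Matcher.quoteReplacement(value != null ? value : match.group());
        });
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("MY_NAME", "Kip");
        props.put("DEPT", "R&D");
        props.put("BOSS", "Dave");
        System.out.println(expand(
                "Hi my name is ${MY_NAME} and I work in ${DEPT} for ${BOSS}", props));
        // prints "Hi my name is Kip and I work in R&D for Dave"
    }
}
```

This variant has no equivalent of the limit or count parameters; for those, the explicit `find()` loop above is still needed.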
Kip