I want to make two regex replacements on one string, i.e. in Java terms

myString.replaceAll(pattern1, replacement1).replaceAll(pattern2, replacement2);

However, suppose that myString may be very long, so it would be desirable to avoid making two passes over it. Can this be done in a single pass?

String pattern = ...;
String replacement = ...;
myString.replaceAll(pattern, replacement);

The obvious candidate for pattern is pattern1 + "|" + pattern2, but then I can't see how to write replacement.

To simplify, let's assume that matches of pattern1 and pattern2 can't intersect, and that replacement1 doesn't introduce any new matches of pattern2.

+1  A: 

Depending on which programming language you use, you can pass a callback as the replacement. Use a regex that matches both patterns, and have an if statement in the callback check which pattern matched and return the appropriate replacement.

levu
Voted up, but not available in my case.
Alexey Romanov
A: 

As @levu said, it depends on the programming language. In Ruby, you could do something like this:

ree-1.8.7-2010.02 > s
 => "hello world" 
ree-1.8.7-2010.02 > s.gsub(/(hello|world)/) {|match| match == 'hello' ? 'hi there!' : 'universe!' }
 => "hi there! universe!" 

When gsub sees either "hello" or "world" in the String s, it sends that value to the block in the variable called match. The block replaces "hello" with "hi there!", and replaces any other value with "universe!"

davidkovsky
+1  A: 

The easiest way to do this is to maintain a hash that maps strings to their replacements. Compose a pattern that matches any of the keys, and pass the matched key to the replacement portion. Have the replacement portion then look up the value.

In Perl, that would merely be:

my @keys = keys %hash;
# quotemeta escapes any regex metacharacters in the literal keys
my $alt = join("|", map { quotemeta } @keys);
s/\b($alt)\b/$hash{$1}/g;

The equivalent Java solution would — like everything — be significantly longer, but the same approach would work.

Ordering issues do arise if one string is a prefix of another.
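
For comparison, here is a rough Java sketch of the same hash-based idea. It is not from the original answer: the class name MapReplacer and its replaceAll helper are made up for illustration, and it assumes Java 9 or later, where Matcher.replaceAll accepts a function. Keys are treated as literal strings and sorted longest first so that a key which is a prefix of another does not win the alternation.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class, named here only for illustration.
class MapReplacer {
    // Replaces every whole-word occurrence of a key with its mapped value in one pass.
    static String replaceAll(String input, Map<String, String> replacements) {
        if (replacements.isEmpty()) {
            return input; // nothing to replace; also avoids building an empty alternation
        }
        // Longest keys first, each quoted so it is matched literally.
        String alternation = replacements.keySet().stream()
                .sorted(Comparator.comparingInt(String::length).reversed())
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Pattern pattern = Pattern.compile("\\b(" + alternation + ")\\b");
        // Java 9+: the function is called once per match with the match result.
        // quoteReplacement keeps '$' and '\' in the values from being interpreted.
        return pattern.matcher(input)
                .replaceAll(m -> Matcher.quoteReplacement(replacements.get(m.group(1))));
    }

    public static void main(String[] args) {
        Map<String, String> replacements = new LinkedHashMap<>();
        replacements.put("hello", "hi there!");
        replacements.put("world", "universe!");
        System.out.println(replaceAll("hello world", replacements)); // hi there! universe!
    }
}
```

As in the Perl version, the \b anchors only make sense when the keys begin and end with word characters.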

tchrist
+1  A: 

Unlike virtually every other regex flavor out there, Java's did not support callbacks before Java 9, which added Matcher.replaceAll(Function<MatchResult, String>). However, it does expose some lower-level API calls that let you implement them yourself. Here's a post that shows how to do it:

http://stackoverflow.com/questions/1277990/java-regex-replace-with-capturing-group/1282099#1282099

As you indicated, you would need to combine the patterns into one, but you also need to isolate each pattern in its own capturing group, like so:

String bigPattern = "(" + pattern1 + ")|(" + pattern2 + ")";

(If the patterns already contain capturing groups, this will change the numbering scheme; you'll have to adjust any backreferences. I'll assume there are no capturing groups except the ones you just created.)

Then, within the replacement() method, you determine which group actually matched, and choose the replacement text accordingly:

public String replacement()
{
  if (group(1) != null)
  {
    return replacement1;
  }
  else
  {
    // Every match of bigPattern sets exactly one of the two groups,
    // so if group 1 did not participate, group 2 did.
    return replacement2;
  }
}
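
Without the helper class from the linked post, the same logic can be written directly against Matcher's lower-level calls. What follows is only a sketch: TwoPatternReplacer and replaceBoth are hypothetical names, and it assumes the replacements are plain text (no $1-style group references) and that neither pattern contains capturing groups of its own.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class TwoPatternReplacer {
    // Applies both replacements in a single scan of the input.
    static String replaceBoth(String input,
                              String pattern1, String replacement1,
                              String pattern2, String replacement2) {
        // Each pattern gets its own capturing group so we can tell which one matched.
        Pattern bigPattern = Pattern.compile("(" + pattern1 + ")|(" + pattern2 + ")");
        Matcher m = bigPattern.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // group(1) is non-null only when pattern1 produced this match.
            String replacement = (m.group(1) != null) ? replacement1 : replacement2;
            // Assumption: replacements are literal text, so quote '$' and '\'.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        String result = replaceBoth("hello world", "hel+o", "hi", "w\\w+", "universe");
        System.out.println(result); // hi universe
    }
}
```

If the replacements do need group references, drop the quoteReplacement call and shift the group numbers to account for the two wrapping groups.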
Alan Moore