ansaurus

Question

Replacing variable numbers of items... regex?

Answer 1

+2 A:

Yes, but it may be a bit of a hack, and you'll have to be careful it doesn't overmatch!

Regex:

(?:\{sup\s)?(\d)(?=\d*})}?

Replacement String:

{sup $1}

A short explanation:

(?:                            | start non-capturing group
  \{                           |   match the character '{'
  sup                          |   match the substring: "sup"
  \s                           |   match any white space character
)                              | end non-capturing group
?                              | ...and repeat it once or not at all
(                              | start group 1
  \d                           |   match any character in the range 0..9
)                              | end group 1
(?=                            | start positive look ahead
  \d                           |   match any character in the range 0..9
  *                            |   ...and repeat it zero or more times
  }                            |   match the character '}'
)                              | end positive look ahead
}                              | match the character '}'
?                              | ...and repeat it once or not at all

In plain English: it matches a single digit, but only if that digit is followed by a } with nothing but optional digits in between. Where present, the leading "{sup " and the trailing "}" are part of the match too, so they get replaced along with the digit.

EDIT:

A better one is this:

(?:\{sup\s|\G)(\d)(?=\d*})}?

That way, digits in a string like "set={123}" won't be replaced: every match now has to start either with "{sup " or at \G, the spot where the previous match ended.
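To see the difference between the two patterns, here is a minimal sketch that runs both through String.replaceAll with the replacement string given above. The class and method names (SupSplitDemo, splitLoose, splitAnchored) and the sample input are made up for illustration; only the two regexes and the replacement string come from this answer.

```java
class SupSplitDemo {
    // First attempt from the answer: the "{sup " prefix is optional.
    static final String LOOSE = "(?:\\{sup\\s)?(\\d)(?=\\d*})}?";
    // Edited version: a match must start with "{sup " or continue at \G.
    static final String ANCHORED = "(?:\\{sup\\s|\\G)(\\d)(?=\\d*})}?";
    // $1 is the single digit captured by group 1.
    static final String REPLACEMENT = "{sup $1}";

    static String splitLoose(String s) {
        return s.replaceAll(LOOSE, REPLACEMENT);
    }

    static String splitAnchored(String s) {
        return s.replaceAll(ANCHORED, REPLACEMENT);
    }

    public static void main(String[] args) {
        // Hypothetical input: one real {sup ...} plus two look-alikes.
        String input = "{sup 123} set={123} {sub 12}";
        System.out.println(splitLoose(input));    // also rewrites the look-alikes
        System.out.println(splitAnchored(input)); // only rewrites {sup 123}
    }
}
```

With the anchored pattern, "{sup 123}" becomes "{sup 1}{sup 2}{sup 3}" while "set={123}" and "{sub 12}" are left alone; the loose pattern rewrites the digits in all three. One caveat worth checking against your own data: in java.util.regex, \G also matches at the very start of the input, so a string that begins with digits followed by } can still be picked up.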

Bart Kiers 2009-12-16 14:44:20

Why did you mark the `{sup ` part as optional? It looks like it will match "1}".

Mike D. 2009-12-16 14:48:02

@Mike: the OP wants to replace `{sup 123}` with `{sup 1}{sup 2}{sup 3}`. Only the first digit has `{sup ` in front of it and the last digit has `}` after it: that's why it's optional.

Bart Kiers 2009-12-16 14:51:20

@Mike: ah, I see what you mean. Hence my remark "you'll have to be careful it doesn't *overmatch*!". See my second solution, the one with the `\G` in it, which accounts for that.

Bart Kiers 2009-12-16 15:00:57

That second edited one is the right one. The first one incorrectly makes replacements on other inputs like {sub 1} instead of {sup 1}. There are a lot of replacements in these documents.

darelf 2009-12-16 15:10:09

You're in luck then: the `\G` is not implemented in many regex implementations (I only know of Java).

Bart Kiers 2009-12-16 15:13:25

@Bart K: You prolly already know this, but you are a genius.

darelf 2009-12-16 15:18:40

`\G` is not that rare, really: http://www.regular-expressions.info/continue.html . It's just that, outside of Perl (where it originated--of course!), people don't seem to think of it very often. At least, I don't; this isn't the first time you've managed to blindside me with it. :)

Alan Moore 2009-12-18 04:42:51

Aha, I always thought it originated in Java's java.util.regex (don't know where I got that idea from...) and that Perl either adopted it from Java, or was going to do so. Thanks for the info.

Bart Kiers 2009-12-18 07:33:35

Answer 2

A:

Sure, this is a standard Regular Expression construct. You can find out about all the metacharacters in the Pattern Javadoc, but for your purposes, you probably want the "+" metacharacter, or the {1,3} greedy quantifier. Details in the link.

Adrian Petrescu 2009-12-16 14:44:28

No, you misunderstood: the OP is not asking how to match one or more digits.

Bart Kiers 2009-12-16 14:52:49

Answer 3

+1 A:

The easiest way to do this kind of thing is with something like PHP's preg_replace_callback or .NET's MatchEvaluator delegates. Java doesn't have anything like that built in, but it does expose the lower-level API that lets you implement it yourself. Here's one way to do it:

import java.util.regex.*;

public class Test
{
  static String sepsup(String orig)
  {
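    // group 1: the opening "{sup " or "{sub "; group 2: the run of digits
    Pattern p = Pattern.compile("(\\{su[bp] )(\\d+)\\}");
    Matcher m = p.matcher(orig);
    StringBuffer sb = new StringBuffer();
    while (m.find())
    {
      // copy the text before this match, but nothing for the match itself
      m.appendReplacement(sb, "");
      // emit one {sup d} or {sub d} per digit
      for (char ch : m.group(2).toCharArray())
      {
        sb.append(m.group(1)).append(ch).append("}");
      }
    }
    // copy whatever follows the last match
    m.appendTail(sb);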
    return sb.toString();
  }

  public static void main (String[] args)
  {
    String s = "{sup 19}F({sup 3}He,t){sub 19}Ne(p){sup 18}F";
    System.out.println(s);
    System.out.println(sepsup(s));
  }
}

result:

{sup 19}F({sup 3}He,t){sub 19}Ne(p){sup 18}F
{sup 1}{sup 9}F({sup 3}He,t){sub 1}{sub 9}Ne(p){sup 1}{sup 8}F
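For anyone on Java 9 or later, Matcher.replaceAll also accepts a function, which gets closer to the callback style mentioned above. The following is only a sketch of the same idea under that assumption; the class and method names (CallbackDemo, splitDigits) are made up, and the pattern is the one from the code above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CallbackDemo {
    // Same pattern as above: group 1 is "{sup " or "{sub ", group 2 the digits.
    private static final Pattern SCRIPT = Pattern.compile("(\\{su[bp] )(\\d+)\\}");

    static String splitDigits(String orig) {
        // Matcher.replaceAll(Function<MatchResult, String>) requires Java 9+.
        return SCRIPT.matcher(orig).replaceAll(mr -> {
            StringBuilder out = new StringBuilder();
            for (char ch : mr.group(2).toCharArray()) {
                out.append(mr.group(1)).append(ch).append('}');
            }
            // Quote the result so any '$' or backslash would be taken literally.
            return Matcher.quoteReplacement(out.toString());
        });
    }

    public static void main(String[] args) {
        System.out.println(splitDigits("{sup 19}F({sup 3}He,t){sub 19}Ne(p){sup 18}F"));
    }
}
```

On older Java versions the appendReplacement/appendTail loop is still the way to go.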

If you wanted, you could go ahead and generate the superscript and subscript characters and insert those instead.
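A rough sketch of that last idea, assuming the output is allowed to contain Unicode superscript and subscript digits. The class name ScriptDigits, the method toScripts and the two lookup strings are illustrative, not part of the code above; the loop has the same shape as sepsup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScriptDigits {
    // Superscript digits 0-9. Note that 1, 2 and 3 sit in Latin-1 (U+00B9, U+00B2, U+00B3),
    // not in the U+207x range with the others.
    private static final String SUP = "\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079";
    // Subscript digits 0-9 are contiguous: U+2080 through U+2089.
    private static final String SUB = "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089";

    static String toScripts(String orig) {
        // group 1: 'p' for {sup ...} or 'b' for {sub ...}; group 2: the digits
        Matcher m = Pattern.compile("\\{su([bp]) (\\d+)\\}").matcher(orig);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String table = m.group(1).equals("p") ? SUP : SUB;
            StringBuilder out = new StringBuilder();
            for (char ch : m.group(2).toCharArray()) {
                // \d only matches ASCII 0-9 by default, so ch - '0' is a valid index
                out.append(table.charAt(ch - '0'));
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(out.toString()));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toScripts("{sup 19}F({sup 3}He,t){sub 19}Ne(p){sup 18}F"));
    }
}
```

Whether the characters display properly depends on the font and on the encoding of wherever the text ends up, so treat this as a starting point.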

Alan Moore 2009-12-16 15:27:26

Nice one, Alan!

Bart Kiers 2009-12-17 09:26:06
