I'm trying to replace numbers of the form 4.2098234e-3 with 00042098234. I can capture the component parts ok with:

(-?)(\d+)\.(\d+)e-(\d+)

but what I don't know how to do is to repeat the zeros at the start $4 times.

Any ideas?

Thanks in advance, Ross

Ideally, I'd like to be able to do this with the find/replace feature of TextMate, if that's of any consequence. I appreciate that there are better tools than RegEx for this problem, but it's still an interesting question (to me).

+3  A: 

You can't do it purely in regular expressions, because the replace string is just a string with backreferences -- you can't use repetition there.
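To illustrate the limitation, here is a minimal Java sketch (the class and method names are made up for this example, and the pattern is the one from the question with its escapes written out). The replacement string can copy the exponent's digits into the output, but it has no way to treat them as a repeat count.

```java
class BackrefOnly {
    // Hypothetical helper: a plain replacement string can only copy captured text.
    static String naiveReplace(String text) {
        // $4 inserts the exponent's digits literally; it cannot mean "repeat 0 that many times".
        return text.replaceAll("(-?)(\\d+)\\.(\\d+)e-(\\d+)", "$1$4$2$3");
    }

    public static void main(String[] args) {
        System.out.println(naiveReplace("4.2098234e-3")); // prints "342098234"
    }
}
```

The exponent 3 ends up in the output as the character 3, not as three zeros.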

In most programming languages, regex replace can take a callback, which would be able to do it. However, it's not something that a text editor can do (unless it has some scripting support).
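As a rough sketch of the callback approach in Java (one possible way to do it, not the only one): the loop below plays the role of the callback, building each replacement from the captured groups. The class and method names are hypothetical, and the pattern is assumed from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExponentToZeros {
    // Assumed from the question: sign, integer part, fraction, negative exponent.
    private static final Pattern SCI =
        Pattern.compile("(-?)(\\d+)\\.(\\d+)e-(\\d+)");

    static String expand(String text) {
        Matcher m = SCI.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int zeros = Integer.parseInt(m.group(4)); // exponent = number of zeros
            StringBuilder repl = new StringBuilder(m.group(1)); // keep the sign
            for (int i = 0; i < zeros; i++) {
                repl.append('0');
            }
            repl.append(m.group(2)).append(m.group(3)); // digits without the point
            // quoteReplacement guards against '$' or '\' being treated specially
            m.appendReplacement(out, Matcher.quoteReplacement(repl.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("x = 4.2098234e-3")); // prints "x = 00042098234"
    }
}
```

Text outside the matches is copied through unchanged by `appendReplacement` and `appendTail`.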

Mewp
+1  A: 

This isn't something that should be done with regex. That said, you can do something like this, but it's not really worth the effort: the regex is complicated, and the capability is limited.

Here's an illustrative example of replacing a digit [0-9] with that many zeroes.

    // generate the regex and the replacement strings
    String seq = "123456789";
    String regex = seq.replaceAll(".", "(?=[$0-9].*(0)\\$)?") + "\\d";
    String repl = seq.replaceAll(".", "\\$$0");

    // let's see what they look like!!!
    System.out.println(repl); // prints "$1$2$3$4$5$6$7$8$9"
    System.out.println(regex); // prints oh my god just look at the next section!

    // let's see if they work...
    String input = "3 2 0 4 x 11 9";
    System.out.println(
        (input + "0").replaceAll(regex, repl)
    ); // prints "000 00  0000 x 00 000000000"

    // it works!!!

The regex is (as seen on ideone.com) (slightly formatted for readability):

(?=[1-9].*(0)$)?
(?=[2-9].*(0)$)?
(?=[3-9].*(0)$)?
(?=[4-9].*(0)$)?
(?=[5-9].*(0)$)?
(?=[6-9].*(0)$)?
(?=[7-9].*(0)$)?
(?=[8-9].*(0)$)?
(?=[9-9].*(0)$)?
\d

But how does it work??

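The regex relies on optional positive lookaheads. It ultimately matches a single \d, but before doing that, it checks whether that digit is in [1-9]. If so, the lookahead scans all the way to the end of the input, where a 0 has been appended, and group 1 captures that 0. Then the second assertion checks whether the digit is in [2-9], and if so, group 2 captures the same trailing 0, and so on. A digit n therefore leaves exactly n groups set, and since Java substitutes nothing for a group that did not participate in the match, the replacement $1$2...$9 expands to n zeroes.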
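To see the captures directly, here is a reduced sketch that handles only the digits 0 to 3 and counts how many groups were set for the first digit in the input. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadCaptureDemo {
    // Reduced form of the regex above: three optional lookaheads instead of nine.
    static final Pattern P = Pattern.compile(
        "(?=[1-3].*(0)$)?(?=[2-3].*(0)$)?(?=[3-3].*(0)$)?\\d");

    // Counts the groups that captured the trailing sentinel "0"
    // for the first digit matched in the input.
    static int groupsSet(String input) {
        Matcher m = P.matcher(input + "0"); // append the sentinel zero
        if (!m.find()) {
            return -1;
        }
        int n = 0;
        for (int g = 1; g <= m.groupCount(); g++) {
            if (m.group(g) != null) {
                n++; // this lookahead succeeded and captured the sentinel
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(groupsSet("0")); // prints 0
        System.out.println(groupsSet("2")); // prints 2
        System.out.println(groupsSet("3")); // prints 3
    }
}
```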

The technique works, but beyond being a cute regex exercise, it probably has no practical use.

Note also that 11 is replaced with 00. That is, each 1 is replaced with one zero. It's probably possible to recognize 11 as a number and put 11 zeroes instead, but it would only make the regex more convoluted.

polygenelubricants