views:

161

answers:

8

Hi All,

Is there any way to do integer incremental replacement only with regex.

Here is the problem, I have text file containing 1 000 000 lines all starting with %

I would like to have replace # by integer incrementally using regex.

input:

% line one

% line two

% line three

...

output:

1 line one

2 line two

3 line three

...
A: 

Depending on your choice of language (you've listed a few) PHP's preg_replace_callback() might be an appropriate function to use

$text = "% First Line\n% Second Line\n% Third Line";

function cb_numbers($matches)
{
    static $c = 1;

    return $c++;
}
$text = preg_replace_callback(
            "/(%)/",
            "cb_numbers",
            $text);

echo $text;
Mark Baker
A: 

in python re.sub accept function as parameter see http://docs.python.org/library/re.html#re.sub

Xavier Combelle
+4  A: 

Here's a way to do it in Python

import re
from itertools import count
s="""
% line one
% line two
% line three"""

def f():
    n=count(1)
    def inner(m):
        return str(next(n))
    return inner

new_s = re.sub("%",f(),s)

alternatively you could use a lambda function in there like so:

new_s = re.sub("%",lambda m,n=count(1):str(next(n)),s)

But it's easy and better to skip regexp altogether

from __future__ import print_function   # For Python<3
import fileinput

f=fileinput.FileInput("file.txt", inplace=1)
for i,line in enumerate(f):
    print ("{0}{1}".format(i, line[1:]), end="")

Since all the lines start with "%" there is no need to even look at that first char

gnibbler
+1 for not using regexp!
Andreas_D
@Andreas_D: Huh, he used regex.
nosklo
@nosklo ... yeah, okaaay, in this context "%" is a regular expression too...
Andreas_D
Ok, I added a (better) alternative using fileinput :)
gnibbler
+5  A: 
n = 1
with open('sourcefile.txt') as input:
    with open('destination.txt', 'w') as output:
        for line in input:
            if line.startswith('%'):
                line = str(n) + line[1:]
                n += 1
            output.write(line)
nosklo
A: 

And a PHP version for good measure:

$input = @fopen('input.txt', 'r');
$output = @fopen("output.txt", "w");

if ($input && $output) {
    $i = 0;
    while (!feof($input)) {
        $line = fgets($input);
        fputs($output, ($line[0] === '%') ?
            substr_replace($line, ++$i, 0, 1) :
            $line
        );
    }
    fclose($input);
    fclose($output);
}

And just because you can, a perl one-liner (yes, with a regex):

perl -i.bak -pe 'BEGIN{$i=1} (s/^%/$i/) && $i++' input.txt
Mike
+3  A: 

Although this problem would best be solved by reading the file line by line and checking the first character with simple string functions, here is how you would do incremental replacement on a string in java:

Pattern p = Pattern.compile("^%");
Matcher m = p.matcher(text);
StringBuffer sb = new StringBuffer();
int i = 0;
while (m.find()) {
    m.appendReplacement(sb, String.valueOf(i++));
}
m.appendTail(sb);

return sb.toString();
Jörn Horstmann
you probably want ++1, not 1++. Line numbers are usually 1-based.
seanizer
...or initialize `i` to one instead of zero.
Alan Moore
A: 

Here's a C# (3.0+) version:

string s = "% line one\n% line two\n% line three";
int n = 1;
s = Regex.Replace(s, @"(?m)^%", m => { return n++.ToString(); });
Console.WriteLine(s);

output:

1 line one
2 line two
3 line three

Of course it requires the whole text to be loaded into memory. If I were doing this for real, I'd probably go with a line-by-line approach.

Alan Moore
A: 
import re, itertools
counter= itertools.count(1)
replacer= lambda match: "%d" % counter.next()
text= re.sub("(?m)^%", replacer, text)

counter is… a counter :). replacer is a function returning the counter values as strings. The "(?m)^%" regex is true for every % at the start of a line (note the multi-line flag).

ΤΖΩΤΖΙΟΥ