Hi, I actually have a very complex problem, but I have narrowed it down here to its most essential part, using some dummy data.

Say I have the following text:

a
aa
aaa
aaaa
aaaa
aaaaa
a
aa
aaa
aaaa
aaaaa
aaaaaa
aaaa
a

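Here is what I would like to do. For example, when a line of 4 a's is followed by a line of 1 a, I'd like to add a line of 3 a's after the line of 4, and a line of 2 a's after the line of 3. In general, whenever a line is followed by one that is two or more a's shorter, the missing lengths in between should be filled in. So the result would be this: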

a
aa
aaa
aaaa
aaaa
aaaaa
aaaa
aaa
aa
a
aa
aaa
aaaa
aaaaa
aaaaaa
aaaaa
aaaa
aaa
aa
a

I have tried the following regex in EditPad Pro:

find: \r?\n(a*)aa\r?\n\1\r?\n
repl: \n\1aa\n\1a\n\1\n

But this only works when the next line has exactly 2 a's fewer than the previous one. I know I could write a bunch of regular expressions like the one above, one each for a difference of 2, 3, 4, 5 a's and so on, but I'd like to have only one regex. I don't mind if I have to run that regex multiple times, though.
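The sketch below replays the first attempt to show why it stops at a difference of exactly two. It uses Python's re module as a stand-in for EditPad Pro. That is an assumption: the pattern only uses \r?\n, a capture group and the \1 backreference, which both engines support, but they are not guaranteed to behave identically in every case. Because the match has to begin with a line break, (a*)aa must cover an entire line, so \1 can only equal the next line when that line is exactly two a's shorter. The sketch also assumes LF line endings and a trailing newline at the end of the sample text.

```python
import re

# Sample data from the question (LF line endings, trailing newline assumed).
LINES = ["a", "aa", "aaa", "aaaa", "aaaa", "aaaaa", "a",
         "aa", "aaa", "aaaa", "aaaaa", "aaaaaa", "aaaa", "a"]
TEXT = "\n".join(LINES) + "\n"

# The first attempt from the question. The leading \r?\n anchors the match
# at a line break, so (a*)aa has to be an entire line and \1 is exactly two
# a's shorter than it.
FIND = re.compile(r"\r?\n(a*)aa\r?\n\1\r?\n")
REPL = r"\n\1aa\n\1a\n\1\n"

once = FIND.sub(REPL, TEXT)
twice = FIND.sub(REPL, once)

print(once.splitlines())
print(once == twice)  # True: repeating the replacement adds nothing more
```

On the sample, the first pass only inserts aaaaa between aaaaaa and aaaa. A second pass changes nothing, so the 5-to-1 and 4-to-1 gaps stay unfilled no matter how often the replacement is repeated.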

Any thoughts?

A: 

I just found a solution myself. It seems I was very close; I just overdid it a bit with the line breaks at the beginning.

find: (a*)aa\r?\n\1\r?\n
repl: \1aa\n\1a\n\1\n

This works after I repeatedly click 'Replace All' in EditPad Pro. I would prefer a solution where I only need to run Replace All once, so if anyone has further thoughts, please let me know.
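The following sketch imitates clicking 'Replace All' until nothing changes. It again uses Python's re module as a stand-in for EditPad Pro, which is an assumption, since the two engines are not guaranteed to match in every detail.

Without the leading line break, a match may start partway through the longer line. For a pair like aaaaa / a, (a*) captures a single a near the end of the first line, and the replacement inserts a line just one a longer than the line below it. Each pass therefore narrows every gap by one. A line consumed as the lower half of one match cannot start another match in the same pass.

The helper name replace_all_until_stable is hypothetical.

```python
import re

# Sample data from the question (LF line endings, trailing newline assumed).
LINES = ["a", "aa", "aaa", "aaaa", "aaaa", "aaaaa", "a",
         "aa", "aaa", "aaaa", "aaaaa", "aaaaaa", "aaaa", "a"]
INPUT = "\n".join(LINES) + "\n"

# The self-answer's find/replace pair. With no leading line break, the match
# may start partway through the longer line.
FIND = re.compile(r"(a*)aa\r?\n\1\r?\n")
REPL = r"\1aa\n\1a\n\1\n"


def replace_all_until_stable(text, max_passes=100):
    """Hypothetical helper: repeat a global replace until nothing matches.

    Returns the final text and the number of passes that changed something.
    """
    for passes in range(max_passes):
        text, count = FIND.subn(REPL, text)
        if count == 0:
            return text, passes
    raise RuntimeError("no fixed point after %d passes" % max_passes)


result, passes = replace_all_until_stable(INPUT)
print(result, end="")
print("changing passes:", passes)
```

On the sample data the loop makes three changing passes before a fourth finds nothing, which corresponds to having to click Replace All several times. The pattern requires \r?\n after the shorter line, so if the text does not end with a line break, the last gap is not filled. In this sketch the replacement always writes \n, so text with \r\n line endings would end up with mixed line breaks.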

Jules
A: 

If you save your dummy data in a file called "file", save the following gawk(1) program in a file called "runme", and invoke it from the shell as gawk -f runme file, you should get your desired output.

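Note that the program prints the newly generated lines as hashes instead of a's, to make the additions easy to spot. It also fills gaps in the other direction, where a line is two or more characters longer than the one before it.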

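BEGIN { }

{
    # First record: print it unchanged and remember it.
    if (NR==1) { print $0; oldrec = $0; }

    if (NR>1) {
        # Positive when this line is shorter than the previous one,
        # negative when it is longer.
        levelsdiff = length(oldrec) - length($0);

        # Shorter by 2 or more: print the missing lengths, counting down.
        if (levelsdiff>1) {
            newrecs = levelsdiff - 1;
            i = 1;
            while (newrecs>0) {
                newline = "";
                hashes = length(oldrec) - i;
                while (hashes!=0) {
                    newline = newline "#";
                    hashes--;
                }
                print newline;
                i++; newrecs--;
            }
        }

        # Longer by 2 or more: print the missing lengths, counting up.
        # (For a difference of 0 or 1, newrecs is <= 0 and nothing is printed.)
        if (levelsdiff<1) {
            newrecs = -levelsdiff - 1;
            i = 1;
            while (newrecs>0) {
                newline = "";
                hashes = length(oldrec) + i;
                while (hashes!=0) {
                    newline = newline "#";
                    hashes--;
                }
                print newline;
                i++; newrecs--;
            }
        }

        # Print the current line and make it the new reference.
        print $0;
        oldrec = $0;
    }
}

END { }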

Output:

a
aa
aaa
aaaa
aaaa
aaaaa
####
###
##
a
aa
aaa
aaaa
aaaaa
aaaaaa
#####
aaaa
###
##
a
Xhantar
That's great, but it doesn't involve regex. I'm looking for a regular expression to use in EditPad Pro.
Jules
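If leaving EditPad Pro is acceptable, a single pass becomes possible when the replacement is computed by a function rather than a fixed template. This relies on Python's ability to pass a callable to re.sub, which a plain find/replace dialog may not offer.

The sketch below matches each line together with a lookahead on the next line, so every line can be the lower half of one pair and the upper half of the next. It then writes out all missing lengths at once. The names PAIR and fill are hypothetical. Like the question, it only fills gaps where the next line is shorter, and it assumes LF line endings.

```python
import re

# Sample data from the question (LF line endings, trailing newline assumed).
LINES = ["a", "aa", "aaa", "aaaa", "aaaa", "aaaaa", "a",
         "aa", "aaa", "aaaa", "aaaaa", "aaaaaa", "aaaa", "a"]
TEXT = "\n".join(LINES) + "\n"

# Hypothetical pattern: a whole line of a's plus its line break, with a
# lookahead that captures the following line without consuming it, so the
# next match can start on that line in the same pass.
PAIR = re.compile(r"^(a+)\n(?=(a+)$)", re.MULTILINE)


def fill(match):
    """Return the upper line followed by every missing shorter length."""
    upper, lower = match.group(1), match.group(2)
    # Lengths strictly between the two lines, counting down; this range is
    # empty unless the lower line is at least two a's shorter.
    between = ["a" * n for n in range(len(upper) - 1, len(lower), -1)]
    return "".join(line + "\n" for line in [upper] + between)


result = PAIR.sub(fill, TEXT)
print(result, end="")
```

Running PAIR.sub(fill, ...) a second time on its own output changes nothing, because after one pass no line is followed by one that is two or more a's shorter.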