I have a file that contains a line like this:

check=('78905905f5a4ed82160c327f3fd34cba')

I'd like to be able to move this line to follow a line that looks like this:

files=('somefile.txt')

The arrays can sometimes span multiple lines, though, for example:

files=('somefile.txt'
       'file2.png'
       'another.txt'
       'andanother...')

text
in between

check=('78905905f5a4ed82160c327f3fd34cba'
       '5277a9164001a4276837b59dade26af2'
       '3f8b60b6fbb993c18442b62ea661aa6b')

Each array always ends with a ) at the end of a line, and no text inside an array will contain a closing parenthesis.

I got some advice that awk can do this:

awk '/files/{
    f=0
    print $0
    for(i=1;i<=d;i++){ print a[i]  }
    g=0
    delete a # remove array after found
    next
}
/check/{ f=1; g=1 }
f{ a[++d]=$0 }
!g' file

This only handles arrays that fit on one line, though. I was told to expand the search:

awk '/source/ && /\)$/{
    f=0
    print $0
    for(i=1;i<=d;i++){ print a[i]  }
    g=0
    delete a # remove array after found
    next
}
/md5sum/ && /\)$/{ f=1; g=1 }
f{ a[++d]=$0 }
!g'
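
A note on why the expanded search still misses multi-line arrays: a test such as /md5sum/ && /\)$/ only fires when the array name and the closing parenthesis sit on the same line. An awk range pattern is one way to select an array whether it occupies one line or several. The following is a minimal sketch, not part of the original advice; the file names demo.txt, check_block.txt and files_block.txt are made up for illustration, and the array names follow the simplified example above.

```shell
# Sketch only: demo.txt, check_block.txt and files_block.txt are hypothetical names.
cat > demo.txt <<'EOF'
files=('somefile.txt')
text
check=('78905905f5a4ed82160c327f3fd34cba'
       '5277a9164001a4276837b59dade26af2')
more text
EOF

# A range pattern starts at the line beginning with "check=(" and stops at the
# first line that ends in ")". Both can be the same line, so one-line arrays work too.
awk '/^check=\(/,/\)$/' demo.txt > check_block.txt
awk '/^files=\(/,/\)$/' demo.txt > files_block.txt

cat check_block.txt files_block.txt
```

This only selects the arrays; moving one after the other still needs the buffering shown in the answers.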

I'm just learning awk, so I'd appreciate help with this. Or if there is another tool that can do this, I'd like to hear about it. Someone told me that 'ed' has these types of capabilities.

+2  A: 

To answer your last question first: yes, awk is the typical Unix tool for this. Other candidates are the incredibly powerful Perl, Python, or .. my favorite .. Ruby. One advantage of awk is that it's always there; it's part of the base system. Another way to solve this kind of problem is with an editor script that controls ed(1) or ex(1).

Ok, new program for the revised question. This program will move the "check" lines either up or down as necessary so that they follow the "files" lines.

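BEGIN {
  checkAt = 0    # line number where the most recent check=( starts
  filesAt = 0    # line number where the most recent files=( starts
  scanning = 0   # 1 while inside an array that has not been closed yet
}

/check=\(/ {
  checkAt = NR
  scanning = 1
}

/files=\(/ {
  filesAt = NR
  scanning = 1
}

# A line ending in ")" closes whichever array started most recently.
# The parenthesis is escaped because an unmatched ")" is not portable in awk regexes.
/\)$/ {
  if (scanning) {
    if (checkAt > filesAt) {
      checkEnd = NR
    } else {
      filesEnd = NR
    }
    scanning = 0
  }
}

# Remember every line so the END block can print them in the new order.
{
  lines[NR] = $0
}

END {
  for (i = 1; i <= NR; ++i) {
    # Skip the check lines where they originally stood...
    if (checkAt <= i && i <= checkEnd) {
      continue
    }
    print lines[i]
    # ...and re-emit them right after the last line of files.
    if (i == filesEnd) {
      for (j = checkAt; j <= checkEnd; ++j) {
        print lines[j]
      }
    }
  }
}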
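
For comparison, here is a rough sketch of a two-pass variant of the same idea. It is not the program above: instead of storing every line, it hands the file to awk twice, collecting the check array on the first pass and re-emitting it after the files array on the second. The names sample.txt and move_check are hypothetical, and the sketch assumes a single check array and a single files array, each starting at the beginning of a line.

```shell
# Sketch only: sample.txt and move_check are hypothetical names.
cat > sample.txt <<'EOF'
check=('78905905f5a4ed82160c327f3fd34cba'
       '5277a9164001a4276837b59dade26af2')

text in between

files=('somefile.txt'
       'file2.png')

trailing text
EOF

move_check() {
    # The same file is passed twice so that awk reads it in two passes.
    awk '
    # Pass 1 (NR == FNR): remember the check=(...) block in blk.
    NR == FNR {
        if ($0 ~ /^check=\(/) inblk = 1
        if (inblk) blk = blk $0 "\n"
        if (inblk && $0 ~ /\)$/) inblk = 0
        next
    }
    # Pass 2: drop the check block from its old position.
    /^check=\(/ { skip = 1 }
    skip { if ($0 ~ /\)$/) skip = 0; next }
    # Print everything else and emit the saved block once files=(...) closes.
    /^files=\(/ { infiles = 1 }
    { print }
    infiles && /\)$/ { printf "%s", blk; infiles = 0 }
    ' "$1" "$1"
}

move_check sample.txt
```

The trade-off is that the input must be a regular file that can be read twice, so this variant does not work on a pipe.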
DigitalRoss
Hey, that's great, however the closing parenthesis is being truncated, i.e. check=(.... . In the example I tried, the files array was at the end of the file. Does that make a difference? Also, could this be made to work if the files array is before the check array :). I'm finding it differs in some files.
Todd Partridge
Ok, if you add this line at the end of `mover.awk` it will deal with the case where the last thing in the file is a check() line: `END { for (v in saved) { print saved[v] } }` *however* I cannot reproduce your ) truncation bug report. Can you please drop a test case onto http://pastie.org (use file type "plain text")??
DigitalRoss
I've put an updated version of the script in http://pastie.org/662905 This version deals with inverted ordering by outputting the last check if it sees a new one, and outputting any left-over one at EOF. But I still need a test case because I can't reproduce your bug.
DigitalRoss
Ah thanks for the updated script. I had made a mistake on my original example above where I put 'check' before 'files', dang. I think this is why I'm having problems. I put an example template of the file type I'd like to be able to move lines on here: http://pastie.org/667764. Also I tested your new (pastie) script (before reversing order of check and files in my template to test) and still got the truncated end parenthesis - http://pastie.org/667766.
Todd Partridge
btw, I fixed the error above
Todd Partridge
Todd, you pasted an example of failed *output*, not an example of the *input* that didn't work. The point of asking for a sample was so I could reproduce the problem, and while pasting bad output might lead to an "aha!" moment, it's useless for reproducing a problem. Anyway, forget it for now, this is a totally new program that can move lines up in the file as well as down. But if this new one doesn't always work for you *please post an example of bug-triggering "bad" input*, so that a problem can be *duplicated* or *reproduced*.
DigitalRoss
The second pastie is actually the output but is truncated - won't do it again ;). DigitalRoss, thanks for your time and patience. Still learning awk so I appreciate it. I tested your new script and it... works!... even in test cases where the files array and check array are reversed. Thank you, thank you. This will help me quite a bit. Marking done.
Todd Partridge
A: 
Beta
Wow, trying to do it with sed, a brave man :P. Yeah, I tried this with sed, but I haven't got too far into understanding registers yet. From your command it looks like bash is trying to interpret the parentheses. I tried escaping them but am getting: sed: -e expression #1, char 0: unmatched `{' bash: d}: command not found bash: s/n//}}: No such file or directory
Todd Partridge
Still no luck. Using gnu-sed 4.2.1 here. bash: syntax error near unexpected token `(
Todd Partridge
*sigh* If you're interested we could do some experiments and get it working, but since you already have a working solution in awk it would just be an exercise in learning sed.
Beta
A: 

I looked into doing this with Awk, but it looked like you wouldn't really get anything clever out of it; it would just be the same logic, but with some Awk pain to go with it, so I did it in Perl :)

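#!/usr/bin/perl
use strict;
use warnings;

open(my $in, '<', $ARGV[0]) || die("Could not open file: " . $ARGV[0]);

my $buffer = "";  # check=(...) lines held back until files=(...) has been printed
my $flag = 0;     # 0 = outside both arrays, 1 = inside check, 2 = inside files

while (my $line = <$in>) {
        if ($line =~ /^check=/) {
                # start of the check array; stay in state 1 only if it is not closed on this line
                $flag = ($line =~ /\)/) ? 0 : 1;
                $buffer .= $line;
        } elsif ($flag == 1 && $line =~ /\)/) {
                # last line of a multi-line check array
                $flag = 0;
                $buffer .= $line;
        } elsif ($flag == 1) {
                $buffer .= $line;
        } elsif ($flag == 0 && $line =~ /^files=/) {
                print $line;
                if ($line =~ /\)/) {
                        # one-line files array: emit the buffered check array right away
                        print $buffer;
                        $buffer = "";
                } else {
                        $flag = 2;
                }
        } elsif ($flag == 2 && $line =~ /\)/) {
                # end of the files array: emit the buffered check array after it
                $flag = 0;
                print $line;
                print $buffer;
                $buffer = "";
        } else {
                print $line;
        }
}
close($in);

# A check array that came after files was never emitted above; print it rather than lose it.
print $buffer;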

And the output :)

Chill:~ rus$ cat test
check=('78905905f5a4ed82160c327f3fd34cba'
       '5277a9164001a4276837b59dade26af2'
       '3f8b60b6fbb993c18442b62ea661aa6b')

text in between

files=('somefile.txt'
       'file2.png'
       'another.txt'
       'andanother...')

asdasdasd

check=('78905905f5a4ed82160c327f3fd34cba'
       '5277a9164001a4276837b59dade26af2'
       '3f8b60b6fbb993c18442b62ea661aa6b')

text in between

files=('somefile.txt'
       'file2.png'
       'another.txt'
       'andanother...')

asdsd

check=('78905905f5a4ed82160c327f3fd34cba'
       '5277a9164001a4276837b59dade26af2'
       '3f8b60b6fbb993c18442b62ea661aa6b')

text in between

files=('somefile.txt'
       'file2.png'
       'another.txt'
       'andanother...')

Chill:~ rus$ ./t.pl test

text in between

files=('somefile.txt'
       'file2.png'
       'another.txt'
       'andanother...') check=('78905905f5a4ed82160c327f3fd34cba'
       '5277a9164001a4276837b59dade26af2'
       '3f8b60b6fbb993c18442b62ea661aa6b')

asdasdasd


text in between

files=('somefile.txt'
       'file2.png'
       'another.txt'
       'andanother...') check=('78905905f5a4ed82160c327f3fd34cba'
       '5277a9164001a4276837b59dade26af2'
       '3f8b60b6fbb993c18442b62ea661aa6b')

asdsd


text in between

files=('somefile.txt'
       'file2.png'
       'another.txt'
       'andanother...') check=('78905905f5a4ed82160c327f3fd34cba'
       '5277a9164001a4276837b59dade26af2'
       '3f8b60b6fbb993c18442b62ea661aa6b')

ta da ?! :D

idimmu
urgh, the output paste is screwed up, but trust me, it does work. I'm jealous of the awk and sed solutions though :)
idimmu
nah, this is good. Doesn't work for me though. The files array is getting erased and the file array still exists. I have { and () characters between the two arrays, does this make a difference?
Todd Partridge
i added loads of {()} chars in to my test data and it still worked fine! do you have an example of the test data i can try it on?
idimmu
A: 

@todd, I seem to have left you in the lurch after providing you the awk solution, haven't I? :) Here's another method, this time not using flags. There are some loose ends (hint: check the patterns p and q and the output again) that I leave to you to tidy up.

gawk 'BEGIN{
    RS="check=[(]"
    q="files=(.*\047)"  # pattern to replace files= part
    p=".*(files=(.*\047)).*" # to get the whole files= part to variable
}
NR>1{
    b=gensub(p, "\\1","g",$0) # get the files=part to var b
    printf "%s\n\n",b    
    printf "check=("
    gsub(q,"",$0)
    print $0
}' file

NB: gensub is specific to gawk, so this only works if you have gawk.

output

$ more file
check=('5277a9164001a4276837b59dade26af2'
       '5277a9164001a4276837b59dade26af2'
       '3f8b60b6fbb993c18442b62ea661aa6b')

text in between one

files=('somefile1.txt'
       'file1.png'    
       'another1.txt' 
       'andanother1...')

asdasdasd blah blah

check=('78905905f5a4ed82160c327f3fd34cba'
       '5277a9164001a4276837b59dade26af2'
       '3f8b60b6fbb993c18442b62ea661aa6b')

text in between  two

files=('somefile2.txt'
       'file2.png'    
       'another2.txt' 
       'andanother2...')

asdsd blaasdf aslasdfaslj aslfjsldfsa 123e12

check=('78905905fblah blah5a4ed82160c327f3fd34cba'
       '5277a9164001a4276837b59dade26af2'         
       '3f8b60b6fbb993c18442b62ea661aa6b')        

text in between

files=('somefile3.txt'
       'file3.png'    
       'another3.txt' 
       'andanother3...')

$ ./shell.sh
files=('somefile1.txt'             
       'file1.png'                 
       'another1.txt'              
       'andanother1...'            

check=('5277a9164001a4276837b59dade26af2'
       '5277a9164001a4276837b59dade26af2'
       '3f8b60b6fbb993c18442b62ea661aa6b')

text in between one

)

asdasdasd blah blah


files=('somefile2.txt'
       'file2.png'
       'another2.txt'
       'andanother2...'

check=('78905905f5a4ed82160c327f3fd34cba'
       '5277a9164001a4276837b59dade26af2'
       '3f8b60b6fbb993c18442b62ea661aa6b')

text in between  two

)

asdsd blaasdf aslasdfaslj aslfjsldfsa 123e12


files=('somefile3.txt'
       'file3.png'
       'another3.txt'
       'andanother3...'

check=('78905905fblah blah5a4ed82160c327f3fd34cba'
       '5277a9164001a4276837b59dade26af2'
       '3f8b60b6fbb993c18442b62ea661aa6b')

text in between

)
ghostdog74
Thanks ghost. Been cramming in awk the last few days and just don't get it yet. Still learning sed. Guess I'm the type that likes to learn one thing and learn it well before moving on :D. Appreciate the help though, much appreciated.
Todd Partridge