Hi, I am new to Linux and am trying to parse a bunch of files that look as follows:

  • Some text
    • start list some other text
      • start sublist1
      • continue sublist1
    • more elements
    • more elements2
      • a sublist2
        • a sub-sublist1

All the whitespace before the list items is tabs. I need a way to process the text so that a colon is appended to every line that introduces a sublist, so that the result looks like this:

  • Some text:
    • start list some other text:
      • start sublist1
      • continue sublist1
    • more elements
    • more elements2:
      • a sublist2:
        • a sub-sublist1
    • another element

So colons are added only when there is a sublist available.

I looked into sed and awk but was unable to find anything that stores the state of the previous line, which is what I need in order to add the colon at the end. It does not have to be done in sed or awk; those are just what I have been trying, with no luck. Any suggestions at all would help.

Any help would be very much appreciated,

Thanks in advance, Jack

+1  A: 

Something like this should solve your problem:

awk '
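    # Count the leading tab characters of a line.
    # tabs and i are declared as extra parameters so they stay local.
    function countTabs(line,    tabs, i) {
        tabs=0;
        i=1;    # awk string positions start at 1, not 0
        while( substr(line,i++,1) == "\t")
            tabs++;
        return tabs;
     }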
{
    line1 = $0;
    while( (getline line2) > 0 ) {
        if ( countTabs(line1) < countTabs(line2))
           printf("%s:\n" , line1);
        else
           printf("%s\n",line1);
        line1 = line2;
    }
    print line1;    # last line; also correct when the input has only one line
}'
stacker
I had to change the quotes around the tab to double quotes to make it work for me: `"\t"` but +1 for not using an array (-1/2 for being tab-specific instead of any-white-space).
Dennis Williamson
I fixed the quote issue, thanks. The problem with arbitrary white space is that a tab size is also required to calculate the indentation. The question was about how to keep the previous line.
stacker
Thank you very much, that was very helpful. :)
Jack_lui
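The tab-size point raised in the comments above could be handled roughly as follows. This is only a sketch and not part of stacker's answer: the wrapper `indent_width`, the awk function `width`, and the default tab size of 8 are all assumptions made for illustration. It measures the leading whitespace of each line in columns, moving to the next tab stop whenever it meets a tab.

```shell
# Hypothetical helper: print the indentation width (in columns) of each input line.
# The optional first argument is the assumed tab size (default 8).
indent_width() {
    awk -v tabsize="${1:-8}" '
        function width(line,    i, c, col) {
            col = 0
            for (i = 1; i <= length(line); i++) {
                c = substr(line, i, 1)
                if (c == "\t")
                    col += tabsize - (col % tabsize)   # advance to the next tab stop
                else if (c == " ")
                    col++
                else
                    break                              # first non-blank character
            }
            return col
        }
        { print width($0) }'
}

# One tab, four spaces, two spaces plus a tab, and no indentation.
printf '\tone\n    two\n  \tthree\nfour\n' | indent_width 8
```

With a tab size of 8 this prints 8, 4, 8 and 0, one value per line. Comparing these widths instead of raw tab counts would let the script accept mixed tabs and spaces, at the cost of having to know the tab size.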
+1  A: 

Something to try:

awk '
{
    A[d++]=$0
    match($0,"[^[:blank:]]")
    if ( RSTART > t ){    A[d-1]=A[d-1]":"  }
    else{  gsub(/:$/,"",A[d-2])  }
    t=RSTART
}
END{
    for(i=0;i<=d;i++){
        print A[i]
    }
} ' file

output

$ cat file
Some text
        start list some other text
                start sublist1
                continue sublist1
        more elements
        more elements2
                a sublist2
                        a sub-sublist1
                                a sub-sublist2
        another element

$ ./shell.sh
Some text:
        start list some other text:
                start sublist1
                continue sublist1
        more elements
        more elements2
                a sublist2:
                        a sub-sublist1:
                                a sub-sublist2
        another element
ghostdog74
"more elements2" should have a colon after it, but it's not getting one.
Dennis Williamson
+1 for `match` and `RSTART` (-1/2 for using an array)
Dennis Williamson
A: 

This modified version of ghostdog74's script should get the job done:

awk '
{
    A[NR]=$0
    match($0,"[^[:blank:]]")
    if ( RSTART > t ){ A[NR-1]=A[NR-1]":" }
    t=RSTART
}
END{
    for(i=1; i<=NR; i++){
        print A[i]
    }
} ' file
Dennis Williamson
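For what it is worth, the two ideas praised in the comments (no array, and `match` with `RSTART`) can be combined. The following is a minimal sketch, not taken from any of the answers: the wrapper name `add_colons` and the variables `prev`, `depth` and `suffix` are made up for illustration, and it assumes the indentation uses one kind of whitespace consistently, since it compares character positions rather than columns. It uses `/[^ \t]/` instead of `[^[:blank:]]` on the assumption that this is accepted by more awk implementations.

```shell
# Hypothetical helper: append ":" to every line that is followed by a
# more deeply indented line. Only the previous line is kept in memory.
add_colons() {
    awk '
        {
            match($0, /[^ \t]/)        # RSTART = position of the first non-blank character
            if (NR > 1) {
                suffix = (RSTART > depth) ? ":" : ""
                print prev suffix      # the previous line can be printed now
            }
            prev = $0
            depth = RSTART
        }
        END { if (NR > 0) print prev }' "$@"
}

printf 'a\n\tb\n\tc\n' | add_colons
```

Each line is held back until the next one has been read, so the decision about the colon is made with both indentation depths known; the last line is flushed in `END` and never gets a colon. File names can be passed as arguments in place of the pipe.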