I have a text file with a marker somewhere in the middle:

one
two
three
blah-blah *MARKER* blah-blah
four
five
six
...

I just need to split this file into two files: the first containing everything before MARKER, and the second containing everything after MARKER. It seems this can be done in one line with awk or sed, but I can't figure out how.

I tried the easy way, using csplit, but csplit doesn't play well with Unicode text.

A: 

Try this:

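awk 'BEGIN{n=1} /MARKER/{n++} {print > ("out" n ".txt")}' final.txt

It reads input from final.txt and produces out1.txt, out2.txt, etc. Each line matching MARKER starts a new output file, so the marker line itself becomes the first line of out2.txt.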
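
If the marker line should be dropped entirely, a small variation might look like the sketch below. It assumes the marker is the literal text *MARKER* and that only its first occurrence matters; the file names sample.txt, before.txt and after.txt are made up for illustration.

```shell
# Build a small sample input (hypothetical file name) so the sketch runs on its own.
printf '%s\n' one two three 'blah-blah *MARKER* blah-blah' four five six > sample.txt

# found stays unset until the first line containing the literal text *MARKER*.
# That line sets found and is skipped via `next`; every other line goes to
# before.txt (found unset) or after.txt (found set).
awk 'index($0, "*MARKER*") && !found { found = 1; next }
     { print > (found ? "after.txt" : "before.txt") }' sample.txt
```

index() does a plain substring search, so the asterisks need no escaping. If the marker never appears, after.txt is simply not created.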

Leniel Macaferi
Almost worked. Doesn't screw up UTF-8, but leaves *MARKER* in the second file.
Sergey Kovalev
Have you tried the solution shown here: http://www.unix.com/shell-programming-scripting/41060-split-file-into-seperate-files.html - It uses `csplit` and works the way you want, that is, leaving the marker out of the files.
Leniel Macaferi
A: 
sed -n '/MARKER/q;p' inputfile > outputfile1
sed -n '/MARKER/{:a;n;p;ba}' inputfile > outputfile2

Or all in one:

sed -n -e '/MARKER/! w outputfile1' -e '/MARKER/{:a;n;w outputfile2' -e 'ba}' inputfile
Dennis Williamson
A: 

The split command will almost do what you want (the -p option is a BSD extension, as found on macOS and FreeBSD; GNU split does not have it):

$ split -p '\*MARKER\*' splitee 
$ cat xaa
one
two
three
$ cat xab
blah-blah *MARKER* blah-blah
four
five
six
$ tail -n+2 xab
four
five
six

Perhaps it's close enough for your needs.

I have no idea if it does any better with Unicode than csplit, though.
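
For systems whose split lacks the -p option, a rough equivalent can be put together from grep, head and tail. This is only a sketch: it assumes the marker is the literal text *MARKER*, splits at its first occurrence, drops the marker line, and uses the made-up output names before.txt and after.txt.

```shell
# Recreate the sample input from above so the sketch runs on its own.
printf '%s\n' one two three 'blah-blah *MARKER* blah-blah' four five six > splitee

# Line number of the first line containing the literal string *MARKER*
# (-F turns off regex interpretation, -n prefixes each match with "N:").
n=$(grep -n -F -e '*MARKER*' splitee | head -n 1 | cut -d: -f1)

if [ -n "$n" ]; then
    head -n "$((n - 1))" splitee > before.txt    # lines 1 .. n-1
    tail -n "+$((n + 1))" splitee > after.txt    # lines n+1 .. end
else
    echo "marker not found" >&2
fi
```

None of these tools reinterpret the line contents, so multi-byte text should pass through unchanged.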

Marcelo Cantos
A: 

Your file consists of three regions: the region before the marker (0), the line containing the marker (1), and the region following the marker (2). This little script does the job, producing two text files (file1, file2) and removing the marker. Quick and dirty.

MARKER='<-->'                                        # change to your marker
region=0                                             # 0 = before, 1 = marker line, 2 = after
while IFS= read -r line                              # IFS= and -r keep whitespace and backslashes intact
    do
        [[ $region == 1 ]] && region=2

        if [[ $region == 0 && $line == *"$MARKER"* ]]   # first line containing marker (literal match)
        then
            region=1
            printf '%s\n' "${line%%"$MARKER"*}" >> file1   # text before the marker
            printf '%s\n' "${line##*"$MARKER"}" >> file2   # text after the marker
        fi

        [[ $region == 0 ]] && printf '%s\n' "$line" >> file1     # lines before marker
        [[ $region == 2 ]] && printf '%s\n' "$line" >> file2     # lines following marker

    done < inputfile                                  # change to your input file

And the obligatory one-liner:

MARKER='<-->'; region=0; while IFS= read -r line; do [[ $region == 1 ]] && region=2; if [[ $region == 0 && $line == *"$MARKER"* ]]; then region=1; printf '%s\n' "${line%%"$MARKER"*}" >> file1; printf '%s\n' "${line##*"$MARKER"}" >> file2; fi; [[ $region == 0 ]] && printf '%s\n' "$line" >> file1; [[ $region == 2 ]] && printf '%s\n' "$line" >> file2; done < inputfile
lecodesportif