ansaurus

Question

Splitting a file and its lines under Linux/bash

Answer 1

+1 A:

Homework? :-)

I would think that a simple pipe with sed (to split each line into two) and split (to split things up into multiple files) would be enough.

The man command is your friend.

Added after confirmation that it is not homework:

How about

sed 's/\(.....\)\(.....\)/\1\n\2/' input_file | split -l 2000000 - out-prefix-

?
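To see what the pipeline does, here is a small sketch on made-up data. It assumes GNU sed (which turns `\n` in the replacement into a newline) and uses a chunk size of 4 lines instead of 2000000 so the result is easy to inspect. The sample lines and the scratch directory are illustrative, not part of the original question.

```shell
# Work in a scratch directory so the demo files do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three sample 10-character lines (made-up data).
printf '%s\n' aaaaabbbbb cccccddddd eeeeefffff > input_file

# Split each line into two 5-character lines (GNU sed assumed),
# then let split read stdin ("-") and write 4 lines per chunk.
sed 's/\(.....\)\(.....\)/\1\n\2/' input_file | split -l 4 - out-prefix-

# split names the chunks out-prefix-aa, out-prefix-ab, ...
wc -l out-prefix-*
```

The first chunk should hold four lines and the second the remaining two; with `-l 2000000` each chunk would hold one million original lines.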

HD 2008-09-15 15:28:24

Not homework, just testing the site. Thanks for your answer.

Sklivvz 2008-09-15 15:33:08

Great! In the end I used this: for file in *.txt; do echo "$file"; sed 's/\(.....\)\(.....\)/\1\r\n\2/' "$file" | split -l 2000000 - "$file.part."; done

Sklivvz 2008-09-15 18:11:49

Answer 2

A:

I think that something like this could work:

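in_file=input_file   # placeholder: set this to the path of your input file
out_file=1
out_pairs=0
while IFS= read -r line; do
    # Start a new output file once the current one holds 1000000 pairs.
    if [ "$out_pairs" -ge 1000000 ]; then
        out_file=$((out_file + 1))
        out_pairs=0
    fi
    # First half: strip the last five characters. Second half: strip the first five.
    echo "${line%?????}" >> "out${out_file}"
    echo "${line#?????}" >> "out${out_file}"
    out_pairs=$((out_pairs + 1))
done < "$in_file"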

Not sure if it's simpler or more efficient than using Perl, though.
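The two parameter expansions are the core of this approach, so a tiny sketch may help. It assumes every line is exactly 10 characters long; the sample value is made up.

```shell
line=aaaaabbbbb

# "%?????" removes the shortest suffix matching five characters,
# which leaves the first half of a 10-character line.
first="${line%?????}"

# "#?????" removes the shortest prefix matching five characters,
# which leaves the second half.
second="${line#?????}"

echo "$first"
echo "$second"
```

If a line were longer or shorter than 10 characters, the two halves would no longer be five characters each, so this relies on the fixed-width input.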

che 2008-09-15 15:31:31

Answer 3

A:

This variant keeps only the first five characters of each line. It assumes that the large file is called x.txt and that it is OK to create files in the current directory with names x.txt.* :

split -l 2000000 x.txt x.txt.out && (for splitfile in x.txt.out*; do outfile="${splitfile}.firstfive"; echo "$splitfile -> $outfile"; cut -c 1-5 "$splitfile" > "$outfile"; done)
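As a rough illustration, the same split-then-cut idea can be tried on a tiny made-up file. The chunk size of 2 lines and the extra `.lastfive` output are assumptions added for the demo; the command above only produces the `.firstfive` files.

```shell
# Scratch directory and made-up sample data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' aaaaabbbbb cccccddddd eeeeefffff > x.txt

# Split into chunks of 2 lines: x.txt.outaa, x.txt.outab.
split -l 2 x.txt x.txt.out

# The glob is expanded once, before the loop creates any new files.
for splitfile in x.txt.out*; do
    cut -c 1-5 "$splitfile" > "${splitfile}.firstfive"
    # Hypothetical addition: also keep characters 6-10 of each line.
    cut -c 6-10 "$splitfile" > "${splitfile}.lastfive"
done

ls
```

Note that this keeps the halves in separate files rather than interleaving them as the sed approach does.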

Troels Arvin 2008-09-15 15:35:48
