ansaurus

Question

Answer 1

A:

Check out the split command --

  split -- split a file into pieces

  Output  fixed-size  pieces of INPUT to PREFIXaa, PREFIXab, ...; default
  size is 1000 lines, and default PREFIX is `x'.  With no INPUT, or  when
  INPUT is -, read standard input.

It should be much faster, more reliable, and cleaner than running awk in a loop!
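As a rough sketch of how split behaves, the snippet below builds a small stand-in file and cuts it into 10-line pieces. The file name sample.txt, the prefix part_, and the temporary directory are made up for the demo; for the real data you would pass your own file and something like -l 50000.

```shell
# Work in a throwaway directory so the demo leaves nothing behind (hypothetical names throughout).
workdir=$(mktemp -d "${TMPDIR:-/tmp}/splitdemo.XXXXXX")
cd "$workdir" || exit 1

# Stand-in for the real input: a 25-line file.
awk 'BEGIN { for (i = 1; i <= 25; i++) print i }' > sample.txt

# -l sets the number of lines per piece; "part_" replaces the default prefix "x".
split -l 10 sample.txt part_

ls part_*      # part_aa part_ab part_ac
wc -l part_*   # 10, 10 and 5 lines respectively
```

The last piece simply holds whatever is left over, so no line is dropped or duplicated.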

Steven Schlansker 2010-02-19 07:14:30

Tch. At least point to the right man page :P http://developer.apple.com/Mac/library/documentation/Darwin/Reference/ManPages/man1/split.1.html

Ignacio Vazquez-Abrams 2010-02-19 07:19:49

Split is way too slow.

ACEnglish 2010-02-22 16:05:19

Answer 2

A:

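echo "Splitting and Running Script"
# Split into smaller files of 50000 lines each, if I understand your problem correctly.
# Each finished piece is closed so awk does not run out of open file handles,
# and the output name is built before the redirection to avoid parsing ambiguity.
awk 'NR%50000==1{if (f) close(f); f = "xPart" (++c) ".txt"} {print > f}' file
# or use: split -l 50000 file xPart
for file in xPart*
do
    python FastQ2Seq.py "$file" &
done
# Wait for every background job to finish before concatenating the results.
wait
echo "Concatenating"
cat *.out.seq >> original.seq
cat *.out.qul >> original.qul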

ghostdog74 2010-02-19 07:27:02

This was really close. I ended up doing: awk '{if (NR%500000==1){++c}{print $0 > "xPart"c}}' $1

ACEnglish 2010-02-22 16:06:27

Answer 3

A:

If your seq truly works like the standard seq, you're calling it wrong. The proper command line for seq is:

seq FIRST INCREMENT LAST

So you would need to change your seq command line to:

seq 0 500000 14000000
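To see the argument order in action, here is a small sketch with deliberately tiny numbers (0, 5, and 20 are illustrative, not the values from the question). It assumes your seq accepts the three-argument form described above.

```shell
# seq FIRST INCREMENT LAST: count from 0 to 20 in steps of 5.
offsets=$(seq 0 5 20)

# Unquoted expansion joins the values with spaces.
echo $offsets   # 0 5 10 15 20

# Each value could serve as the starting offset of one chunk.
for start in $offsets; do
    echo "chunk starting at offset $start"
done
```

Getting the order wrong (for example, putting LAST in the middle) changes the step size rather than raising an error, so it is worth checking the output on a small range first.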

R Samuel Klatchko 2010-02-19 07:47:51
