views: 71

answers: 3

I need to parse a file line by line according to a given set of rules.

Here is the requirement.

The file can have multiple lines with different data:

01200344545143554145556524341232131
1120034454514355414555652434123213101200344545143554145556524341232131
2120034454514

The rules are like this:

  • if byte[0,1] == "0" then extract this line to /tmp/record0.dat
  • if byte[0,1] == "1" then extract this line to /tmp/record1.dat
  • if byte[0,1] == "2" then extract this line to /tmp/record2.dat

I am looking for any language that can do this quickly on a very large file (>2 GB).

Appreciate all the help in advance.

Thanks

+3  A: 

sed doesn't appear in your list of tags, but I'd use:

sed -n -e '/^0/w /tmp/record0.dat' \
       -e '/^1/w /tmp/record1.dat' \
       -e '/^2/w /tmp/record2.dat' "$@"

You can also do it in other languages, but for conciseness and probable correctness, sed is hard to beat in this case.
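
For example, a rough Python equivalent might look something like this (an untested sketch, assuming the same /tmp/record paths; only lines starting with 0, 1, or 2 are written):

#!/usr/bin/env python3
# Untested sketch: route each line to /tmp/record<digit>.dat by its first
# character, mirroring the three sed rules above.
import sys

def split_records(path):
    # One output handle per prefix, opened once up front.
    outputs = {c: open("/tmp/record%s.dat" % c, "wb") for c in "012"}
    try:
        with open(path, "rb") as infile:
            for line in infile:          # iterates lazily, so >2 GB is fine
                key = chr(line[0])
                if key in outputs:       # lines with any other prefix are ignored
                    outputs[key].write(line)
    finally:
        for handle in outputs.values():
            handle.close()

if __name__ == "__main__":
    split_records(sys.argv[1])

Reading the file line by line keeps memory use flat regardless of file size, but it is clearly more verbose than the sed one-liner.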

Jonathan Leffler
+1 For the probable correct use of the word "probable"
belisarius
+2  A: 

This will work regardless of the value of the first character, so it scales without having to add more rules:

awk '{c=substr($0,1,1); print $0 > "/tmp/record" c ".dat"}' inputfile.dat
Dennis Williamson
+1 ... Does '{print > "/tmp/record" substr($0,1,1) ".dat"}' work?
belisarius
@belisarius: yes, it does.
Dennis Williamson
A: 
awk -vFS= 'NF{print $0>"/tmp/record"$1".dat"}' file
ghostdog74