Hey folks! I need to parse some known file formats, one of which is the CUSCAR format. I strongly believe that regex will do the job. Any suggestions?

+2  A: 

I just looked at the CUSCAR spec, and I think you'll end up with some pretty ugly regex code if you try to parse all of it. You could get away with regex if you are parsing only part of the message. Either way, you'll have to test for speed, as your main bottleneck will be I/O.
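To show what the non-regex route might look like: CUSCAR is a UN/EDIFACT message, so a small hand-written splitter can stand in for one big pattern. This is only a rough sketch. It assumes the default EDIFACT service characters (`'` ends a segment, `+` separates data elements, `:` separates components, `?` is the release character) and ignores a UNA header that could override them. The function names and the sample message are made up for illustration.

```python
def split_escaped(text, sep, release="?"):
    """Split text on sep, skipping separators escaped by the release character."""
    parts, buf, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == release and i + 1 < len(text):
            # Keep the escape pair intact so later split levels still see it.
            buf.append(text[i:i + 2])
            i += 2
            continue
        if ch == sep:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    parts.append("".join(buf))
    return parts


def unescape(text, release="?"):
    """Drop release characters, keeping the character each one protects."""
    out, i = [], 0
    while i < len(text):
        if text[i] == release and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def parse_segments(message):
    """Return a list of (tag, elements); each element is a list of components."""
    segments = []
    for raw in split_escaped(message, "'"):
        raw = raw.strip()  # tolerate line breaks between segments
        if not raw:
            continue
        elements = split_escaped(raw, "+")
        tag = elements[0]
        data = [[unescape(c) for c in split_escaped(e, ":")] for e in elements[1:]]
        segments.append((tag, data))
    return segments


# Made-up sample, not taken from a real CUSCAR file.
sample = "UNH+1+CUSCAR:D:95B:UN'\nBGM+85+ABC?+123+9'"
for tag, data in parse_segments(sample):
    print(tag, data)
```

Once the segments are split like this, regex can still be handy for validating individual fields, which is the "parsing only part of it" case.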

I did something similar with the vendor files that came from QWEST. These beasties were hierarchical text files. Parsing those sucked! I'm currently creating and parsing text files of between 4 and 50 million lines each, every day.

There is a nice framework called the FileHelpers Library. It helps you create an object-oriented representation of the records (text lines), and it even has a nice wizard to walk you through creating the classes that represent those records. It handles master-detail, delimited, and fixed-width formats easily.
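The record-as-object idea is not tied to any one library. Here is a minimal Python sketch of the same approach, just to illustrate it. This is not the FileHelpers API; the `ShipmentRecord` layout, the `|` delimiter, and the date format are all assumptions invented for the example.

```python
from dataclasses import dataclass, fields
from datetime import date, datetime


@dataclass
class ShipmentRecord:
    """Hypothetical record layout: one delimited text line per instance."""
    container_id: str
    weight_kg: float
    arrival: date


def _convert(value, target):
    """Convert one text field to the type declared on the record class."""
    if target is date:
        return datetime.strptime(value, "%Y-%m-%d").date()  # assumed date format
    return target(value)


def parse_delimited(line, record_type, delimiter="|"):
    """Map one delimited line onto record_type, field by field."""
    values = line.rstrip("\n").split(delimiter)
    spec = fields(record_type)
    if len(values) != len(spec):
        raise ValueError(f"expected {len(spec)} fields, got {len(values)}")
    return record_type(*(_convert(v.strip(), f.type) for v, f in zip(values, spec)))


record = parse_delimited("MSCU1234567|18250.5|2009-03-14", ShipmentRecord)
print(record.container_id, record.weight_kg, record.arrival)
```

The nice part of this style is that the file layout lives in one class definition, so a format change means editing the fields rather than hunting through parsing code.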

hectorsosajr