I have a requirement to parse a huge text file and send parts of this file to be added as separate rows in Content Manager. What is the best way to parse the file and then update the database?

I would also need to identify certain tokens within this text file.

Please suggest which language I should use to code this requirement.

Thanks

+4  A: 

All widely used programming languages can do this, though scripting languages (especially Perl) may be better suited to the task than others. However, your personal experience is a bigger factor: using the language you're most familiar with is probably best, unless you have a specific reason to choose a different one.

Michael Borgwardt
Upvoted, it's a good answer -- use the software you are familiar with.
High Performance Mark
I agree with Michael. Most languages should be capable of doing this.
James P.
Thanks for your answers. I am a Java programmer and have not done scripting before, so this is going to be a bit of a learning curve for me.
Raj
+1  A: 

Without more detailed requirements it's difficult to suggest a particular language. Certainly no language is going to magically solve the problem of parsing such a big file. Depending on the format of the file, there might be a parsing library particularly suited to the job, which could guide your choice of language.

If by "Content Manager" you mean Microsoft Content Manager Server I guess one of the Microsoft languages such as C# or VB.Net might be a better choice.

So my answer would be to pick one of the languages you already know, probably the one you know best.

Dave Webb
A: 

A classic problem when working with large files is simply reading them in the first place. A lot of standard library routines want to read the entire file into memory as a single string or array, which is usually not practical for really large files.

Whatever language you end up choosing, look over its file I/O libraries carefully and select a method that lets you read the file in chunks. Then run your parsing logic over each chunk, and when you reach the end of a chunk, read in the next one. Be careful with the parsing logic: it can be tricky to handle a chunk that ends in a place your parser is not expecting, so carry any partial line or token over into the next chunk (see the sketch below).
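Since you mentioned you're a Java programmer, here is a minimal sketch of that idea in Java. It assumes a line-oriented file and uses a hypothetical RECORD_START marker to stand in for whatever tokens you actually need to find; the carry-over StringBuilder handles lines that straddle a chunk boundary.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class ChunkedParser {
        // Hypothetical token marker; replace with whatever your file format uses.
        private static final String TOKEN = "RECORD_START";

        public static void main(String[] args) throws IOException {
            char[] buffer = new char[64 * 1024];       // read 64 KB at a time
            StringBuilder carry = new StringBuilder(); // partial line spanning chunks

            try (BufferedReader reader = new BufferedReader(new FileReader(args[0]))) {
                int read;
                while ((read = reader.read(buffer)) != -1) {
                    carry.append(buffer, 0, read);
                    int newline;
                    // Process only complete lines; keep the trailing fragment
                    // for the next chunk.
                    while ((newline = carry.indexOf("\n")) != -1) {
                        String line = carry.substring(0, newline);
                        carry.delete(0, newline + 1);
                        if (line.contains(TOKEN)) {
                            // e.g. insert this record into Content Manager here
                            System.out.println("Found token in: " + line);
                        }
                    }
                }
                if (carry.length() > 0) {
                    // last line of the file had no trailing newline
                    System.out.println("Trailing data: " + carry);
                }
            }
        }
    }

The key point is that memory use stays bounded by the chunk size plus one line, no matter how large the file is.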

Additionally, a double-buffer system sometimes works well: process one chunk, and as you get near the end, fill the other buffer with the next chunk. If your parsing is CPU-intensive, you might even fill a buffer on another thread to overlap the file I/O with the parsing. However, I wouldn't do this first. Start with just getting the logic working before any performance optimizations.
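To illustrate the double-buffer idea, here is a rough Java sketch using a bounded queue of capacity 2, so a reader thread fills one chunk while the main thread parses the other. The 1 MB chunk size and the empty parse method are placeholders, not anything prescribed above.

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class DoubleBufferedParser {
        private static final byte[] EOF = new byte[0]; // end-of-file sentinel

        public static void main(String[] args) throws Exception {
            // Capacity 2 gives classic double buffering: the reader fills
            // one chunk while the parser works on the other.
            BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(2);

            Thread reader = new Thread(() -> {
                try (FileInputStream in = new FileInputStream(args[0])) {
                    byte[] buf = new byte[1 << 20]; // 1 MB chunks
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        byte[] chunk = new byte[n];
                        System.arraycopy(buf, 0, chunk, 0, n);
                        queue.put(chunk); // blocks if the parser falls behind
                    }
                    queue.put(EOF);
                } catch (IOException | InterruptedException e) {
                    throw new RuntimeException(e);
                }
            });
            reader.start();

            byte[] chunk;
            while ((chunk = queue.take()) != EOF) {
                parse(chunk); // CPU-intensive parsing, overlapped with file I/O
            }
            reader.join();
        }

        private static void parse(byte[] chunk) {
            // placeholder for the real parsing logic
        }
    }

As the answer says, only reach for this once the plain single-threaded version works and profiling shows the parsing, not the I/O, is the bottleneck.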

John Dyer