I have a collection of ebooks in different formats (e.g. PDF, LIT, CHM, and others). I would like to extract the first page of each book as plain text. What would be the best language for this?

A portable language between Linux and XP would be a big plus.

My prime candidates at the moment are Java and Ruby, mostly because they are portable and have a large collection of available components for processing different file formats. But are these languages the best choices?

A: 

C# is also cross-platform (Mono on Linux, .NET on Windows), and it would work fine for this kind of processing. It probably just depends on which language you like best; they're probably all equally capable of doing this type of work.

If you're planning on doing a LOT of this type of processing, I would probably recommend going with a more functional language; it could turn out to be faster.

C++ might not be a bad choice either: parsing some of those files might require getting down to the byte level (not just string parsing), and it is cross-platform (with a recompile, at least) and very fast.
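To give a rough idea of what "byte level" means here, below is a minimal sketch in Python (chosen only for brevity; the same idea carries over to any of the languages mentioned). It identifies a file by its leading signature bytes instead of trusting the extension. The function names `sniff_format` and `sniff_file` are made up for illustration, and only two signatures are included: well-formed PDF files start with `%PDF-` and CHM files start with `ITSF`. Other formats would need their own entries, and actually extracting the first page would still require a format-specific library.

```python
# Leading "magic" bytes for two of the formats from the question.
# This table is illustrative, not exhaustive.
MAGIC_SIGNATURES = {
    b"%PDF-": "pdf",  # header of a well-formed PDF file
    b"ITSF": "chm",   # header of a Microsoft Compiled HTML Help file
}


def sniff_format(header: bytes) -> str:
    """Return a format name based on the first bytes of a file."""
    for magic, name in MAGIC_SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_file(path) -> str:
    """Open a file in binary mode and inspect only its first 16 bytes."""
    with open(path, "rb") as f:
        return sniff_format(f.read(16))
```

A script could call `sniff_file` on each book and then hand the file to whichever extractor matches the detected format.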

justin.m.chase