I have a web application that needs to perform several heavy text-processing tasks: removing certain characters, parsing XML files, and others. Some of them involve regular expressions.

The web application has some implementations in Java and others in PHP. Is it worth using Perl or another dedicated text-processing language for such tasks, or is there really no difference compared with using PHP?

I have even thought of using sed, awk, or maybe even some compiled C programs to process the text. There's a lot of text to be processed...

+16  A: 

Yes, Perl is a good option. As a language, it's definitely more suitable for those kinds of tasks than Java or PHP. If you have the Perl knowledge, I would recommend it for this kind of task.

Leon Timmermans
I agree, but I'll add that PHP isn't terrible for the job either. Depending on the environment and your proficiency, PHP could be a good choice.
troelskn
PHP is a bit weak at regexps IMO, especially because quoting is confusing (just like in Java).
Leon Timmermans
+3  A: 

Text processing is exactly what Perl was created for. After all, it stands for Practical Extraction and Report Language. On the other hand, for a web application I'd prefer Python.

vartec
That's a backronym, actually...
Hasturkun
Everyone knows Perl stands for Pathologically Eclectic Rubbish Lister!
Rob K
+3  A: 

Yes, Perl was designed with processing text in mind.

It has tons of useful text processing features, and it was the first language I used (long ago) that had regular expressions.

http://en.wikipedia.org/wiki/Perl

Dana Holt
+6  A: 

Perl is THE language for text processing. It was designed with this in mind.

2-bits
+2  A: 

Yes. Text processing is PERL's #1 strong point. Since you will be integrating it into your existing app, you'll need to execute an external program, so think about how to run it securely and perhaps as a background process (to avoid startup delays in your real-time web app).
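
As a rough illustration of the "run it securely" point, here is a minimal sketch in Python of piping text through an external filter. The function name `run_text_filter` and the timeout value are hypothetical, and the stand-in command at the bottom only exists so the sketch runs anywhere; in practice the command might be something like `["perl", "clean.pl"]`, where `clean.pl` is an assumed script name. Passing the command as a list avoids the shell, so user-supplied text cannot be interpreted as shell syntax.

```python
import subprocess
import sys


def run_text_filter(command, text, timeout_seconds=10.0):
    """Pipe `text` through an external program and return what it prints.

    `command` is a list of arguments, so no shell is involved and the
    text being processed is never interpreted as shell syntax.
    """
    completed = subprocess.run(
        command,
        input=text,               # sent to the child's standard input
        capture_output=True,      # collect stdout and stderr
        text=True,                # work with str instead of bytes
        timeout=timeout_seconds,  # do not let a stuck filter hang the request
        check=True,               # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout


if __name__ == "__main__":
    # Hypothetical stand-in for the real filter: upper-cases its input.
    stand_in = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.write(sys.stdin.read().upper())",
    ]
    print(run_text_filter(stand_in, "hello\n"), end="")
```

This starts a new process per call, which is exactly the startup delay mentioned above; a long-running worker fed from a queue would be one way around that, but it is not shown here.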

Chris Nava
http://search.cpan.org/perldoc?Inline::PERL
jettero
I don't see how this would allow Perl inline in another language... Seems that it just inlines perl in perl.
Chris Nava
jettero was poking fun at the usage of 'PERL' since, outside of the Inline::PERL module, there is no such thing.
Mr. Muskrat
Whew. Thought I was missing something. ;-)
Chris Nava
+9  A: 

I too suggest you use Perl; it's made for text crunching.

However, if you are going to parse or process XML, please don't try to roll your own solution; there are several high-quality modules that do the job correctly. As a starter, I recommend you take a look at XML::Twig.
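
The advice holds whichever language ends up doing the work. As a hedged illustration (in Python with the standard library's `xml.etree.ElementTree`, not XML::Twig), the sketch below shows the kind of detail a real parser handles for you: entities, character references, and CDATA sections. The sample document and the `extract_titles` helper are made up for the example.

```python
import xml.etree.ElementTree as ET

# Made-up sample input with an entity, a character reference and a CDATA
# section, all of which a hand-rolled regex would have to special-case.
SAMPLE = """<articles>
  <article id="1"><title>Caf&#233; &amp; bar</title></article>
  <article id="2"><title><![CDATA[1 < 2]]></title></article>
</articles>"""


def extract_titles(xml_text):
    """Return (id, title) pairs for every <article> element."""
    root = ET.fromstring(xml_text)
    return [
        (article.get("id"), article.findtext("title"))
        for article in root.iter("article")
    ]


if __name__ == "__main__":
    for article_id, title in extract_titles(SAMPLE):
        print(article_id, title)
```

The titles come back already decoded ("Café & bar" and "1 < 2"), with no escaping logic written by hand.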

Also, for regular expressions, there are dozens of ready-made ones in the Regexp::Common distribution. Most probably you'll find what you need there, and it will save you time.

brunov