What do people mean when they say "Perl is very good at parsing"?
How is Perl any better or more powerful than other scripting languages such as Python or Ruby?
They mean that Perl was originally designed for processing text files and has many features that make it easy: substr, index, chomp, length, grep, sort, reverse, lc, ucfirst, ...
Python and Ruby also have good facilities for text processing. (Ruby in particular took much inspiration from Perl, much as Perl has shamelessly borrowed from many other languages.) There's little point in asking which is better. Use what you like.
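For a feel of the built-ins listed above, here is a small sketch in Perl. The input line and variable names are made up for illustration; it only shows how these functions combine, and isn't meant as a benchmark against any other language.

```perl
use strict;
use warnings;

# Hypothetical input line, invented for this example.
my $line = "gamma,alpha,beta\n";

chomp $line;                            # strip the trailing newline
my $len   = length $line;               # number of characters left
my $comma = index $line, ',';           # position of the first comma
my $first = substr $line, 0, $comma;    # everything before that comma

my @words  = split /,/, $line;              # break the line into fields
my @long   = grep { length($_) > 4 } @words; # keep words longer than 4 chars
my @sorted = sort @words;                   # alphabetical order
my @desc   = reverse sort @words;           # reverse alphabetical order
my $title  = ucfirst lc 'PERL';             # lowercase, then capitalize

print "length=$len comma=$comma first=$first\n";
print "long:   @long\n";
print "sorted: @sorted\n";
print "desc:   @desc\n";
print "title:  $title\n";
```

Each step is one built-in call with no imports beyond strict and warnings, which is most of what people mean by "easy".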
Perl is good for ETL or batch-processing jobs as well. It takes a minimal amount of code to pick up the file, push each record through split to get a map, perform some business logic on the record, and write it back out to disk.
I suppose that's more data processing than data parsing, but data processing is bulk data parsing.
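A rough sketch of that read, split, transform, write loop might look like the following. The record layout (id|name|amount), the "business rule", and the transform_line helper are all hypothetical, and in-memory filehandles stand in for real files so the snippet runs as-is.

```perl
use strict;
use warnings;

# Hypothetical record layout: id|name|amount, one record per line.
my @fields = qw(id name amount);

# Turn one delimited line into a hash, apply a made-up business rule
# (uppercase the name, add a flat 1.00 fee), and return the re-joined line.
sub transform_line {
    my ($line) = @_;
    chomp $line;
    my %rec;
    @rec{@fields} = split /\|/, $line;          # hash slice: field name => value
    $rec{name}    = uc $rec{name};
    $rec{amount}  = sprintf '%.2f', $rec{amount} + 1;
    return join('|', map { $rec{$_} } @fields) . "\n";
}

# In-memory filehandles keep the sketch self-contained; with real data,
# open the input and output paths on disk instead.
my $input  = "1|widget|10.00\n2|gadget|2.50\n";
my $output = '';
open my $in,  '<', \$input  or die "cannot read input: $!";
open my $out, '>', \$output or die "cannot write output: $!";

while (my $line = <$in>) {
    print {$out} transform_line($line);
}
close $in;
close $out;

print $output;
```

Swapping the two scalar references for file names is the only change needed to run it against files on disk.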
Don't take a statement of Perl's strengths to be a statement of another language's failings. Perl is good for text processing, but that doesn't mean Ruby or Python suck.
When people talk about Perl being "good for parsing", they're mainly echoing Perl's history; it was invented in the day when heavy-duty text processing wasn't easy. Try doing some of that in C or C++ (Java hadn't been invented yet, either!). Back in the day, Larry was trying to do his work with sed and awk, but running into their limitations. He made a tool that made text even easier to work with.
Perl is still very good for text manipulation tasks, but now so are a lot of other languages.
It's probably because people are used to what it was built for, as described in the Perl documentation, so it has become commonplace for many people to associate parsing of text files with Perl. That's not to exclude Ruby or Python; Perl is just more of a household name, IMHO.
Perl is a language optimized for scanning arbitrary text files, extracting information from those text files, and printing reports based on that information. It's also a good language for many system management tasks. The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal).
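To make "scanning, extracting, and printing reports" concrete, here is a minimal sketch. The log lines and their format are invented for the example; a real script would read them from a file or STDIN.

```perl
use strict;
use warnings;

# Hypothetical log excerpt; the format is invented for this example.
my @lines = (
    'ERROR disk full on /dev/sda1',
    'INFO backup finished',
    'ERROR timeout talking to db01',
    'WARN retrying request',
);

# Scan and extract: count how many lines start with each level.
my %count;
for my $line (@lines) {
    next unless $line =~ /^(\w+)\s/;    # capture the leading word
    $count{$1}++;
}

# Report: one row per level, most frequent first, ties broken by name.
my @report = map  { sprintf '%-5s %d', $_, $count{$_} }
             sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;

print "$_\n" for @report;
```

A regex match, a hash, and a sort block cover all three steps, which is roughly the workflow the documentation describes.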