What do people mean when they say "Perl is very good at parsing"?

How is Perl any better or more powerful than other scripting languages such as Python or Ruby?

+1  A: 

Perl is very good at text parsing when compared to C/C++/Java.

Igor Oks
Igor should probably expand his answer to note that when Perl came along, text processing wasn't a trivial task. 20 years later, people don't appreciate that pain now that everything has PCRE, etc.
brian d foy
+18  A: 

They mean that Perl was originally designed for processing text files and has many features that make it easy:

  • Perl has many functions for string processing: substr, index, chomp, length, grep, sort, reverse, lc, ucfirst, ...
  • Perl automatically converts between numbers and strings depending on how a value is used. (e.g. you can read the character string '100' from a file and add one to it without needing to do a string-to-integer conversion first)
  • Perl automatically converts between the platform's line-ending sequence (e.g. CRLF on Windows) and a logical newline ("\n") within your program.
  • Regular expressions are integrated into the syntax instead of being a separate library.
  • Perl's regular expressions are the "gold standard" for power and functionality.
  • Perl has full Unicode support.
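
A few of the points above can be seen together in a minimal sketch. The helper name `parse_line` and the "name: count" input format are assumptions made up for illustration; only core built-ins (`chomp`, `lc`, `ucfirst`, and the match operator) are used.

```perl
use strict;
use warnings;

# parse_line is a hypothetical helper; the "name: count" line format is assumed.
sub parse_line {
    my ($line) = @_;
    chomp $line;    # drop the trailing newline, if any

    # The regex is part of the syntax: match and capture in one expression.
    my ($name, $count) = $line =~ /(\w+):\s*(\d+)/
        or return;  # no match: return an empty list

    # $count is a string captured from the text; adding 1 uses it as a number.
    return (ucfirst(lc($name)), $count + 1);
}

my ($name, $next) = parse_line("WIDGETS: 100\n");
print "$name $next\n";    # prints "Widgets 101"
```

Note that no explicit string-to-integer conversion appears anywhere: the captured text '100' is simply used in arithmetic.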

Python and Ruby also have good facilities for text processing. (Ruby in particular took much inspiration from Perl, much as Perl has shamelessly borrowed from many other languages.) There's little point in asking which is better. Use what you like.

Michael Carman
Although some people frown on $_, I think it belongs on that list. The idea that you have a "current topic" or thing that you're working on and applying various steps to it is very nice.
brian d foy
I wouldn't say that Perl automatically handles line endings. I think you're confusing that with writing to a text file in Windows. Reading data coming back doesn't do anything special unless you tell Perl what to do.
brian d foy
@brian: Conversion between the platform newline sequence and a logical "\n" happens on both reading and writing (ignoring `binmode`, of course). I know that you're well aware of this so I find your comment confusing. I suppose I could have said that "Perl lets you think in terms of logical newlines instead of worrying about whatever sequence your OS uses" without mentioning how it does that.
Michael Carman
@Michael: you're confusing the behavior of what a DOSish perl does and what the rest of the world does. Reading a file with Windows line endings on a unix machine still gives you Windows line endings. It's only a special feature of Perl on Windows and when Perl knows it's writing to a tty. The issue of what "\n" is is an entirely different matter. See perlport for the details.
brian d foy
+3  A: 

Perl is good for ETL or batch processing jobs as well. It takes a minimal amount of code to pick up the file, push it through split to get a map, perform some business logic on the record, and write it back out to disk.

I suppose that's more data processing than data parsing, but data processing is bulk data parsing.
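
The read, split, transform, write loop described above might look roughly like the sketch below. The comma-separated `id,name,amount` layout, the helper names `transform_line` and `process_file`, and the flat-fee rule are all hypothetical stand-ins, not anything taken from a real job.

```perl
use strict;
use warnings;

# Assumed record layout: comma-separated id,name,amount (hypothetical).
my @fields = qw(id name amount);

# transform_line (made-up helper): parse one line, apply a rule, format it again.
sub transform_line {
    my ($line) = @_;
    chomp $line;
    my %rec;
    @rec{@fields} = split /,/, $line;            # split the line into a field => value map
    $rec{amount} += 5 if $rec{amount} > 100;     # placeholder "business rule": flat fee
    return join(',', @rec{@fields}) . "\n";
}

# process_file (made-up helper): stream records from one file to another.
sub process_file {
    my ($in_path, $out_path) = @_;
    open my $in,  '<', $in_path  or die "Cannot read $in_path: $!";
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    while (my $line = <$in>) {
        print {$out} transform_line($line);
    }
    close $in;
    close $out or die "Cannot close $out_path: $!";
}
```

Calling `process_file('in.csv', 'out.csv')` (file names assumed) would rewrite every record; a line such as `7,bolt,150` would come out as `7,bolt,155`. A plain `split` does not handle quoted fields, so real CSV data would need a proper CSV parser.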

coffeepac
+11  A: 

Don't take a statement of Perl's strengths to be a statement of another language's failings. Perl is good for text processing, but that doesn't mean Ruby or Python suck.

When people talk about Perl being "good for parsing", they're mainly echoing Perl's history: it was invented at a time when heavy-duty text processing wasn't easy. Try doing some of that in C or C++ (Java hadn't been invented yet, either!). Back in the day, Larry was trying to do his work with sed and awk, but kept running into their limitations. He made a tool that made text even easier to work with.

Perl is still very good for text manipulation tasks, but now so are a lot of other languages.

brian d foy
A: 

It's probably because people are used to what it was built for, as the perl documentation describes (quoted below), so many people have come to associate parsing text files with Perl. That's not to exclude Ruby or Python; Perl is just more of a household name, IMHO.

Perl is a language optimized for scanning arbitrary text files, extracting information from those text files, and printing reports based on that information. It's also a good language for many system management tasks. The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal).

0A0D