I've heard that Perl is the go-to language for string manipulation (and line noise ;). Can someone provide examples and comparisons with other language(s) to show me why?

+4  A: 

It's a very subjective question. Perhaps the true answer is that Perl has a nice syntax (incl. the regex syntax) that makes people want to sing its praises over other languages? IMHO, any language that supports a rich regex syntax would be considerably powerful at string manipulation.

Gregory
The reason people want to sing its praises is that Perl is very powerful in all the ways that matter, including the very important metric of "effort expended to develop software with functionality X".
DVK
+11  A: 

It is very subjective, so I wouldn't say that Perl is the best choice, but it is certainly a valid choice for string manipulation. Other alternatives are Tcl, Python, AWK, etc.

I like Perl's capabilities because it has excellent support (better than POSIX, as pointed out in the comment) for fast regexes, and the implicit variables make it easy to do basic string crunching with very little code.
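To make the "better than POSIX" point concrete, here is a minimal sketch of two Perl-style regex features that POSIX regular expressions do not define: lazy quantifiers and lookahead. It is written in Python only for the sake of comparison, since Python's `re` module borrows Perl's regex syntax; the sample strings are made up for illustration.

```python
import re

# Lazy (non-greedy) quantifier: ".*?" stops at the first closing tag
# instead of swallowing everything up to the last one.
lazy = re.findall(r"<b>(.*?)</b>", "<b>one</b> and <b>two</b>")

# Lookahead: match digits only when "px" follows, without consuming "px".
widths = re.findall(r"\d+(?=px)", "10px 20em 30px")

print(lazy)    # ['one', 'two']
print(widths)  # ['10', '30']
```

The same patterns work unchanged inside a Perl match operator.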

If you have a *nix background a lot of what you already know will apply to Perl as well, which makes it fairly easy to pick up for a lot of people.

Brian Rasmussen
Actually, it's pretty objective, when it comes to [regexes Perl has more features than the POSIX standard](http://en.wikipedia.org/wiki/Regular_expression#Perl-derivative_regular_expressions).
xxxxxxx
@spx2: fair enough. The subjective part was whether someone would consider Perl to be the go-to language in every sense. Some people are not too wild about Perl in general, so they may not prefer it despite its regex abilities.
Brian Rasmussen
down voter, please leave a comment.
Brian Rasmussen
Actually, I completely disagree; Perl is the best language there is for string manipulation.
xxxxxxx
spx2: Well, I agree but that doesn't make it so. A lot of people don't like Perl for whatever reason, so this is pretty subjective.
Brian Rasmussen
It turns out regexes *aren't* the only way of manipulating / dealing with strings either. Yes Perl has great regexes. Other languages have good regex support, and plenty of other fine features to boot.
Gregg Lind
+1  A: 

In the beginning, Perl was developed for easy report processing and dealing with text files, so it has very strong regex support. Most of the information on regexes can be found in perldoc.

jujav4ik
+4  A: 

Kids these days! Back in the day, all we had was SNOBOL -- and we liked it! Try it sometime...you never know, you might want something respectable to fall back on when this Perl fad runs its course!

Jim Lewis
+1 lol... (15 chars)
RCIX
+2  A: 

Perl is widely used for string manipulation tasks because its string manipulation API is easy to learn, and its regex support is widely used too. It has been in use for a very long time, and anyone with a Unix background would pick up Perl very easily. Historically, Perl was developed in the late '80s for report processing and text processing tasks. The trend continues to this day: anyone with a string manipulation or text processing task tends to opt for Perl as the first choice. It's not that other languages like Python aren't up to the task, but Perl is popular in this area.

Zaki
+10  A: 

Perl -> Practical Extraction and Report Language

Perl's strength (when it comes to string processing) lies in its very powerful regular expression engine.

Because of this, many people in the field of bioinformatics use Perl as their main tool, hence the large number of posts about BioPerl on PerlMonks. In bioinformatics they work with strings a lot; they call them "sequences" (I don't know much about this).

Perlmonks.org is the heart of the Perl community. Check out the immense number of hits when you search for site:perlmonks.org regex: 20,000 hits.

You cannot ignore the sheer number of modules on CPAN:

This is very clear evidence that Perl is a very powerful language when it comes to string processing.

So if you want to do some string processing and you're using Perl, you've got it covered :)

xxxxxxx
perl's regex engine is not called PCRE. PCRE is a feature-limited "clone" of perl's regex engine.
jrockway
@jrockway, thanks, you're right. But from Wikipedia: "As of Perl 5.9.4, PCRE is also available as a replacement for Perl's default regular expression engine through the re::engine::PCRE module."
xxxxxxx
Don't trust everything you read in Wikipedia. :)
brian d foy
+5  A: 

Perl's reputation for line noise comes from three kinds of people:

  • Overly clever (for their own good) hackers (or sometimes just hacks)

  • People who wouldn't know good software development if it hit them over the head with a cluebat.

  • (NOTE: the sets are not mutually exclusive)

  • People who code/hack in Perl (e.g. sysadmins) and have very little training, experience, or incentive to do software development. That is, the percentage of people using Perl for quick and dirty hacks with bad style and worse code quality is probably higher than for, say, Python.

In other (and less snide) words, you can write beautiful, incredibly readable and easy to maintain software in Perl. It all depends on who does the writing, what their priorities and skills are. Also, just like with any other language, you can write a miserable write-only mess with it.

The difference from other languages is that very often, the write-onlyness of said mess, when done in Perl, does indeed consist of a very high density of non-letter characters (sigils and special characters in poorly written regexes). This high density can indeed asymptotically approach line noise.

DVK
I'll preface this with: I like Perl. However, it's not only the programmer who is to blame for Perl's reputation for line noise. Perl, unlike most languages, allows lots of things to be done implicitly (like the $_, $0, etc. variables, implicit arguments to functions, etc.), where you don't have to explicitly declare that you want something to be done, but Perl will do something reasonable. This is a wonderful time-saving measure, but it is utterly incomprehensible to those who don't understand the semantics. Compare this to Python, which is a language almost anyone can read even without having written Python.
Falaina
Perl has the implicit arguments that you talk of. And as a programmer you can choose to use them or not.
xxxxxxx
@spx2 - Amen. @Falaina - that is EXACTLY what I was talking about. No software developer worth talking about would be caught dead using $_, myself included, outside of a 5-line throwaway personal-use quick hack.
DVK
@Falaina - also, added one more reason: fewer people hack Python with no thought of (or need for) producing clean code than do Perl. This has very little to do with innate qualities of the language, and more to do with the distribution of purposes it is used for.
DVK
@DVK this is turning into a "Perl vs. Python" as I see it ...
xxxxxxx
@spx2 - wasn't really meant that way. My intent is merely debunking the Perl myth. Python dragged into it merely as a need for something to compare to that was not perceived (as per first comment) as prone to line noise.
DVK
Default variables really bother people with narrow language experience that they cannot think beyond. It's the same group of people that feel that C#'s var keyword is inherently evil. There is exactly one way to think and program, and alternative viewpoints are bad.
brianary
+5  A: 

Because that is what Perl was made for. Because Perl is expressive, powerful, and fast. Many times I have beaten specialized products with a small, dirty Perl script written in a few minutes. For example: outer joins and large joins vs. MySQL (just because it can't do a merge join), ETL processing vs. Java Hadoop (because I have years of experience writing it effectively and the Perl IO layer is just great), and so on.

Hynek -Pichi- Vychodil
+3  A: 

I disagree that Perl is the best language for text processing. Simple things are easy; to replace foo with bar:

$data =~ s/foo/bar/g;
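For comparison, the same global substitution in Python might look like the sketch below. The sample string is made up for illustration; `re.sub` replaces every match by default, which corresponds to Perl's /g flag.

```python
import re

data = "foo, food, and more foo"

# re.sub replaces all non-overlapping matches, like s/foo/bar/g in Perl.
data = re.sub(r"foo", "bar", data)

print(data)  # bar, bard, and more bar
```

For a fixed string like this one, `data.replace("foo", "bar")` would do the same job without a regex.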

Harder things are not simple, though. Look at Data::SExpression, for example. It is a lot of code to do something very simple.

A similar implementation in Haskell with PArrow looks something like:

import Text.ParserCombinators.PArrow

data Atom = QuotedString String | Symbol String
          deriving (Show, Eq)

data Sexp = Sexp [Sexp] | Atom Atom
          deriving (Eq)


quotedString :: Char -> Char -> MD a Atom
quotedString quoteChar escapeChar = between q q inside >>^ QuotedString
    where q = char quoteChar
          inside = many $ (char escapeChar >>> anyChar) <+> notChar quoteChar

doubleQuotedString, symbol :: MD a Atom
doubleQuotedString = quotedString '"' '\\'
symbol = word >>^ Symbol

atom, sexp :: MD a Sexp
atom = (doubleQuotedString <+> symbol) >>^ Atom
sexp = atom <+> (between (char '(') (char ')') sexp' >>^ Sexp)
       where sexp' = sepBy1 sexp spaces

Just sayin'. Perl is not the end-all-and-be-all of text manipulation. There are many reasons to prefer Perl to other languages, but parsing is not one of them.
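As another point of comparison, here is a rough sketch of a hand-written S-expression reader in Python, using one regex for tokenizing and a small recursive function for nesting. It is not a port of Data::SExpression or of the PArrow code above; the names `TOKEN`, `tokenize`, `read` and `parse` are invented for this example, and it assumes only double-quoted strings with backslash escapes, bare symbols, and parenthesized lists.

```python
import re

# One token: "(" or ")", a double-quoted string with backslash escapes,
# or a bare symbol (anything without whitespace, parens or quotes).
TOKEN = re.compile(r'\s*(?:([()])|"((?:\\.|[^"\\])*)"|([^\s()"]+))')

def tokenize(text):
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        pos = m.end()
        paren, quoted, symbol = m.groups()
        if paren:
            yield paren
        elif quoted is not None:
            # Drop the backslash in front of each escaped character.
            yield ("string", re.sub(r"\\(.)", r"\1", quoted))
        else:
            yield ("symbol", symbol)

def read(tokens, i):
    """Read one expression starting at tokens[i]; return (expr, next index)."""
    if i >= len(tokens):
        raise ValueError("unexpected end of input")
    tok = tokens[i]
    if tok == "(":
        items = []
        i += 1
        while i < len(tokens) and tokens[i] != ")":
            item, i = read(tokens, i)
            items.append(item)
        if i >= len(tokens):
            raise ValueError("missing ')'")
        return items, i + 1
    if tok == ")":
        raise ValueError("unexpected ')'")
    return tok, i + 1

def parse(text):
    tokens = list(tokenize(text))
    expr, end = read(tokens, 0)
    if end != len(tokens):
        raise ValueError("trailing input after expression")
    return expr

print(parse('(define (greet name) "hi \\"you\\"")'))
```

Lists come back as Python lists and atoms as tagged tuples, so the nesting of the input is preserved directly in the result.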

jrockway
http://search.cpan.org/perldoc/Data::SExpression
Brad Gilbert
Is it impossible to write a Text.ParserCombinators.PArrow module in Perl? That is news to me.
Hynek -Pichi- Vychodil
Not impossible. Also not done. The question is "why use Perl for parsers", not "why use Perl for writing a parser combinator library". The answer to the second question is not the same as the answer to the first.
jrockway
xxxxxxx
Trust me, all of those modules suck.
jrockway
Do they? Can you elaborate?
xxxxxxx
A: 

Perl was the go-to language for a long time. The problem is it can be pretty messy and difficult to maintain (some people can write Perl that avoids this, but it is very easy to write ugly code). I would not tell you to avoid Perl, but many have moved on to some modern alternatives.

I would recommend learning one of the newer scripting languages such as Python or Ruby. Both will work very well for your needs, and can easily handle more difficult tasks later on. They're both quite nice to work in, after having written C and Perl for so long.

In short, Perl would be a good hammer for this nail. Python and Ruby would be nail-guns.

MattG
Python is also not modern (almost same age as Perl), but Ruby is.
Alexandr Ciornii
C'mon. 1993 (or 1995) (Ruby) http://en.wikipedia.org/wiki/Ruby_%28programming_language%29 vs 1991 (Python). Neither is exactly a toddler.
Gregg Lind
+1  A: 

I like Perl a lot, write books about it, publish a magazine about it, and so on. I don't think I would ever say it's the best language to do anything in. A lot of that has to do with the task you need to do. For many string processing tasks, ETL, data cleanup, and so on, Perl is a very strong and capable language. You wouldn't have that much trouble doing simple tasks.

Your comment sounds like it comes from the early 1990s though, when the rest of the world hadn't caught up. Many of the dynamic languages are now up to task, so you might not have to switch languages. If you decide to use Perl and run into problems, there are plenty of people here who are willing to help, and not all of us will fault you if you choose something else. :)

brian d foy