I work as support staff in a Biology research institute as a student, and Perl seems to be used everywhere. Not for every single project, but it seems that more than half the people here have a few Perl books in/on their office/desk.

Why is Perl used so much in Biology?

+7  A: 

Perl seems to be the language of choice for Bioinformatics - there's even an O'Reilly title on just this subject: Beginning Perl for Bioinformatics.

Paul R
Exactly! But why?! :) Maybe I'll see if I can find a copy of that book, since it might have an introductory chapter explaining the answer to my question.
Kevin
+10  A: 

Probably because Perl is good at manipulating strings, and much research in genetics involves the manipulation of veeery long "ACTGCATG..." strings. Just guessing...
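
To make "manipulating veeery long ACTG strings" a bit more concrete, here is a minimal sketch of two typical tasks: taking a reverse complement and locating a motif with a regular expression. It is written in Python purely for illustration (the same few lines map almost directly onto Perl's `tr///` and regex operators), and the function names `reverse_complement` and `find_motif` are made up for this example, not taken from any library.

```python
import re

# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (uppercase ACGT only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq, pattern):
    """Return the 0-based start of every match of a regex, overlaps included."""
    # Wrapping the pattern in a lookahead makes each match zero-width,
    # so overlapping occurrences are all reported.
    return [m.start() for m in re.finditer("(?=(?:%s))" % pattern, seq)]


if __name__ == "__main__":
    seq = "ACTGCATGACTG"
    print(reverse_complement(seq))
    print(find_motif(seq, "ACTG"))
```

Real sequences also contain ambiguity codes such as N, which this sketch deliberately ignores.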

Federico Ramponi
What makes Perl very good at manipulating strings?
Kevin
They've got a really good regular expression engine, and always have had. (Larry Wall is one of the RE engine gods, a shub-niggurath of string manipulation.)
Donal Fellows
@Kevin: That was Larry Wall (Perl's creator)'s original intent -- to be a Pathologically Eclectic Rubbish Lister. :)
Ether
@Donal: heh, we both mentioned Larry at the same time :)
Ether
@Ether: You should look up "shub-niggurath" to see what sort of god I had in mind. Make sure the room is well-lit first. ;-)
Donal Fellows
+3  A: 

Perl basically forces very short development cycles. That's the kind of development that gets stuff done.

That's enough to outweigh Perl's disadvantages.

Andomar
How does Perl force short dev cycles?
Kevin
I think he means "allows for". How does it allow for short development cycles? Libraries and minimal boilerplate. The code you write solves your problem; it doesn't reinvent the wheel or exist to placate the compiler (hello, Java).
jrockway
Well, "allows" generally means that you make an edit and immediately run the result. There's no a priori need to compile, link, etc.
brian d foy
+24  A: 
mobrule
Ooh, that looks good. Thanks!
Kevin
@mobrule : Regarding point #6, is your analysis from 2010 or did you get that from an old book ???
Philippe
The analysis is excerpted from the linked article, which was from the summer of 1996.
mobrule
Sorry, I didn't read the source... I was wondering who was still writing CGI scripts for big websites today ;-)
Philippe
@Philippe: This is pretty old. BioPerl came out ages ago and is widely used. I use Python myself for many of the same reasons.
Chinmay Kanchi
+7  A: 

I use lots of Perl for dealing with qualitative and quantitative data in social science research. For processing data (largely text) quickly, finding libraries on CPAN (a nice central location), and generally just getting things done, it can't be surpassed.

Perl is also excellent glue, so if you have some instrumental records, and you need to glue them to data analysis routines, then Perl is your language.

singingfish
+6  A: 

The real answer probably has less to do with Perl than you think. Many of the things that happen are accidents of history. At the time, way back when, Perl was pretty popular, Java was getting more popular, not too many people were paying attention to Python, and Ruby was just getting started.

The people who needed to get work done used Perl and made some libraries in Perl, and other people started using those libraries. Once people start using something that is moderately useful to them, they tend not to switch (economists call those "switching costs"). From there, even more people start using it because a lot of other people are using it.

The same evolution might not happen today. I'd say that Perl, Python, and Ruby are all completely adequate and up to the task. All the things that mobrule quotes from Lincoln Stein could apply to any of the three today. If everyone had to start from scratch today, any one of those languages could be the one that everyone uses.

I've noticed, from my own client base though (a very small and unrepresentative sample of biotech), that the people pushing the programming for a lot of the biological stuff seemed to be at least part-time sysadmins who were supporting scientists. The scientists worried about the science and did some light programming, but the IT support people were doing a lot of the heavy lifting for the non-science parts. Perl is very well positioned as a sysadmin tool since it's the duct-tape of the internet.

brian d foy
I tend to disagree here. Perl really is rather expressive, so if your primary concern is not programming, but getting the job done, and getting back to your real job, then the expressiveness of the language helps the computer to think like you, while a more typical language more helps you to think like the computer.
singingfish
While Ruby and Python are very similar to Perl, their regular expression engines are not as good. They aren't as fast and can't do as many "crazy" things. This normally isn't a problem, because if you're doing something really crazy a grammar is probably a better fit anyway, but then you have to teach all those biologists grammars, recursive descent parsing, etc.
mpeters
Although Perl might have more power, I find that most people barely use everything in Learning Perl.
brian d foy
+3  A: 

Perl is very powerful when it comes to dealing with text, and it's present in almost every Linux/Unix distribution. In bioinformatics, not only are sequence data very easy to manipulate with Perl, but most bioinformatics algorithms also output some kind of text results.
Then, the biggest bioinformatics centers like the EBI had that great guy, Ewan Birney, who was leading the BioPerl project. That library has lots of parsers for the results of all kinds of popular bioinformatics algorithms, and for manipulating the different sequence formats used in the major sequence databases.
Nowadays, however, Perl is not the only language used by bioinformaticians: along with sequence data, labs produce more and more different kinds of data, and other languages are more often used in those areas. The R statistics programming language, for example, is widely used for statistical analysis of microarray and qPCR data (among others). Again, why are we using it so much? Because it has great libraries for that kind of data (see the Bioconductor project).

Now, when it comes to web development, CGI is not really state of the art today, but people who know Perl may stick to it. In my company, though, it is no longer used... I hope this helps.

Philippe
A: 

Bioinformatics deals primarily in text parsing, and Perl is the best programming language for the job, as it is made for string parsing. As the O'Reilly book (Beginning Perl for Bioinformatics) says, "With [Perl's] highly developed capacity to detect patterns in data, Perl has become one of the most popular languages for biological data analysis."
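
As a rough idea of what "text parsing" means in practice, the sketch below reads FASTA-style records (a `>` header line followed by sequence lines) and reports each sequence's length and GC content. It is written in Python only for illustration and is not how BioPerl or any other library does it. The helper names `parse_fasta` and `gc_content` and the sample records are hypothetical.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its joined sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq.upper()) / len(seq)


# Made-up sample input, for illustration only.
EXAMPLE = """>seq1 hypothetical record
ACTGCATG
GGCC
>seq2
ATAT
"""

if __name__ == "__main__":
    for name, seq in parse_fasta(EXAMPLE).items():
        print(name, len(seq), round(gc_content(seq), 2))
```

A real pipeline would stream the file line by line instead of holding it all in memory, and would more likely rely on an existing parser than a hand-rolled one.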

Kyra
A: 

This seems to be a pretty comprehensive response. Perhaps one thing missing, however, is that most biologists (until recently, perhaps) don't have much programming experience at all. The learning curve for Perl is much gentler than for compiled languages (like C or Java), and yet Perl still provides a ton of features when it comes to text processing. So what if it takes longer to run? Biologists can definitely handle that. Lab experiments routinely take an hour or more to finish, so waiting a few extra minutes for the data processing to finish isn't going to kill them!

Just note that I am talking here about biologists that program out of necessity. I understand that there are some very skilled programmers and computer scientists out there that use Perl as well, and these comments may not apply to them.

Daniel Standage