I'm currently considering R, matlab, or python, but I'm open to other options. Could you help me pick the best language for my needs? Here are the criteria I have in mind (not in order):

  1. Simple to learn. I don't really have a lot of free time, so I'm looking for something that isn't extremely complicated and/or difficult to pick up. I know some C, FWIW.

  2. Good for statistics/psychometrics. I do a ton of statistics and psychometrics analysis. A lot of it is basic stuff that I can do with SPSS, but I'd like to play around with the more advanced stuff too (bootstrapping, genetic programming, data mining, neural nets, modeling, etc). I'm looking for a language/environment that can help me run my simpler analyses faster and give me more options than a canned stat package like SPSS. If it can even make tables for me, then it'll be perfect.

  3. I also do a fair bit of experimental psychology. I use a canned experiment "programming" software (SuperLab) to make most of my experiments, but I want to be able to program executable programs that I can run on any computer and that can compile the data from the experiments in a spreadsheet. I know python has psychopy and pyepl and matlab has psychtoolbox, but I don't know which one is best. If R had something like this, I'd probably be sold on R already.

  4. I'm looking for something regularly used in academe and industry. Everybody else here (including myself, so far) uses canned stat and experiment programming software. One of the reasons I'm trying to learn a programming language is so that I can keep up when I move to another lab.

Looking forward to your comments and suggestions.


Thank you all for your kind and informative replies. I appreciate it. It's still a tough choice because of so many strong arguments for each language.

  1. Python - Thinking about it, I've forgotten so much about C already (I don't even remember what to do with an array) that it might be better for me to start from scratch with a simple program that does what it's supposed to do. It looks like it can do most of the things I'll need it to do, though not as cleanly as R and MATLAB.

  2. R - I'm really liking what I'm reading about R. The packages are perfect for my statistical work now. Given the purpose of R, I don't think it's suited to building psychological experiments though. To clarify, what I mean is making a program that presents visual and auditory stimuli to my specifications (hundreds of them in a preset and/or randomized sequence) and records the response data gathered from participants.

  3. MATLAB - It's awesome that cognitive and neuro folk are recommending MATLAB, because I'm preparing for the big leap from social and personality psychology to cognitive neuro. The problem is the Uni where I work doesn't have MATLAB licenses (and 3750 GBP for a compiler license is not an option for me haha). Octave looks like a good alternative. PsychToolbox is compatible with Octave, thankfully.

  4. SQL - Thanks for the tip. I'll explore that option, too.

Python will be the least backbreaking and most useful in the short term. R is well suited to my current work. MATLAB is well suited to my prospective work. It's a tough call, but I think I am now equipped to make a more well-informed decision about where to go next. Thanks again!

+5  A: 

Python is the way to go. It's pseudocode that works (and on steroids).

No compiling, no nothing. You just run it. Also, Python has nice libraries for processing data and stats: Matplotlib, SciPy, and NumPy.

Python is also simple to learn. Easy to read.
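As a rough illustration of both points (readability and data processing), here is a minimal sketch of a percentile bootstrap confidence interval, one of the "more advanced" things the question mentions. It deliberately uses only the standard library so it runs anywhere; the function name `bootstrap_ci`, its parameters, and the scores are made up for the example, and NumPy/SciPy would be the more usual tools for larger data.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic (illustrative helper)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(data)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = n_resamples - lo_idx - 1
    return estimates[lo_idx], estimates[hi_idx]

scores = [12, 15, 9, 20, 17, 14, 11, 18, 16, 13]  # made-up test scores
low, high = bootstrap_ci(scores)
print(f"mean = {statistics.mean(scores):.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

Passing `stat=statistics.median` gives an interval for the median instead, without touching the resampling loop.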

Edit: if you want executables/GUI, then Python might not be the best option.

TIMEX
I don't think Python is a good option at all. It's nice in general, but not for this case. It has lots of features and concepts that (s)he probably won't use, but that will make learning the language harder.
anthares
Some pretty sexy GUIs can be made in Python via PyQt4 without much effort.
awesomo
Re GUIs: there's also PyDev for Eclipse.
Shane
Have you seen two other options?
mbq
+15  A: 

Given your priorities, I would highly recommend Python. Since you will want access to some advanced statistical packages, I would suggest calling R libraries from Python when needed via RPy2. Matplotlib/NumPy/SciPy are all excellent packages for Python, but they are nowhere near as complete as R in terms of advanced statistics libraries. Using Python with RPy2 gives you the best of both worlds.

Edit: A compiled list of useful R packages for psychometrics.

Edit: Example of calling R functions from Python via RPy2:

import numpy
import rpy2.robjects as robjects
import rpy2.robjects.numpy2ri # auto-conversion of numpy data-types to R objects

def prcomp(ndarray):
    """Compute the principal components of a matrix via the R function prcomp."""
    rmat = robjects.RMatrix(ndarray)# convert to R object
    robject_prcomp = robjects.r.prcomp(rmat, retx=True, center=True, scale=False)
    return numpy.array(robject_prcomp.r['x'][0])

Edit: There seems to be some confusion as to why I suggested using R from Python. Python already has a large number of typical statistics implemented. The principal components analysis example above was contrived in order to show how it works in general. In Python you can use the Modular toolkit for Data Processing for PCA:

>>> import mdp
>>> y = mdp.pca(x)

I was just trying to make the point that even if Python is selected because it is easy to learn and is a good general purpose utility language, the RPy2 bridge keeps it from being limited statistically.
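For comparison, the same kind of analysis can be sketched in plain NumPy, without the R bridge. This is only an illustration under stated assumptions: `pca_scores` is a made-up name, the data are random numbers, and the result should correspond to centered, unscaled principal components only up to the sign of each column.

```python
import numpy as np

def pca_scores(x):
    """Project the rows of x onto its principal components (illustrative helper)."""
    centered = x - x.mean(axis=0)  # center each column; no scaling
    # SVD of the centered data: the rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt.T
    explained_variance = s ** 2 / (x.shape[0] - 1)
    return scores, explained_variance

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))  # made-up data: 50 observations, 3 variables
scores, var = pca_scores(x)
print(scores.shape, var)
```

The point still stands that for less common procedures a tested R package is usually a safer bet than rolling your own.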

awesomo
Compared to any math-oriented language like MATLAB, this is horrible.
Pop Catalin
Meh, you could hide the boilerplate object conversions with decorators easily. In terms of one-liners, it's hard to beat matlab, but IMO Python scales much better to more complex problems. RPy2 gives you access to an enormous number of battle-tested statistics that would be time-consuming (and error prone) to re-implement.
awesomo
I agree with awesomo. I've had problems getting MATLAB to scale to large programs.
Chinmay Kanchi
I find Python to be more natural and easier to learn than R, especially for any kind of procedural programming. Statisticians use R, and R is good at vector and matrix operations and has data analysis and plotting routines built in. Because programming in R has a number of quirks, a short program to do 'any old thing' would be more easily expressed in Python than in R. Your students would be able to read most simple Python programs before they would be able to read the equivalent R programs. R is not a good language for data collection, web development, or human subject experiments.
Paul
Your python program for principal components would be a 1-2 liner in R because R has two library functions for doing principal components. However, comparing the two for, say, reformatting or reorganizing data, python would be easier and more flexible. R has a steeper learning curve.
Paul
@Chinmay Kanchi I don't think the OP intends to use it for "large programs", but experimental programs.
Pop Catalin
'experimental program' probably means creating and supporting human subject research, not writing a short test program. The length of such a program depends on what it does.
Paul
You seem to be suggesting using R for the actual statistics, so this person would need to learn R as well. Why not just stick with that?
Shane
Wow, crazy upvoting/downvoting... can you say language war? @Paul I agree with you 100%. @Shane I am suggesting using R via RPy2 when you need a particular statistic that doesn't happen to be implemented in Python yet (and you'd rather not implement it yourself). Using RPy2 does not require you to learn R; it just requires you to be able to search CRAN for the R libraries you want to use.
awesomo
@Shane When they want to create a simple human subject experiment, either standalone or on the internet, then what? Still use R?
Paul
I've used both Matlab and Python, and I can say for sure - Matlab. There are so many details that Matlab handles for you. You can do everything you want in either - it'll just be easier in Matlab.
Marc
+10  A: 

Matlab. It is by far the most popular in the field of psychological/cognitive research. All the options that you mentioned, and several others, will allow you to do modeling, statistical analysis, etc. But Matlab is a knock-out winner on your 4th requirement.

I'm in the field of cognitive research myself. I use mostly Python, but when I need someone else from the field to understand what I'm doing, I use Matlab. Actually, I use Octave, which is an open source alternative to Matlab, but since it's 99% syntax compatible, it's OK.

Just an example showing Matlab is at least 10x more popular in publications in the APA, Cognition, and Brain and Cognition journals:
google scholar search for Matlab - 250 results
google scholar search for python - 25 results (and note that some, if not most of the results actually refer to Monty Python, or Python snakes...)
(Both searches are since the year 2000)

Ofri Raviv
I totally agree. I used to do computational neuroscience/psychophysics, and no tool served me, and my 60+ colleagues, better than matlab.
Fredriku73
+23  A: 
doug
R has decent coverage in academia. At least at my school, my statistics class gave us a choice between R, S-PLUS, and something else that I forget, probably MATLAB.
Ricket
The questioner wanted to use the language to make canned programs for presenting psychology experiments. That is something R cannot do.
John
A: 

MUST BE MATLAB if you are in Psych research, and any form of brain imaging.

You can then use SPM for spatial normalization of brain images, etc.

To the people saying it does not scale to large problems: it is the de facto standard in science and also finance. I work in brain imaging and my mate works for a hedge fund, and we both use it. Python is for noobs.

NimChimpsky
-1 for the disparaging remark towards people that prefer a different language than the one you use.
Sharpie
-1 for "python is for noobs".
egarcia
tongue in cheek
NimChimpsky
-1 for not understanding the question
fortran
+1  A: 

I'll actually suggest Python, but with a twist:

Learn python so you can use it in conjunction with a Spreadsheet!

  • Resolver One is a proprietary application that allows Python for the scripting parts.
  • PySpread is a free, less developed alternative.

You will get the simplicity of spreadsheets for the initial data manipulation, and Python for more demanding calculations. This will allow you to learn the language at the same time as you get useful results in your experiments. With time, you will have learned enough to roll your own apps.

I've personally tried Resolver One and found it quite useful. After doing statistical stuff with VBA + Excel (a business requirement, not my choice), the change was thrilling.

I haven't tried PySpread so I can't comment on it.

egarcia
+9  A: 

I do research in psychology and switched from SPSS to R a few years back. I've never regretted the switch, although my knowledge of the merits of Python or Matlab is not as great.

Ease of learning: If you have a good understanding of statistics and programming, then R is not that difficult. There are a lot of resources on the Internet. I posted a list of R Resources for psychology researchers that might be helpful.

Good for statistics/psychometrics: R has been developed by the statistical community. Thus, it's great for statistics and psychometrics. Doug mentions a few good examples. Here are some further links with regard to specific things you mentioned: bootstrapping, data mining, modeling (check out OpenMx, sem, and lme4). R is great for automating standard and custom analyses. The language for programming in R is the same as for standard data analysis. Thus, R encourages a gradual development in sophistication of analyses (see John Chambers' book to get a sense of this design philosophy).

Executable programs: R has excellent data manipulation tools. Thus, you could write a script to transform experimental data and output it in a format that suits. Phil Spector's book is quite good in providing guidance on data manipulation in R.

Regularly used in academe and industry: From personal experience I know many researchers in psychology who use Matlab and many researchers who use R and some who use both.

Jeromy Anglim
+13  A: 
Richie Cotton
Python does not create executables.
mbq
+4  A: 

+1 for R.

And since no one else mentioned it, you can also have a look at "Using R for psychological research": http://personality-project.org/r/

Tal Galili
+1  A: 

My wife is a psychologist and I have been teaching her SQL. She used SPSS in college and a lot of Excel in her work with data analysis. But using SQL Server Developer Edition and T-SQL, she has been able to get more from her data and then use Reporting Services to produce great presentation charts and reports for talks and publication.

It is easy to learn and can do a lot. You can download a free version of SQL Server Express or purchase the Developer Edition for about $40, which gives you the complete Enterprise version with a license for a single machine.
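To give a flavour of the kind of query involved, here is a minimal sketch that summarises reaction times per condition. It is not SQL Server or T-SQL: it uses Python's built-in sqlite3 module as a stand-in so it can be run without installing anything, and the table name, columns, and numbers are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE responses (participant TEXT, cond TEXT, rt_ms REAL)")
rows = [  # made-up reaction-time data
    ("p1", "congruent", 512.0),
    ("p1", "incongruent", 640.0),
    ("p2", "congruent", 498.0),
    ("p2", "incongruent", 702.0),
]
conn.executemany("INSERT INTO responses VALUES (?, ?, ?)", rows)

# One row per condition: number of trials and mean reaction time.
summary = conn.execute(
    "SELECT cond, COUNT(*), AVG(rt_ms) FROM responses "
    "GROUP BY cond ORDER BY cond"
).fetchall()
for cond, n, mean_rt in summary:
    print(f"{cond}: n={n}, mean RT={mean_rt:.1f} ms")
conn.close()
```

The same SELECT ... GROUP BY pattern should carry over to other SQL databases with little or no change.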

Todd Moses
+4  A: 

I vote for R! You can learn Python much faster, but... don't use a general-purpose programming language if you can use a specialized one instead. R has a steeper learning curve, but at this very moment there are more than 2000 packages. I mostly use psy and psych for psychological stuff, and plyr, reshape, and Hmisc for other non-psych stuff.

I recommend Rob Kabacoff's site and be sure to take a glance at Jeroen Ooms' web application for IRT analysis.

Check out Peter Dalgaard's book: Introductory Statistics with R.

Use R!

Edit: Visit r-bloggers.com to stay up to date with the latest news in R. REvolution Computing organized a webinar ("Introduction to R") for all those willing to learn R, and the next one will be held on the 23rd of February ("7 Ways to Increase your R Productivity"). Note that REvolution Computing promotes its own product: REvolution R Enterprise 3.1, a full-blown R GUI with syntax completion and other helpful stuff. For now, REvolution R Enterprise is (unfortunately) available only for Windows and Mac OS, but there's a package revolution-r (or vice versa) on Ubuntu. It's not an IDE for R, but just a language optimization patch. I reckon that REvolution R Enterprise will be available for (GNU/)Linux soon. Well... I hope so...

aL3xa
+4  A: 

R wins with its gigantic library of statistical routines. The S language per se isn't that overwhelming though and data handling can be cumbersome when compared to SQL queries that achieve the same goal -- luckily there is the sqldf package that allows users to run SQL queries over R data frames. Under Windows, you can also integrate R with Excel/OpenOffice, which could make certain tasks slightly easier.

Python wins as a multi-purpose programming language that can also be used for tasks other than statistical analysis. In the long run, I'd expect Python or something like Incanter (Clojure/Java-based) to gain popularity at the cost of R.

lith
A: 

I am in the same kind of situation myself, being a social psychologist, having recently (and finally) entered the exciting world of Linux/Ubuntu, and being a huge FOSS fan. This won't be much help after all the great answers, but I couldn't help but say: I've decided to learn both R and Python. R, mostly to free myself of SPSS/PASW and to get deeper insight into, and more flexibility with, statistical procedures. Python, mostly to be able to program experiments (hopefully, with the help of PyEPL and PsychoPy, we'll see...).

I've done quite a lot of browsing the web about these, and without exception, everybody has written great things about both R and Python. I don't know how I will manage to learn both, as I'm about to start my first Asst. Prof. position, but I'm very excited and confident that they will pay off.

I've had minor experience with MATLAB as a grad student, and have seen it used extensively by the neuroscience-oriented psychologists in my program. But it's proprietary and I'm looking to go completely FOSS. Good luck to you.

Adil
+1  A: 

Since there are lots of good responses already I'm not going to cover the exact same ground, but I think you should evaluate Octave carefully before you decide to use it as your go-to solution. Its syntax is highly compatible with MATLAB, yes, but sometimes it can be slower than the seven year itch by comparison with its proprietary counterpart. If you're faced with having to write intensely loopy code, you might want to use Python or R instead. (I once wrote a program in Octave, found out it was too slow, transliterated it into SAGE -- a Python variant with the same syntax but different semantics -- and found that it ran about 10 times faster. I later moved the Octave program to a MATLAB server at work and observed a similar speedup.)

estanford
A: 

If money does not matter I would suggest MATLAB. IMO it is powerful and comfortable to work with and performant.

However, considering that MATLAB (+ compiler + a few toolboxes) can easily cost 10k€ AND incurs additional yearly fees, whereas R is free, the cost-benefit ratio clearly favors R.

ymihere
A: 

As someone who's used all of them in Psychology I hope I can give you a breakdown of the costs and benefits.

Matlab - This is the most used of the three languages in your field (in general). It is the only one that is good at basic analysis and can also be used to program experiments beyond the capability of SuperLab. It even has some very advanced packages specifically for your field (SPM, Psychophysics Toolbox) that are frankly unmatched in other languages. It's relatively expensive, but there is the Octave alternative. Unfortunately, if your data analysis involves fMRI or EEG, the most popular Matlab packages don't work with Octave yet (but I did get Octave working with Psychophysics Toolbox).

Python - The best all-purpose programming language of the bunch. Using tools like VisionEgg or even just Pygame, one can program experiments and make executables. It's free. It's probably the easiest to learn. It doesn't offer much help as far as statistical analysis goes. If you know how to write your analyses yourself, then it's fine. If you use it correctly, it can easily be the fastest.

R - This is by far and away the best of the three for statistics. Unlike the others you don't have to know how to write much of your statistics software from scratch. You simply plug in packages and use them. It's the only one of the three that's any use to someone who's a non-programmer and wants to do stats. It's good for data manipulation and is much better than the typical commercial options in your field. However, you cannot use it to program up experiments. It's just for the analysis. (well, design as well, but in the abstract sense, not as in replacing Superlab).

I believe that your options come down to Matlab OR Python+R. You're going to have a hard time replacing R in Python and you can't replace Python in R. So, if you pick Matlab you only have one language to learn. However, the statistical capabilities of Matlab that are easily accessible don't even come close to those of R.

Also, if you only know Matlab you end up being someone who only has a hammer... everything starts to look like a nail.

I strongly recommend Python+R. Python is as good as Matlab for generating experiments and can be used to make double click executables for no cost. R is the best thing you could use for stats. After knowing all three.. and many others, I'm most happy with my Python+R setup.
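To make the experiment side a bit more concrete, here is a minimal sketch of the bookkeeping such a script needs: a randomized stimulus sequence and one logged row per trial. It uses only the standard library; the function names, file names, and the `fake_response` stand-in are hypothetical, and a real experiment would draw the stimulus and wait for a key press (e.g. with Pygame or PsychoPy) where the stand-in is called.

```python
import csv
import io
import random

def build_trials(stimuli, repetitions, seed=None):
    """Return every stimulus repeated `repetitions` times, in random order."""
    trials = [s for s in stimuli for _ in range(repetitions)]
    random.Random(seed).shuffle(trials)
    return trials

def run_session(trials, get_response, out_file):
    """Call get_response for each trial and log one CSV row per trial."""
    writer = csv.writer(out_file)
    writer.writerow(["trial", "stimulus", "response"])
    for i, stimulus in enumerate(trials, start=1):
        writer.writerow([i, stimulus, get_response(stimulus)])

def fake_response(stimulus):
    """Stand-in for a participant; a real script would present the stimulus here."""
    return "left" if stimulus.startswith("a") else "right"

trials = build_trials(["apple.png", "banana.png", "cherry.png"], repetitions=2, seed=1)
log = io.StringIO()  # a file from open("results.csv", "w", newline="") works the same way
run_session(trials, fake_response, log)
print(log.getvalue())
```

The resulting CSV opens directly in a spreadsheet and can be read into R for the analysis.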

John