I'm trying to evaluate the purchase of a statistical tool. This will be used in part by non-programming users (doing clinical studies) and in part by programmers, so I'm trying to find a good compromise between usability and automation. Of course, cost is an issue, but if I can build a solid case, we could probably buy a commercial package, so we're not totally limited to free options.

So far, our options are:

  • Statistica (which some non-programmers already know)
  • Matlab Statistics toolbox (programmers already use Matlab)
  • R language (would need a UI for non-programmers)
  • Hack something into Excel (not fun, but that's what non-programmers do right now)
  • ?...

What else is out there? What's the industry standard? What kind of distinctive features should I look for? What would you recommend, and why?

Ideally, we'd like a tool that can run both on Linux and Windows machines.

(I work in medical imaging, so we do both biostatistics and software engineering statistics.)

+1  A: 

I would look at S-Plus.

You get a strong programming environment (S-Plus Workbench, based upon the Eclipse platform), an intuitive GUI for non-programmers, and an extensive user community (including users of R, which was based upon the original S).

Ben Hoffstein
+3  A: 

I recommend R, personally. It's used by bioinformaticians and psychologists, I hear. Don't know what your field is though, so maybe it's a lousy choice. It is reasonably easy to use and learn.

Paul Nathan
R is a very powerful language for any type of statistical modeling.
cciotti
R is command-line driven; it does not have a GUI.
Ben Hoffstein
+3  A: 

Stata and SPSS tend to be the most commonly used packages in clinical studies. Both are pretty easy to pick up and use for non-technically minded folks but are generally flexible enough. I've used Stata more than any of the others and have been pretty happy with its options (supports both menu-based and command line operation, decent enough plugin system to get new user-created modules, good graphing support).

R is a little more daunting for newbie users, though it is popular with the biostatisticians. Since it's free, that's another nice point in its favor.

Randy
rcar, what country are you in? In US Pharma, SAS is much more common than either Stata or SPSS.
Gregg Lind
US, at an academic health center. Maybe it's just something about the culture here, but those two packages are used by pretty much everyone who does studies here if they're not using R.
Randy
I wonder if that implies that your SAS people switched over to R at some point. Interesting data point, thanks!
Gregg Lind
@Gregg Lind - SPSS is big in Psych research, particularly survey based.
Rob Allen
A: 

Consider Excel one more time. It is well known and widely available. Refer to this book or this book.

We've tried Excel, and frankly, it doesn't give us what we need efficiently. Built-in functions don't go much beyond the One-Way Anova, and it's often very clumsy to use. Yes, I could reimplement a statistical framework in Excel, but it's not the best use of my time.
Kena
More dangerously, Excel is known to have bugs in the Analysis Toolpak that make it unsuitable for regression.
Gregg Lind
+1  A: 

Visual Numerics is another option.

JasonS
+3  A: 

Hands down it's R. R is very programmer friendly. It has functional aspects and it's GNU.

S-PLUS and R are both based on the S language. They are similar, and in most cases you can run an S-PLUS program in R and vice versa.

SAS is another option, although it is geared more towards BI and enterprise use. SAS has a simpler syntax than R and, in my opinion, is easier for a non-programmer to pick up.

Other options include SPSS, Matlab, and even Excel.

Ryan Guest
+1  A: 

It sounds like you're trying to maximize multiple goals. You say "This will be used in part by non-programming users (doing clinical studies) and in part by programmers, so I'm trying to find a good compromise between usability and automation", with an implicit assumption that this will be the same tool in both cases, when that might not be realistic. What's the compromise for Word and LaTeX, for example?

Some different questions about the requirements:

  • Should it be extensible for programmers?
    • Able to use C extensions
    • Easy to make new procedures and methods
  • What analysis are non-programmers going to want to use?
  • Graphics?
  • Ease of use for different groups

So my read on this:

Easy to extend: R/S-plus, Matlab/Octave (I happen to prefer R, but I do more stats and fewer matrix things)

Easy to use for normal people: Excel, custom wrapped R, SPSS

Also, R on Windows has a limited GUI, which may or may not help your users.

If it were me, I'd go with a hybrid solution. Use R, and give non-programmers a cheat sheet that illustrates common tasks, or even better, write some wrapper functions with names like "image_summary" that automate their exploratory work.

For writing front-end scripts for R, the RPy Python wrappers might help as well.
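To make the wrapper-function idea concrete, here is a minimal sketch in plain Python. It is only an illustration: the body of "image_summary" is hypothetical, it uses the standard-library statistics module as a stand-in rather than calling R through RPy, and the choice of summary values is an assumption about what an exploratory summary might contain.

```python
import statistics


def image_summary(values):
    """Return basic descriptive statistics for a list of numeric measurements.

    Hypothetical wrapper: the name comes from the suggestion above, but the
    set of statistics reported here is an assumption, not a fixed recipe.
    """
    if len(values) < 2:
        # The sample standard deviation needs at least two data points.
        raise ValueError("need at least two measurements")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }
```

A non-programmer would then only need one call, such as image_summary(measurements), instead of remembering each individual function. The same pattern would apply to wrapper functions written directly in R.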

Gregg Lind
A: 

This Wikipedia page compares the features available for several statistical packages, as well as their OS compatibility and pricing info (which seems a little out of date, but it gives an overall idea)

Kena
+2  A: 

For a statistical package with a GUI that non-technical users can use, I would recommend "SAS Enterprise Guide". You get the common and advanced SAS procedures, an excellent graphics facility, and the ability to program for the technical users. I recommend that you start with the "SAS Learning Edition" (http://support.sas.com/learn/le/), which is a fully functional version of Enterprise Guide but limited to processing 1000 rows at a time. It is under $500, which makes it a pretty good deal.

Anindya
+1  A: 

SAS Enterprise Guide has good usability for non-programmers. It also has good options for connecting to Excel. And for programmers, it's the most robust option out there. The SAS server runs on anything, though Enterprise Guide is Windows only.

mjw149
A: 

We ended up getting the Matlab Statistics toolbox (mainly because we already have some experience with Matlab in the team, and needed the tool anyway).

So far, it's doing what we need it to do, and it's easily extensible. Time will tell whether non-programmers really use it, but so far it's looking good.

Kena
Thanks for letting us know, Kena.
Tal Galili