I will be teaching an econometrics course to master's students in the fall. I think it is important for them to learn programming with data as an essential applied research skill. What suggestions do you have for the programming language? I am leaning mostly towards R. What others should I consider?

A: 

Since you're interested in R, you could also take a look at Incanter. Because it's built on Clojure (a Lisp dialect for the JVM), you'd be able to leverage the vast array of existing Java libraries.

rcampbell
Ah, so in order to run a multiple regression you think they need to learn Java and Lisp as well? You haven't ever taught, have you? ;-)
Dirk Eddelbuettel
@Dirk: correct, though I'd say Java and Lisp wouldn't have to be the main focus since Incanter can run as a standalone environment. Learning R means learning a syntax and environment as well. The "safety net" of helpful resources around Java and Lisp is more extensive than anything R has. Lisp syntax is also quite tiny, so it might lower the barrier to entry a bit.
rcampbell
@Dirk: I don't disagree with your point, but that said: you don't really need to know Java to program with Incanter; just Lisp. I still think that Clojure is harder to work with than R (not to mention that R is better for econometrics at this stage), but it's probably not that big a gap so far as the learning curve is concerned.
Shane
@rcampbell: Incanter looks interesting. I'll check it out for my own work. I think R is the most straightforward way to go for the classroom. Thanks.
TJB
A: 

R is a very good choice. Go for it.

The number of resources on the web keeps increasing. One nice set of slides is provided by the UCLA Stat Consulting Center.

And as you are into econometrics, make sure you look at Grant Farnsworth's Econometrics in R on CRAN; the Applied Econometrics with R book by Zeileis and Kleiber is also very good.

Dirk Eddelbuettel
@Dirk: I actually know Grant. That is a good place for the students to start. I have even considered an assignment in which the students choose an area of it to expand (Grant willing). I will check out Zeileis and Kleiber. Thank you.
TJB
+1 for Farnsworth, and especially for the Zeileis and Kleiber book. I highly recommend it as an introduction.
Shane
A: 

I prefer R, but other free options to consider are:

  1. Octave with gnuplot (Octave is a free implementation of the MATLAB language)

  2. Python with NumPy, SciPy and matplotlib

frankc
@user: I love Python. I wish Python had the momentum behind it for statistics/econometrics that R has, but there seems to be an equilibrium around R for the present. What are your thoughts about Python vs. R?
TJB
Well, I am not really a Python guy. I know Perl well, so I haven't felt the need to dive into Python. One underrated feature of R that makes it nice for stats/math applications is that all types are really vectors of those types (a single number is just a vector of length one), and all operations act on vectors by default.
frankc
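To make the vector point concrete for the Python option mentioned above, here is a minimal sketch using only plain Python lists (no NumPy, which is an assumption to keep it dependency-free). It shows that list arithmetic is not element-wise by default, unlike R, where `x * 2 + y` on two numeric vectors works element by element.

```python
# Plain Python lists are not vectors: * repeats the list instead of
# multiplying each element.
x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]

repeated = x * 2  # list repetition: [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]

# To get R-like element-wise behaviour you have to loop explicitly.
doubled = [xi * 2 for xi in x]                    # like R's x * 2
combined = [xi * 2 + yi for xi, yi in zip(x, y)]  # like R's x * 2 + y

print(repeated)
print(doubled)
print(combined)
```

With NumPy installed, `np.array(x) * 2 + np.array(y)` gives the element-wise result directly, which is much closer to R's default behaviour.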
A: 

I'm surprised no-one else has mentioned Excel. As Brian Ripley once said (see slide 7):

Let’s not kid ourselves: the most widely used piece of software for statistics is Excel.

Indeed, Excel is an excellent tool for adding up columns of numbers. Having said that, if the analysis you are doing is any more complicated than that, you should definitely use a proper programming language.

Of the three obvious candidate languages (R, MATLAB and Python), R has the best data manipulation tools. See this other SO question for a more detailed comparison.


EDIT: Upon rereading this, I sound rather pro-Excel. I'd like to expand my answer to save my reputation.

Excel causes me many more problems than benefits. Its widespread use in my organisation is mostly detrimental. It makes it very hard to trace where data has come from and how your computations work. Debugging Excel models is nearly impossible. It encourages local data stores instead of central databases. It doesn't work with diff tools, and it makes your science hard to reproduce. From a semantic point of view, it doesn't separate the data from what is done to the data. The idea that all your variables need a location distracts from understanding. The plotting capabilities are laughably awful.

All that said, Excel is useful for a few specific things:

  1. As a CSV viewer. Sure, R has the View function, but it's not as pretty.

  2. Really simple exploration of data. Sorting it, filtering it, adding up columns. I find that these can be done slightly quicker with a point and click interface than with code. Of course, you'll have to write code later for reproducibility, but in the initial stages, Excel is quite nice for this.

  3. The graphs are distinctive and easy to spot. If you see someone give a presentation with a graph drawn in Excel, you know not to trust the results.

That's it. For anything else, it's a mess.
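As a rough illustration of the "write code later for reproducibility" step in point 2, here is a minimal sketch using only Python's standard library. The column names `region` and `sales` and the inline CSV data are hypothetical placeholders; in practice you would read the CSV file from disk.

```python
import csv
import io

# Hypothetical data; in practice use open("data.csv", newline="") instead.
raw = """region,sales
North,120
South,80
North,45
East,200
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Filter: keep only the North rows.
north = [r for r in rows if r["region"] == "North"]

# Sort: order all rows by sales, largest first.
by_sales = sorted(rows, key=lambda r: float(r["sales"]), reverse=True)

# Add up a column, overall and for the filtered subset.
total_sales = sum(float(r["sales"]) for r in rows)
north_sales = sum(float(r["sales"]) for r in north)

print([r["region"] for r in by_sales])
print(total_sales, north_sales)
```

Unlike point-and-click sorting and filtering, the script records every step, so it can be rerun when the data change. In R, the equivalent steps use `read.csv()`, logical indexing, `order()` and `sum()`.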

Richie Cotton
You must be kidding. Have you read [Spreadsheet Addiction](http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html) by Pat Burns?
Dirk Eddelbuettel
@Richie: I don't consider Excel to be a programming language, even with VBA. In my mind it obscures rather than enlightens data analysis. Thanks for the link to the other discussion.
TJB
Wow, I'm really surprised I didn't get downvoted into oblivion for this. And yes, I was sort of kidding. Excel is horrible in most respects. It is still worth knowing how to use, though, for occasions when other people give you data in an XLS file.
Richie Cotton