I've been using R for a little over a year now and it's been a successful venture. But all too often, I find that there is something I can't figure out, either because I don't know how to find it or because I can't find an example of it.

Stackoverflow,

Could you recommend a pathway for learning R in a manner that provides one with a toolset at their disposal to solve problems of a statistical nature?

There's a wealth of knowledge on the internet, between the r-project website and the mailing lists, but it seems to be "everywhere" and nowhere when you're actually looking for it.

For example, when I first started using R, I went through "Intro to R". Then I read the language definition (which obviously hasn't sunk in). But every time I ask a question on Stackoverflow I'm presented with some new badass function that is the solution to all my problems in the short term. My question is, how did you know these functions existed in the first place? And how does one go about finding them? Presumably, you read something or found some resources that detoured your learning to the exponential part of the curve. What was it?

Obviously, R's functionality as a statistical tool is broad. For my own purposes I work mostly with economic or financial data. Hence, answers with this in mind would be most helpful.

+3  A: 

There's a free book you might be interested in: Introduction to Probability and Statistics Using R

interfect
And how recently published as well (July 28/2010). Thank you for this, I will take the time to review it.
Brandon Bertelsen
And yet, the best part: it's GPL-licensed, and you can even download the LyX/tex files with eps/pdf graphs. (I saw it on Gregor Gorjanc's blog; if you're subscribed to the R-bloggers RSS feed, you'll probably get this stuff while it's hot.)
aL3xa
+5  A: 

I'll start with this:

My question is, how did you know these functions existed in the first place?

Simple - we tried to solve a similar problem and came across that function. It either suited or didn't suit our needs but we now know it's there. I haven't used R much personally but what you're describing is the learning curve for every programming language ever. Firstly, you learn the "grammar" i.e. what you can do. Then you try to do something. You find you can't.

At that stage a programmer has a number of options. What do I do personally? Depends. I'll try and look up that package/header/library/whatever's member functions to see if something suits my needs. I might Google it, because unless you're really pushing the boundaries someone somewhere has probably tried and failed to do it before and had their question answered. If you are pushing the boundaries, someone somewhere has probably tried and failed before, but got no answer. I might try a forum or two to see what happens. I personally don't use IRC much, but that's another option, as are mailing lists depending on how specialised the problem is.

I also have a folder on my computer full of books which I search through depending on the problem and a small library of books I look through/learnt from, which often contain practical, not-quite-there-but-adaptable examples.

My only comment would be that attempting to read the language specification is unlikely to be massively useful to you as a beginner. You won't fully understand what it means because you haven't pushed the bounds and tried things yet. For example, a novice in C might try this:

char c = '7';
int x = (int) c;

to convert the character '7' into an integer form. It's not a bad thought process, but once you understand how characters and ASCII work you see why the above doesn't give you what you want: the cast yields the character code (55 in ASCII), not the number 7.
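As a minimal sketch of the difference (the helper names char_code and digit_value are made up for illustration, and the value 55 assumes an ASCII system), the cast can be compared with subtracting the character '0':

```c
#include <ctype.h>

/* Hypothetical helper: returns the character code of c,
   which is exactly what the cast (int) c produces. */
int char_code(char c) {
    return (int) c;
}

/* Hypothetical helper: returns the numeric value of a decimal digit
   character, or -1 if c is not a digit. C guarantees that '0'..'9'
   have consecutive codes, so c - '0' is the digit's value. */
int digit_value(char c) {
    if (!isdigit((unsigned char) c)) {
        return -1;
    }
    return c - '0';
}
```

On an ASCII system char_code('7') gives 55, while digit_value('7') gives 7.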

In short, I think this is going to be part of the learning process and I don't think you can cut it any shorter. The consolation is that, like any research, the more you do it the more you'll know where to look and what questions to ask in the various communities.

Ninefingers
I agree with you. The time spent researching one's problem is the best way to learn. But I also believe that there are definitely best practices and resources out there that others have created that can help out substantially - when you know where to look :) And that's the purpose of this question.
Brandon Bertelsen
+13  A: 

Completely biased response: learn plyr, reshape and ggplot2. They will cover 90% of your data manipulation and visualisation needs. All three packages have a consistent philosophy of data (which the ggplot2 book touches upon), and are designed to be consistent and easier to learn.

Rather than learning many specialised functions, I really encourage you to learn about simple functions that can be flexibly composed to solve a wide range of problems. This is what plyr strives to do for data manipulation, and what ggplot2 strives to do for visualisation. It does mean you need to invest more time up front to learn a little about the underlying theory, but it's my belief that it will pay off handsomely in the long run.

hadley
So I'm an unabashed fan of all these packages and completely agree. Using this and packages for getting your data in (RMySQL type stuff or just read.csv()) you can do pretty much anything.
Dan
I absolutely agree, though you've advertised your products... =)
aL3xa
Modest, but true, just about everything I've done has involved one of those packages....
PaulHurleyuk
The most challenging problems that I've faced have all been solved by these three apps (all made by you). The most difficult thing to learn in R, for me, was data manipulation and plyr and reshape have made it, well, breezy.
Brandon Bertelsen
+1 for hadley's packages. I am really at the beginning of learning those three, but I can already tell it helps a lot and increases your flexibility. Plus, I recommend thinking a little outside R too. One of R's biggest advantages is its flexibility to interact with other languages. Whatever language you already know may help you. For example, I already know a little bit about SQL, so RMySQL plus some data juggling in MySQL helped me to start a little quicker.
ran2
Agreed, I'm making substantial use of 'sqldf'
Brandon Bertelsen
+1 because those packages are great for analyzing dataframes, but don't forget that R does so much more. A non-dataframe use of R includes running Monte Carlo simulations, and to do that well you really need to understand the base functionality.
stotastic
+2  A: 

Experience comes with time: there is not just one book you can read to know all the functions you will ever need.

Find problems to solve, and have a look at the CRAN Task Views. This is the way I would do it.

wok
+1  A: 

Here is a good list of resources for learning R:

http://stats.stackexchange.com/questions/138/resources-for-learning-r

Also, that website in general is a good resource.

In general I would say that following a mailing list, or a help list is the best way I have found for learning new things. (That and the "R magazine": http://www.r-bloggers.com )

Tal Galili
+4  A: 

Here is how I learned R.

R-Resources:

*) To learn R, the most important resource is Google. Search for: “TOPIC r-project”, “TOPIC filetype:r”, or “TOPIC site:nabble.com”.

*) Second, look at the example code provided with most packages. Go to “http://bm2.genes.nig.ac.jp/”, search for a topic and look at the example code. Run it and adapt it; this way you can often solve part of your problem.

*) Third: the r-help mailing list. Read the posts; the basic questions get asked over and over again. If you have a problem and you are completely stuck, ask a question on the mailing list.

*) Finally, look at the source code of the R packages. That’s the hardest part. If you can alter the code to your needs, you have mastered R ;-)

Some Tips:

*) R has a steep learning curve. That’s a feature ;-) It is designed to solve advanced problems, and in the end you are faster than when using an alternative to R.

*) Know every single R package and function that is relevant to your problem. The strength of R is that there are so many packages available (around 2000, I think). Usually there is a package that’s more suited or that already solves your problem. (Some help pages are badly written and hard to understand - I got used to it.)

*) R books are not helpful in learning R. Yes, that’s true. If you are an expert programmer and expert statistician, you don’t need any book on R (the only exception is Hadley Wickham’s ggplot2 book). If you are not, learn programming in general and/or advanced statistics.

*) Some R packages have known bugs which nobody will fix (package owner left university, etc.). Just a warning: this can be tricky if you are looking for a bug in your code and the bug is in an R package.

mrsteve
+1  A: 

One of the things I do is follow the RSS feed of R questions on SO (http://stackoverflow.com/feeds/tag/r). Then I can browse what other people have asked/answered.

Often I will favourite a particular question/answer if I think I'll use it, or jot down the salient points in my notebook software (OneNote). Occasionally I'll even try the question/answer out myself.

EDIT:

I'd also recommend Patrick Burns's book The R Inferno. It's not so much a training book as a description of all the gotchas and oooh moments Patrick has found (so far).

PaulHurleyuk
+1  A: 

Some interesting links:

Intro, links and examples: http://manuals.bioinformatics.ucr.edu/home/programming-in-r

A lot of documentation: http://en.wikibooks.org/wiki/R_Programming

R forum: http://r.789695.n4.nabble.com/

waanders
+1  A: 

Learning the RODBC package to interact directly with Oracle data made a big impact at my job. My boss was amazed when I pulled Oracle data directly into R and cranked out a plot in only a few lines of code. Try doing that in Excel!

Moral of the story, learn how to pull in data and manipulate it within R. Then move to some of the cooler stuff like ggplot.

stotastic