Or a list of how to do in R the things you do in SQL (or vice versa)?

Thanks,

Tal

+2  A: 

You could look at Joshua Reich's presentation on R and SQL (see page 11).

Shane
Thanks Shane - exactly what I was looking for. Should I assume this is the biggest table I will find?
Tal Galili
I'm not sure...that's the only one that I'm aware of.
Shane
+3  A: 

The sqldf package could perhaps be of some help here?

There is also a talk from Joshua accompanying the presentation that Shane mentioned above.

radek
Thanks Radek - that package actually gave me the idea for the question. It's interesting that no one has made such a list. Maybe this should be some sort of R community project: take a bunch of SQL tasks and have all of us compile the ways to do them in R, something like this maybe: http://rosettacode.org/wiki/Category:Database_operations.
Tal Galili
For me that would be bliss, since I feel much more comfortable managing data with SQL [so far]. Thanks for the Rosetta link - interesting.
radek
+1  A: 

It's also worth looking into the RMySQL package.

I work with very large datasets that cannot be dumped into text before importing into R. This package allows me to use standard MySQL queries from within R to pull in subsets of my data.

Maiasaura
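For readers who have not seen this pattern, here is a rough sketch of pulling a subset with RMySQL through the DBI interface. The database name, host, user, table and column names are all hypothetical placeholders, not anything from the answer above, and the password is assumed to live in an environment variable.

```r
# Minimal sketch: fetch only the rows you need instead of dumping the whole table.
# All connection details and the `measurements` table are hypothetical.
fetch_subset <- function(min_year) {
  con <- DBI::dbConnect(
    RMySQL::MySQL(),
    dbname   = "mydb",                  # hypothetical database name
    host     = "localhost",             # hypothetical host
    user     = "me",                    # hypothetical user
    password = Sys.getenv("MYSQL_PWD")  # assumed to be set outside R
  )
  # Close the connection even if the query fails
  on.exit(DBI::dbDisconnect(con), add = TRUE)

  # Let MySQL do the filtering; only the subset travels to R
  sql <- sprintf(
    "SELECT id, year, value FROM measurements WHERE year >= %d",
    as.integer(min_year)
  )
  DBI::dbGetQuery(con, sql)  # returns a data.frame
}
```

Calling `fetch_subset(2005)` would return a data frame containing only the matching rows, assuming such a table exists on your server.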
Thank you. I played with it about two years ago. I remember that the connection time was very long. Is this still an issue today?
Tal Galili
I find RMySQL operations to be very slow, compared with native queries or wrappers for other languages.
neilfws
I find it reasonably fast, although I must emphasize that it is not something you should do repetitively. It's good to have a workflow (http://stackoverflow.com/questions/1429907/workflow-for-statistical-analysis-and-report-writing/1434424#1434424) so that queries only happen once and get written to .rdata files. In subsequent runs, you read the .rdata file rather than running the query again. When your database changes, you rerun steps one and two.
Maiasaura
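One way to read the "query once, then reuse the .rdata file" workflow is sketched below. This is only an illustration: `run_query` is a hypothetical stand-in for the real database call, and `load_or_query` is a helper name made up for this example.

```r
# Hypothetical stand-in for an expensive database query
run_query <- function() {
  data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))
}

# Return the cached result if the .rdata file exists, otherwise run the
# query and write its result to the file for later runs.
load_or_query <- function(cache_file, query_fun, refresh = FALSE) {
  if (!refresh && file.exists(cache_file)) {
    env <- new.env()
    load(cache_file, envir = env)  # restores the object named `result`
    return(env$result)
  }
  result <- query_fun()
  save(result, file = cache_file)
  result
}

cache_file <- file.path(tempdir(), "query_cache.rdata")
first  <- load_or_query(cache_file, run_query)  # runs the query, writes the file
second <- load_or_query(cache_file, run_query)  # reads the file, no query
```

When the database changes, calling `load_or_query(cache_file, run_query, refresh = TRUE)` reruns the query and overwrites the cache.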
I haven't used RMySQL, but I have used RODBC and RJDBC and haven't found any speed problems. So maybe you could try those if RMySQL is slow.
Matti Pastell
+2  A: 

The examples section at the bottom of the help(sqldf) page in the sqldf package has quite a few SQL commands and their R counterparts.

G. Grothendieck
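To give a flavour of that kind of side-by-side comparison, here is a small sketch with a few SQL statements (as comments) next to one possible base R counterpart. The two toy tables are invented for illustration and are not taken from the sqldf help page; the final sqldf call only runs if the package is installed.

```r
# Hypothetical toy tables, invented for this sketch
orders <- data.frame(
  id       = 1:5,
  customer = c("ann", "bob", "ann", "cy", "bob"),
  amount   = c(10, 25, 40, 5, 30),
  stringsAsFactors = FALSE
)
customers <- data.frame(
  customer = c("ann", "bob", "cy"),
  country  = c("US", "FI", "IL"),
  stringsAsFactors = FALSE
)

# SQL: SELECT id, amount FROM orders WHERE amount > 20 ORDER BY amount DESC
big <- subset(orders, amount > 20, select = c(id, amount))
big <- big[order(-big$amount), ]

# SQL: SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer
totals <- aggregate(amount ~ customer, data = orders, FUN = sum)
names(totals)[2] <- "total"

# SQL: SELECT * FROM orders JOIN customers USING (customer)
joined <- merge(orders, customers, by = "customer")

# The same GROUP BY written as SQL against the data frame, if sqldf is available
if (requireNamespace("sqldf", quietly = TRUE)) {
  totals_sql <- sqldf::sqldf(
    "SELECT customer, SUM(amount) AS total
     FROM orders GROUP BY customer ORDER BY customer"
  )
}
```

Other R idioms would work equally well for each statement; these are just the base functions that map most directly onto WHERE, GROUP BY and JOIN.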
+1  A: 

I just started working with RMySQL recently and really like the package. I just run basic SQL queries in R itself. Most of the data re-arranging is done in several independent SQL scripts, basically some stored procedures.

I think R is a statistical package with some nice merging capability, but it's not meant to handle relational data that way. I work a lot with micro data and have to set up non-relational datasets from it (and then use R for regression analysis and plotting with ggplot2 (!)). I also do data aggregation in SQL itself before connecting to R.

I also recommend using views (if they are fast enough for you). R accesses them like ordinary tables using the list tables statement.

Besides, there's RPostgreSQL out there if you want to give PostgreSQL a try. I tried it once but switched to RMySQL because RPostgreSQL was so hard to set up on my Mac, and after an update the config was gone. RMySQL was much easier. Back then I had to compile the package on my own, so if you run another OS you might get a binary (or there may be a Mac OS one by now).

In any case, there is some literature on RPostgreSQL out there that might help you even if you use RMySQL, particularly if you plan to use it for time series data (e.g. TSPostgreSQL).

ran2
Thanks for sharing Ran :)
Tal Galili