Is there a table comparing SQL commands with R commands ?

+2 A:

You could look at Joshua Reich's presentation on R and SQL (see page 11).

Shane 2010-07-03 13:19:30

Thanks Shane - exactly what I was looking for. Should I guess this is the biggest table I will find?

Tal Galili 2010-07-03 15:01:23

I'm not sure...that's the only one that I'm aware of.

Shane 2010-07-03 15:18:17

+3 A:

sqldf package could be of some help here perhaps?

There is also a talk from Joshua accompanying presentation that Shane mentioned above.

radek 2010-07-03 17:33:06

Thanks Radek - that package actually gave me the idea for the question. Interesting how no one made such a thing. Maybe this should be some sort of R community project (that is - to take a bunch of SQL tasks and that all of us will compile all the ways to do them in R - something like this maybe: http://rosettacode.org/wiki/Category:Database_operations).

Tal Galili 2010-07-03 19:41:30

For me that would be a bliss, since I feel much more comfortable managing data with SQL [so far]. Thanks for rosetta link - interesting.

radek 2010-07-05 20:04:13

+1 A:

It's also worth looking into the RMysQL package.

I work with very large datasets that cannot be dumped into text prior to importing in R. This package allows me to use standard mysql queries from within R to pull in subsets of my data.

Maiasaura 2010-07-03 20:17:03

Thank you. I played with it about two years ago. I remember that the connection time was very long. Is this still an issue today ?

Tal Galili 2010-07-03 21:12:53

I find RMySQL operations to be very slow, compared with native queries or wrappers for other languages.

neilfws 2010-07-04 01:54:54

I find it reasonably fast. Although I must emphasize that it is not something you should do repetitively. It's good to have a workflow (http://stackoverflow.com/questions/1429907/workflow-for-statistical-analysis-and-report-writing/1434424#1434424) so queries only happen once and get written to .rdata files. In subsequent runs, you read the .rdata file and not run the query repeatedly. When your database changes, then you rerun steps one and two.

Maiasaura 2010-07-04 05:33:14

I haven't used RMySQL, but RODBC and RJDBC and haven't found any speed problems. So maybe you could try those if RMySQL is slow.

Matti Pastell 2010-07-04 10:44:33

+2 A:

The examples section at the bottom of the help(sqldf) page in the sqldf package has quite a few SQL commands and their R counterparts.

G. Grothendieck 2010-07-04 01:45:26

+1 A:

I just started working with RMySQL recently and really like the package. I just run basic SQL queries in R itself. Most of the data re-arranging is done in several independent SQL scripts, basically some stored procedures.

I think R is a statistical package with some nice merging capability but it´s not meant to handle relational data that way. I do work a lot with micro data and have to set up non-relational datasets from these micro data (and then use R for regression analysis and plotting ggplot2 (!)) . I also do data aggregation in SQL itself before connecting to R.

I also recommend to use views (if they are fast enough for you). R accesses them like ordinary tables using the list tables statement.

Besides there´s RPostgreSQL out there, if you wanna give postgreSQL a try. I tried it once but switched to RMySQL because RPostgreSQL was so hard to setup on my Mac and after an update the config was gone. RMySQL was much easier. Back then I had to compile the package on my own, so if you run another OS, you might get a binary (or there´s a Mac OS one there by now) .

In any case there is some literature on RPostgreSQL out there that might help you even if you use RMySQL, particulary if you plan to use it for timeseries data (e.g. TSPostgreSQL).

ran2 2010-07-04 14:51:24

Thanks for sharing Ran :)

Tal Galili 2010-07-04 16:58:18

ansaurus

tags:

views:

answers:

Is there a table comparing SQL commands with R commands ?

related questions