I am using R to run simulations with time series data. I have been using arrays to store data, but I need a less memory-intensive solution for storing data at intermediate steps in order to document the process. I am not a programmer, so I am looking for something relatively easy to set up on multiple platforms if possible (Windows, Mac, Linux). I also need to be able to call the database directly from R, since learning another language is not feasible right now. Ideally, I would like to read from and write to the database frequently, in a manner similar to an array, though I don't know if that is realistic. I will gladly sacrifice speed for ease of use, but I am willing to work to learn open-source solutions. Any suggestions would be appreciated.

+2  A: 

I also need to be able to directly call the database from R

I suggest setting up MySQL with the RMySQL interface.

Once the DB connection is open, you can query the database and get the data into R. For example:

# load the DBI-compliant MySQL driver
library(RMySQL)

# open a connection -- user, password, dbname and host are placeholders;
# substitute your own server details
con <- dbConnect(MySQL(), user = "user", password = "password",
                 dbname = "mydb", host = "localhost")

# Run an SQL statement by first creating a result set object
rs <- dbSendQuery(con, statement = paste(
                      "SELECT w.laser_id, w.wavelength, p.cut_off",
                      "FROM WL w, PURGE p",
                      "WHERE w.laser_id = p.laser_id",
                      "ORDER BY w.laser_id"))
# we now fetch records from the result set into a data.frame
data <- fetch(rs, n = -1)   # extract all rows

RMySQL: R interface to the MySQL database

Database interface and MySQL driver for R. This version complies with the database interface definition as implemented in the package DBI 0.2-2.

MySQL Database:

Available for all the platforms you cited in the question, and more; downloads are available from the MySQL site.

Bakkal
For a single user, RSQLite is easier to set up and quite fast.
Eduardo Leoni
Thank you for the code examples, Bakkal. This will be used by a single user. Is there an advantage to using MySQL rather than SQLite for this application?
ProbablePattern
You can use the simplest DB for the current task, and then move to a more complete one later if you are missing a useful feature :)
Bakkal
It would be the same code for RSQLite -- see my answer regarding the DBI interface both packages use.
Dirk Eddelbuettel
Nthing MySQL. I use R and MySQL every day for my simulation work. It may seem like a lot of work at first, but the learning curve is entirely worth it in the long run, as the size and complexity of your data are bound to grow over time.
Maiasaura
+3  A: 

Quick comments:

  • R is good at this: as a language for programming with data, it has plenty of interfaces
  • There is an entire manual devoted to data import/export, and it has a section on relational databases, so start there.
  • R has the widely-used DBI package which provides a unified interface for many backends, among them SQLite, MySQL, PostgreSQL, Oracle, ... Use that, maybe with RSQLite to get something going quickly (see the sketch after this list). You can still switch backends afterwards.
  • There is also RODBC but I find ODBC tedious to work with.
  • R also has a specialised variant in the TSdbi package by Paul Gilbert which brings the DBI-like abstraction to time series databases. It also supports multiple backends.
  • The data.table package was written for this and is very fast on indexing.
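
To illustrate the DBI point, here is a minimal sketch using RSQLite; the table and column names are made up for the example, and moving to MySQL later would only require changing the driver and connection arguments:

library(DBI)
library(RSQLite)

# open (or create) a single-file database on disk -- no server needed
con <- dbConnect(SQLite(), dbname = "simulations.db")

# append an intermediate result (a data.frame) to a table
dbWriteTable(con, "results",
             data.frame(run = 1L, step = 1:10, value = rnorm(10)),
             append = TRUE)

# read it back later with an ordinary SQL query
res <- dbGetQuery(con, "SELECT * FROM results WHERE run = 1")

dbDisconnect(con)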
Dirk Eddelbuettel
Thank you for the resources. So far, SQLite looks like the most appropriate option for my skill level.
ProbablePattern
+1  A: 

Do you really need a database solution for your purpose? You say you want a "solution for storing data at intermediate steps" -- how about simply saving the data array to disk at the required time points?

Edit: to make it possible to retrieve the information, you can embed meta-information, e.g. trial index and/or timestamp, in the filename. Then later you can locate and load the file using the correct filename.
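
A minimal sketch of that idea, assuming saveRDS/readRDS for serialization; run_simulation and the array dimensions are placeholders:

# each run of the simulation produces a three-dimensional array
run_simulation <- function(i) array(rnorm(10 * 10 * 5), dim = c(10, 10, 5))

for (i in 1:50) {
  result <- run_simulation(i)
  # encode the trial index in the filename with fixed-width padding,
  # so the files sort correctly and can be located systematically later
  saveRDS(result, file = sprintf("sim_run_%05d.rds", i))
}

# later: reload a particular run by reconstructing its filename
run_17 <- readRDS("sim_run_00017.rds")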

Amnon
I'm relatively new to R but it seems like this would generate several hundred files and make it difficult to audit when I'm done. Each run of the simulation requires creating a few three dimensional arrays and I can only run about 5,000 simulations at once. My current goal is to do 50,000 runs. If there is a better way of storing three dimensional arrays to disk and reading their results later in a systematic way, I would certainly appreciate the guidance.
ProbablePattern
+1  A: 

You can also take a look at the ff package.
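
For the three-dimensional arrays mentioned above, a minimal sketch, assuming ff's constructor with a dim argument (the dimensions here are arbitrary):

library(ff)

# a three-dimensional array backed by a file on disk,
# so only the chunks actually accessed are held in RAM
a <- ff(vmode = "double", dim = c(100, 100, 50))

# read and write with ordinary array indexing
a[1, 1, 1] <- 3.14
mean(a[, , 1])   # pull one slice into memory and summarise it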

Jyotirmoy Bhattacharya