ansaurus

Question

Answer 1

+2 A:

Yes, this is possible using e.g. MySQL linked to R with the RMySQL and DBI packages, or via the RODBC or RJDBC package. I'm not 100% sure they all support BLOBs, but in the worst case you could use the ASCII representation and put it in a text field.

The trick is to use the function serialize():

> x <- rnorm(100)
> y <- 5*x+4+rnorm(100,0,0.3)
> tt <- lm(y~x)
> obj <- serialize(tt,NULL,ascii=TRUE)

Now you can store obj in a database and retrieve it later. It is no more than a raw vector of ASCII (or binary) bytes; ascii=FALSE gives you the binary representation. After retrieving it, use:

> unserialize(obj)
Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x  
      4.033        4.992
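A minimal sketch of the store-and-retrieve round trip, assuming the DBI and RSQLite packages are installed. An in-memory SQLite database stands in for MySQL so the example runs without a server, and the `models` table with its `name`/`value` columns is hypothetical. The binary serialization is bound as a BLOB by passing a list of raw vectors, which RSQLite accepts; other backends may differ.

```r
# Assumes DBI and RSQLite are installed; SQLite stands in for MySQL here.
library(DBI)

x <- rnorm(100)
y <- 5 * x + 4 + rnorm(100, 0, 0.3)
tt <- lm(y ~ x)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table: one row per stored object, bytes kept in a BLOB column.
dbExecute(con, "CREATE TABLE models (name TEXT PRIMARY KEY, value BLOB)")

# Binary serialization (ascii = FALSE) returns a raw vector;
# wrapping it in list() binds it as a single BLOB value.
obj <- serialize(tt, NULL)
dbExecute(con, "INSERT INTO models (name, value) VALUES (?, ?)",
          params = list("tt", list(obj)))

# Read the bytes back and rebuild the model object.
res <- dbGetQuery(con, "SELECT value FROM models WHERE name = ?",
                  params = list("tt"))
tt2 <- unserialize(res$value[[1]])
print(coef(tt2))

dbDisconnect(con)
```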

Edit: regarding PMML, there's a pmml package on CRAN. Maybe that one gets you somewhere?

Joris Meys 2010-10-17 22:21:34

Even if blobs are not supported, you can serialize/unserialize to and from ascii (as you even do in your example) and then store the ascii string.
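A base-R sketch of that suggestion: collapse the ASCII serialization into a single string that fits a TEXT or VARCHAR column, and turn it back into bytes before unserializing. `to_text()` and `from_text()` are hypothetical helper names, and the sample object uses integers and strings so the round trip is exact.

```r
# Hypothetical helper: ASCII-serialize an object into one character string.
to_text <- function(object) {
  rawToChar(serialize(object, connection = NULL, ascii = TRUE))
}

# Hypothetical helper: rebuild the object from the stored string.
from_text <- function(text) {
  unserialize(charToRaw(text))
}

obj <- list(id = 1:3, label = c("a", "b", "c"))
txt <- to_text(obj)    # one string, suitable for a text column
back <- from_text(txt) # the original list again
```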

Dirk Eddelbuettel 2010-10-17 23:23:14

I thought I said so? Did I word it wrong?

Joris Meys 2010-10-17 23:25:23

Note that the ASCII-serialized obj is about 16k bytes (length(obj)) and the binary version about 11k bytes, but if you save("tt",file="tt.RData") you get a file of only about 5k, because save() compresses its output by default.
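The gap to the .RData size comes from compression. A sketch of compressing the binary serialization with base R's memCompress() before storing it in a BLOB, and reversing it with memDecompress(); the test vector is made up, and real ratios depend on the object.

```r
# A made-up, highly repetitive vector so the effect of compression is visible.
x <- rep(1:10, 1000)

ascii_bytes <- serialize(x, NULL, ascii = TRUE)
bin_bytes   <- serialize(x, NULL)
gz_bytes    <- memCompress(bin_bytes, type = "gzip")

print(c(ascii = length(ascii_bytes),
        binary = length(bin_bytes),
        gzip = length(gz_bytes)))

# Store gz_bytes in a BLOB; decompress before unserializing.
restored <- unserialize(memDecompress(gz_bytes, type = "gzip"))
```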

Spacedman 2010-10-18 07:41:17

Thanks a ton. I was looking for something similar.

harshsinghal 2010-10-18 21:09:34

Answer 2

+1 A:

R can serialize and deserialize any object; that is how my digest package creates so-called 'hash digests', by running a hash function over the serialized object.

So once you have the serialized object (which can be serialized to character), store it. Any relational database will support this, as will the NoSQL key/value stores -- and for either backend you could even use the 'hash digest' as a key, or some other meta-information.
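A sketch of the 'hash digest as key' idea, assuming the digest package is installed. An R environment stands in for the key/value store, and `put_object()`/`get_object()` are hypothetical helpers; with a real store you would swap in its own put/get calls. Because the key is derived from the object itself, storing the same object twice yields the same key.

```r
# An environment stands in for a key/value store (assumption).
store <- new.env()

# Hypothetical helper: key = MD5 hash of the object, value = serialized bytes.
put_object <- function(store, object) {
  key <- digest::digest(object, algo = "md5")
  assign(key, serialize(object, NULL), envir = store)
  key
}

# Hypothetical helper: look up the bytes by key and rebuild the object.
get_object <- function(store, key) {
  unserialize(get(key, envir = store))
}

fit <- lm(mpg ~ wt, data = mtcars)
k <- put_object(store, fit)
fit2 <- get_object(store, k)
```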

Another alternative is RProtoBuf, which can also serialize and deserialize very efficiently (but you'd have to write the .proto files first).

Dirk Eddelbuettel 2010-10-17 22:34:32

The NoSQL idea seems appealing. The new Tokyo Cabinet package in R could help here.

harshsinghal 2010-10-18 08:31:49

Answer 3

A:

Note that a .RData file can contain many R objects, so you need to decide how to deal with that. If you attach the .RData file, you can list the objects in it with ls() and a pos argument:

> attach("three.RData")
> ls(pos=2)
[1] "x" "y" "z"

Then you can loop over them, get() each one by name from that position, and serialize it into a list (p is my list index):

> obnames <- ls(pos=2)
> s <- list()
> p <- 1
> for(obn in obnames){
+   s[[p]] <- serialize(get(obn,pos=2),NULL,ascii=TRUE)
+   p <- p+1
+ }

Now you'll have to squirt the elements of s to your DB, probably in a table of Name (some kind of char) and Value (the serialized data, a BLOB or varchar I guess).
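A variant of the loop above that avoids attaching the file: load() the .RData into a fresh environment, serialize every object with lapply(), and build the Name/Value table. The three objects are made-up stand-ins, and converting each ASCII serialization with rawToChar() is one way to fill a text column.

```r
# Create a small .RData file with three made-up objects.
x <- 1:5; y <- letters[1:3]; z <- list(a = TRUE)
rdata <- file.path(tempdir(), "three.RData")
save(x, y, z, file = rdata)

# Load into a fresh environment instead of attaching to the search path.
e <- new.env()
load(rdata, envir = e)

# Serialize each object; mget() returns a list named by object.
obnames <- ls(e)
s <- lapply(mget(obnames, envir = e), serialize,
            connection = NULL, ascii = TRUE)

# Name/Value table: each ASCII serialization becomes one string.
tab <- data.frame(name = obnames,
                  value = vapply(s, rawToChar, character(1), USE.NAMES = FALSE),
                  stringsAsFactors = FALSE)
print(tab$name)
```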

Spacedman 2010-10-18 09:21:44

Answer 4

A:

As others have mentioned, yes, you can store the outputs from models as text in your database. I'm not convinced that this will be very useful to you, though.

If you want to be able to recreate those models at a later date, then you need to store the input dataset and code that created the models, rather than the output.
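One way to keep both pieces is to store the input data next to the code string and re-evaluate the code later. A base-R sketch; the `record` structure and the use of `mtcars` are illustrative assumptions, and evaluating stored code is only sensible when the database contents are trusted.

```r
# Hypothetical record: the input data plus the code that fits the model.
record <- list(
  data = mtcars[, c("mpg", "wt")],
  code = "lm(mpg ~ wt, data = data)"
)

# Later: parse the stored code and evaluate it against the stored data.
refit <- eval(parse(text = record$code)[[1]],
              envir = list(data = record$data))
print(coef(refit))
```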

Of course, you could store the model output as well, in which case you need to think about its format in the database. If you want to be able to find particular model results and filter or order them, then it will be much easier if you add them to the database with some structure (and some metadata).

For example, you might want to retrieve all models where there was a significant gender response. In that case you need to add that information as a separate field in the database rather than having to search through the chunks of ascii. Adding other information like the model creator and date of creation will also help you later on.
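A sketch of storing a model with searchable metadata next to the serialized object. `mtcars` stands in for real data, the significance flag on `wt` stands in for something like a gender effect, and the column names and creator value are hypothetical. The resulting one-row data frame could then be appended to a table with DBI's dbWriteTable().

```r
# Stand-in model: transmission type (am) on weight and horsepower.
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Per-term p-values from the coefficient table.
pvals <- coef(summary(fit))[, "Pr(>|z|)"]

# One row per model: filterable metadata plus the serialized object.
record <- data.frame(
  creator = "analyst",                                # hypothetical
  created = Sys.time(),
  formula = paste(deparse(formula(fit)), collapse = " "),
  sig_wt  = unname(pvals["wt"] < 0.05),               # searchable flag
  model   = rawToChar(serialize(fit, NULL, ascii = TRUE)),
  stringsAsFactors = FALSE
)
str(record[, c("creator", "formula", "sig_wt")])
```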

Richie Cotton 2010-10-18 15:36:42

You seem to have touched upon all aspects of my problem. I am trying to create a way to "mark up" the independent variables in a glm model object, recording whether some variables were derived from source data columns (and their transformations). Currently, I save the model and the R script that created it, but I want a more generic structure for retracing the path from data to model object.
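A sketch of one piece of that lineage: recovering, from the model's terms, which source columns each independent variable was derived from, and attaching that table to the model so it travels through serialize(). The `provenance` attribute name and the example formula are assumptions.

```r
fit <- lm(mpg ~ log(hp) + I(wt^2), data = mtcars)

# Terms as written in the formula, e.g. "log(hp)".
term_labels <- attr(terms(fit), "term.labels")

# Source columns each term was derived from.
sources <- vapply(term_labels,
                  function(t) paste(all.vars(parse(text = t)[[1]]), collapse = ","),
                  character(1), USE.NAMES = FALSE)

provenance <- data.frame(term = term_labels, source = sources,
                         stringsAsFactors = FALSE)

# Hypothetical attribute: keep the lineage with the model object.
attr(fit, "provenance") <- provenance
print(provenance)
```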

harshsinghal 2010-10-18 21:13:30

Serializing .RData file to database
