+3  A: 

What about writing it to a file and calling LOAD DATA INFILE? That should at least give you a benchmark. BTW: what kind of DBMS do you use?
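A minimal sketch of that approach, assuming MySQL and an already-open RODBC channel (`channel`, `df`, and the table name `mytable` are placeholders):

```r
library(RODBC)

# 'channel' is assumed to be an open RODBC connection, 'df' the data frame.
# Write the data to a temporary tab-delimited file first ...
tmp <- tempfile(fileext = ".txt")
write.table(df, tmp, sep = "\t", row.names = FALSE, col.names = FALSE,
            quote = FALSE)

# ... then load the whole file with a single statement (MySQL syntax).
sqlQuery(channel, paste0(
  "LOAD DATA LOCAL INFILE '", gsub("\\\\", "/", tmp), "' ",
  "INTO TABLE mytable FIELDS TERMINATED BY '\\t'"))
```

The single LOAD DATA statement replaces one INSERT round-trip per row.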

ran2
I wasn't aware of "LOAD DATA INFILE". Well, it means that I first have to write the data to a file, so I'm just moving the problem a bit. I use a relational database, but in truth this table has no relations, so it is sort of a flat database.
JSmaga
LOAD DATA INFILE just issues one query, as opposed to one for every single line. So you might get quicker results and no timeouts if you use tools like phpMyAdmin for DB administration. In case you use MySQL or PostgreSQL, there are nice packages out there like RMySQL and RPostgreSQL. There's an Oracle package too. I have tested both of the SQL packages and I am happy with them. That being said, I really wonder why you have a problem, because I have already worked with several GB of data with these tools.
ran2
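With RMySQL, the bulk-transfer idea is available directly: `dbWriteTable` appends a whole data frame in one call instead of issuing per-row inserts. A hedged sketch with placeholder connection details:

```r
library(DBI)
library(RMySQL)

# Placeholder credentials, database, and table name.
con <- dbConnect(MySQL(), dbname = "mydb",
                 user = "me", password = "secret")

# Transfers the whole data frame as one bulk operation.
dbWriteTable(con, "mytable", df, append = TRUE, row.names = FALSE)

dbDisconnect(con)
```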
I'll try out LOAD DATA INFILE; it looks really efficient. I guess single queries are just not efficient enough.
JSmaga
+3  A: 

Instead of your sendToDB function, you could use sqlSave. Internally it uses a prepared insert statement, which should be faster than individual inserts.
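For example (a sketch; the DSN and table name are placeholders):

```r
library(RODBC)

channel <- odbcConnect("mydsn")  # placeholder DSN

# One call transfers the whole data frame. With fast = TRUE, sqlSave
# uses a parameterized INSERT rather than composing a query per row.
sqlSave(channel, df, tablename = "mytable", rownames = FALSE,
        fast = TRUE)

odbcClose(channel)
```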

However, on a Windows platform using MS SQL, I use a separate function which first writes my data frame to a CSV file and then calls the bcp bulk loader. In my case this is a lot faster than sqlSave.

Henrico
+1  A: 

There's a HUGE, relatively speaking, overhead in your sendToDB() function. That function has to negotiate an ODBC connection, send a single row of data, and then close the connection for each and every item in your list. If you are using RODBC, it's more efficient to use sqlSave() to copy an entire data frame over as a table. In my experience, though, some databases (SQL Server, for example) are still pretty slow with sqlSave() over latent networks. In those cases I export from R into a CSV and use a bulk loader to load the files into the DB. I have an external script set up that I call with a system() call to run the bulk loader. That way the load happens outside of R, but my R script is running the show.
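That export-then-bulk-load pattern might look like this (a sketch assuming SQL Server's bcp utility; the paths, server, and table names are all placeholders):

```r
# Export the data frame once, outside the database round-trip.
write.csv(df, "C:/tmp/mydata.csv", row.names = FALSE)

# Hand the file to the DBMS's bulk loader via system(). For SQL Server
# this could be bcp (-T trusted connection, -c character mode,
# -t, comma field terminator); MySQL would use LOAD DATA INFILE instead.
system(paste("bcp mydb.dbo.mytable in C:/tmp/mydata.csv",
             "-S myserver -T -c -t,"))
```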

JD Long
Yeah, in truth my function is defined on the fly in the `lapply` call, and the channel is opened before the `lapply` and closed after. But I'll have a look at sqlSave then, since I am on MySQL.
JSmaga
By the way, even if I didn't, wouldn't connection pooling reduce the overhead a lot?
JSmaga
Using `sqlSave` requires transferring the data into a `data.frame`. My dataset is currently stored in an xts object. When I try `test <- data.frame(myXTS)`, R shuts down without any kind of warning. It works for smaller xts objects, though.
JSmaga
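One possible way around the conversion (a hedged sketch; `myXTS` stands in for the object in the comment above, built here as a small dummy so the snippet is self-contained):

```r
library(xts)

# Dummy stand-in for the real myXTS object.
myXTS <- xts(rnorm(5), order.by = Sys.Date() - 4:0)

# Convert explicitly: pull the numeric core out with coredata() and keep
# the time index as its own column, rather than relying on data.frame()'s
# method dispatch on the whole xts object.
df <- data.frame(time = index(myXTS), coredata(myXTS))
str(df)
```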
I'm pretty inexperienced with xts objects, so I'm not much help there. Another question on that, maybe? I may be missing something obvious, but where's the connection pooling coming from?
JD Long