I understand that some databases have native support in R (e.g. MySQL) but you can connect to other DBs like MS SQL Server using RODBC. How much speed improvement does one gain for reading/writing with the native drivers vs. RODBC? What other DBs have native drivers in R? Is reading faster or slower than writing generally?

+1  A: 
  • It's an empirical question, so why not measure it for the combination you are interested in?
  • Public code is not hidden, so why don't you count the other DB interfaces CRAN has? For DBI alone, we have SQLite, MySQL, PostgreSQL, Oracle; for custom DB backends there are things like Vhayu.
  • Specialised forums exist, so why don't you ask on r-sig-db?
  • Lastly, as soon as there is an API and a need, people tend to combine the two. I have written two different (at-work and hence unreleased) packages for two highly specialised and fast backends.
Dirk Eddelbuettel
Good point on the empirical question but I can't test them all. If someone else has good experience and evidence for switching DB engines I would switch.
JD Long
Well, you may have the option of comparing native to ODBC (say, with MySQL or PostgreSQL, rather than MS SQL where it is ODBC only). So simulate the type of test case you are after and see how it behaves.
Dirk Eddelbuettel
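A minimal sketch of the kind of measurement suggested above, assuming the DBI and RSQLite packages are installed. RSQLite is only a stand-in here for whichever native driver you actually use, `time_dbi` is a hypothetical helper name, and `"my_dsn"` in the comments is a placeholder data source name. The numbers it prints say nothing about any particular server; the point is the shape of the test.

```r
# Assumption: DBI and RSQLite are installed. RSQLite stands in for the
# native driver (RMySQL, RPostgreSQL, ...) you would really benchmark.
library(DBI)

# Hypothetical helper: time one write and one read of `df` through a DBI connection.
time_dbi <- function(con, df, table = "bench") {
  write <- system.time(dbWriteTable(con, table, df, overwrite = TRUE))[["elapsed"]]
  read  <- system.time(out <- dbReadTable(con, table))[["elapsed"]]
  stopifnot(nrow(out) == nrow(df))  # sanity check: everything came back
  c(write = write, read = read)
}

set.seed(1)
df <- data.frame(id = seq_len(10000), x = rnorm(10000))

con <- dbConnect(RSQLite::SQLite(), ":memory:")
native <- time_dbi(con, df)
dbDisconnect(con)
print(native)

# The ODBC side would time the equivalent RODBC calls against the same
# server (not run here; "my_dsn" is a placeholder):
#   ch <- RODBC::odbcConnect("my_dsn")
#   system.time(RODBC::sqlSave(ch, df, tablename = "bench", rownames = FALSE))
#   system.time(RODBC::sqlFetch(ch, "bench"))
#   RODBC::odbcClose(ch)
```

For a fair comparison, both paths should point at the same database with the same data frame, and each timing should be repeated a few times, since a single run can be dominated by connection setup or caching.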
+1  A: 

If you're specifically interested in SQL Server, the reference below is a little bit out of date but I imagine it probably still holds.

Using ODBC with Microsoft SQL Server

Performance of ODBC as a Native API

One of the persistent rumors about ODBC is that it is inherently slower than a native DBMS API. This reasoning is based on the assumption that ODBC drivers must be implemented as an extra layer over a native DBMS API, translating the ODBC statements coming from the application into the native DBMS API functions and SQL syntax. This translation effort adds extra processing compared with having the application call directly to the native API. This assumption is true for some ODBC drivers implemented over a native DBMS API, but the Microsoft SQL Server ODBC driver is not implemented this way.

The Microsoft SQL Server ODBC driver is a functional replacement of DB-Library. The SQL Server ODBC driver works with the underlying Net-Libraries in exactly the same manner as the DB-Library DLL. The Microsoft SQL Server ODBC driver has no dependence on the DB-Library DLL, and the driver will function correctly if DB-Library is not even present on the client.

Microsoft's testing has shown that the performance of ODBC-based and DB-Library–based SQL Server applications is roughly equal.

Bob Albright
That's a really good reference to have. Thanks! Unfortunately I also have to deal with the R side of things. It seems that going from R to the RODBC layer is particularly slow for writes. But it's good to know the slowdown is probably in RODBC and not the ODBC to SQL Server layer. Thanks again.
JD Long
Out of curiosity, approximately how large are the data frames that you are saving? Have you tried profiling to see whether you are inserting data 1 row at a time or in batches? If you are inserting 1 row at a time, that would slow you down a lot. I've also recently run into a few issues with the saving functions in RODBC. sqlQuery(), when running only an INSERT/UPDATE, throws an error but the statement still executes, at least for SQL Server.
Bob Albright
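To illustrate the row-at-a-time versus batch question, here is a rough sketch that times both styles of insert. It assumes DBI and RSQLite are installed and uses an in-memory SQLite database purely as a stand-in, so it does not measure RODBC or SQL Server; the table names `row_wise` and `batched` are made up for the example.

```r
# Assumption: DBI and RSQLite are installed; SQLite is a stand-in backend.
library(DBI)

set.seed(1)
df <- data.frame(id = seq_len(2000), x = rnorm(2000))

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE row_wise (id INTEGER, x REAL)")
dbExecute(con, "CREATE TABLE batched (id INTEGER, x REAL)")

# Style 1: one INSERT statement per row
row_time <- system.time(
  for (i in seq_len(nrow(df))) {
    dbExecute(con, "INSERT INTO row_wise (id, x) VALUES (?, ?)",
              params = list(df$id[i], df$x[i]))
  }
)[["elapsed"]]

# Style 2: hand the whole data frame to the driver in one call
batch_time <- system.time(
  dbWriteTable(con, "batched", df, append = TRUE)
)[["elapsed"]]

# Both tables should end up with the same number of rows
n_row   <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM row_wise")$n
n_batch <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM batched")$n
dbDisconnect(con)

print(c(row_wise = row_time, batched = batch_time))
```

On the RODBC side, `sqlSave()` has a `fast` argument that, as far as I recall from its documentation, switches between writing a row at a time and a single parametrized insert, so timing `sqlSave()` with `fast = TRUE` and `fast = FALSE` on your own data would be the closest equivalent of this test.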