tags:

views:

117

answers:

2

This is probably going to be an underspecified question, as I'm not looking for a specific fix:

I want to run a machine learning algorithm on some data in a MSSQL database. I'd like to use R to do the calculations -- which would involve using R to connect to the database, process the data, and write a table of results back to the database.

Is this possible? My guess is yes. Shouldn't be a problem using a client...

however, would it be possible to set this up on a linux box as a cron job?

A: 

You may just write a script containing R code and put this in the first line:

#!/usr/bin/env Rscript

change the file permissions to allow execution and put in crontab as it would be a bash script.

mbq
+2  A: 

Yes to all!

Your choices for scripting are either Rscript or littler as discussed in this previous post.

Having struggled with connecting to MSSQL databases from Linux, my recommendation is to use RJDBC for database connections to MSSQL. I used RODBC to connect from Windows but I was never able to get it working properly in Linux. To get RJDBC working you will need to have Java installed properly on your Linux box and may need to change some environment variables (seems I always have SOMETHING mis-configured with rJava). You will also need to download and install the JDBC drivers for Linux which you can get directly from Microsoft.

Once you get RJDBC installed and the drivers installed, the code for pulling data from the database will look something like the following template:

require(RJDBC)
drv <- JDBC("com.microsoft.sqlserver.jdbc.SQLServerDriver",
            "/etc/sqljdbc_2.0/sqljdbc4.jar")
conn <- dbConnect(drv, "jdbc:sqlserver://mySqlServer", "userId", "Password")
sqlText <- paste("
  SELECT  * 
  FROM SomeTable
       ;")
myData  <- dbGetQuery(conn, sqlText)

You can write a table with something like

dbWriteTable(conn, "myData", SomeTable, overwrite=TRUE)

When I do updates to my DB I generally use dbWriteTable() to create a temporary table on my database server then I issue a dbSendUpdate() that appends the temp table to my main table then a second dbSendUpdate() that drops the temporary table. You might find that pattern useful.

The only "gotcha" I ran into was that I could never get a Windows domain/username to work in the connection sequence. I had to set up an individual SQL Server account (like sa).

JD Long