tags:

views:

126

answers:

2

We keep all our laboratory data in a Sybase database. When I want to do data manipulation and analysis I read the data into R with RODBC.

library(RODBC)
channellab <- odbcConnect("Labdata")
indivs <-sqlQuery(channellab,'SELECT * from CGS.Specimen')

So far so good, except that CGS.Specimen is a table for our entire lab holdings. There are almost 40000 specimens (rows) with 66 variables. It takes an unnecessary amount of time to read especially when I am only interested in the holdings of one study which has about 1000 specimens. The obvious way to fix it would be

indivs <-sqlQuery(channellab,'SELECT * from CGS.Specimen WHERE StudyCode="RP"')  

But when I run that I get

[1] "42S22 -143 [Sybase][ODBC Driver][Adaptive Server Anywhere]Column not found: Column 'RP' not found" "[RODBC] ERROR: Could not SQLExecDirect 'SELECT * from CGS.Specimen WHERE StudyCode=\"RP\"'"

I have played with quotation marks and I have played with escape slashes but alas I have gotten nowhere.

for instance

indivs <-sqlQuery(channellab,'SELECT * from CGS.Specimen WHERE StudyCode= 'RP'  ')

Error: unexpected symbol in "indivs <-sqlQuery(channellab,'SELECT * from CGS.Specimen WHERE StudyCode= 'RP"

indivs <-sqlQuery(channellab,"SELECT * from CGS.Specimen WHERE StudyCode= 'RP'  ")

Error in .Call(C_RODBCFetchRows, attr(channel, "handle_ptr"), max, buffsize, : negative length vectors are not allowed

What do you suggest?

+2  A: 

Edited:

indivs <-sqlQuery(channellab,'SELECT * from CGS.Specimen WHERE StudyCode=\'RP\'')  

Ok can you run another query where some data is returned?

WHERE StudyCode=StudyCode for example? Does this give the same error about -ve lengths?

this means that the above is correct for escaping the StudyCode. The -Ve lenght is another error. Can you return not all columns with * but specific columns where there are no nulls?

Preet Sangha
It did not work. I got Error: unexpected string constant in "indivs <-sqlQuery(channellab,'SELECT * from CGS.Specimen WHERE StudyCode=''RP'"
Farrel
Have updated the answer
Preet Sangha
It still does not work. Error in .Call(C_RODBCFetchRows, attr(channel, "handle_ptr"), max, buffsize, : negative length vectors are not allowed I do not have a clue what all that means but it does not work. What else can we try.
Farrel
I tried indivs <-sqlQuery(channellab,'SELECT * from CGS.Specimen WHERE StudyCode=StudyCode') and got Error in .Call(C_RODBCFetchRows, attr(channel, "handle_ptr"), max, buffsize, : negative length vectors are not allowed
Farrel
I do not know if I understood your additional question, "Can you return not * specific columns where there are no nulls?" You may have meant that I run a line such as indivs <-sqlQuery(channellab,'SELECT SubjLastName, SubjSex, StudyCode from CGS.Specimen WHERE StudyCode=\'RP\''). I still got an error. Error in .Call(C_RODBCFetchRows, attr(channel, "handle_ptr"), max, buffsize, : negative length vectors are not allowed
Farrel
Some real time voip or im may be in order. Do you use gmail or skype? If so I am fjbuch on both.
Farrel
+1  A: 

The last error you report is from the .Call function to the c function RODBCFetchRows. Thus your problem with the quotation marks appears to be solved. However, with the said error you still have a problem. Perhaps the rows are not correctly reported (the c function tries to allocate vectors based on this value, but does not check for negative values). Try:

indivs <-sqlQuery(channellab,"SELECT * from CGS.Specimen WHERE StudyCode= 'RP'", believeNRows = FALSE)
eyjo
I tried running your line. No joy. > indivs[1] "42000 -131 [Sybase][ODBC Driver][Adaptive Server Anywhere]Syntax error or access violation: near ',' in ...StudyCode= 'RP'[,] believeNRows =..."[2] "[RODBC] ERROR: Could not SQLExecDirect 'SELECT * from CGS.Specimen WHERE StudyCode= 'RP', believeNRows = FALSE'" > str(indivs) chr [1:2] "42000 -131 [Sybase][ODBC Driver][Adaptive Server Anywhere]Syntax error or access violation: near ',' in ...StudyCode= 'RP'[,] b"| __truncated__ ...
Farrel
Ooops, a little typo there ... the believeNRows = FALSE should not have been in the query string. Fixed it.
eyjo
Mazal Tov! It works. Why do you think the number of rows returned by the ODBC connection is messed up. I guess it does not matter as long as we get it to work.
Farrel
The manual talks of some Sybase drivers not reporting believable number of rows (same for some Oracle drivers and SQLight for Mac OS X). I don't know if the driver could be fixed (i.e. updated), but you might be able to increase performance by adjusting buffsize.
eyjo