views: 273
answers: 3

I'm trying to read a text file into R so I can use the sqldf functions. I'm following this example, https://stat.ethz.ch/pipermail/r-help/2008-January/152040.html, but my data lives in a text file rather than being pasted inline as it is in the example. My text file is below:

#"test.table.1.0" file has this contents:
id  Source
1     A10
2     A32
3     A10
4     A25

Following the example, I tried this:

test_table <- read.table(textConnection("test.table.1.0"))

I can see that the problem is that textConnection is supposed to take a character vector, and I'm giving it a data.frame, but converting it via as.character also fails. Ultimately, I want to run a query like this:

sqldf("select test_table.source from test_table");
+3  A: 

Aniko's comment has almost all you need (along with header=TRUE):

R> data <- read.table("test.table.1.0", header=TRUE)
R> data
  id Source
1  1    A10
2  2    A32
3  3    A10
4  4    A25
R> 

In other words, if you have the data in a file, read from a file. A textConnection is useful if and when you have the data 'right there' along with the command as in the email you referenced.
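
To make the contrast concrete, here is a small sketch of both approaches. The only things assumed beyond the question itself are the variable names and that test.table.1.0 sits in the working directory; the final sqldf() call is the query from the question.

library(sqldf)

# Data pasted inline: textConnection() wraps the string so that
# read.table() can read it as if it were a file.
con <- textConnection("id Source
1 A10
2 A32
3 A10
4 A25")
test_inline <- read.table(con, header = TRUE)
close(con)

# Data in a file: just hand read.table() the file name.
test_table <- read.table("test.table.1.0", header = TRUE)

# The query from the question then works as intended.
sqldf("select test_table.Source from test_table")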

Dirk Eddelbuettel
Thanks, I didn't realize what textConnection is for. I also had an extra comment in my SQL, now removed.
John
+2  A: 

One can read data directly into SQLite using read.csv.sql() or read.csv2.sql() from the sqldf package.

From the online manual:

Link

Example 13. read.csv.sql and read.csv2.sql

read.csv.sql is an interface to sqldf that works like read.csv in R, except that it also provides an sql= argument and not all of the other arguments of read.csv are supported. It uses (1) SQLite's import facility via RSQLite to read the input file into a temporary disk-based SQLite database created on the fly, and then (2) the provided SQL statement to read the resulting table into R. Because the first step imports the data directly into SQLite without going through R, it can handle larger files than R itself can, as long as the SQL statement filters the data down to a size R can handle. Here is Example 6c redone using this facility:

# Example 13a. 
library(sqldf) 

write.table(iris, "iris.csv", sep = ",", quote = FALSE, row.names = FALSE) 
iris.csv <- read.csv.sql("iris.csv",  
        sql = "select * from file where Sepal_Length > 5") 

# Example 13b.  read.csv2.sql.  Commas are decimals and ; is sep. 

library(sqldf) 
Lines <- "Sepal.Length;Sepal.Width;Petal.Length;Petal.Width;Species 
5,1;3,5;1,4;0,2;setosa 
4,9;3;1,4;0,2;setosa 
4,7;3,2;1,3;0,2;setosa 
4,6;3,1;1,5;0,2;setosa 
" 
cat(Lines, file = "iris2.csv") 

iris.csv2 <- read.csv2.sql("iris2.csv", sql = "select * from file where Sepal_Length > 5") 
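
Applied to data shaped like the question's, a minimal sketch might look like this. The file name test.table.csv, the comma separator, and the intermediate data frame are assumptions for illustration only, since the original test.table.1.0 is whitespace-separated rather than a CSV:

library(sqldf)

# Hypothetical comma-separated copy of the question's data;
# read.csv.sql expects a delimited file.
dat <- data.frame(id = 1:4, Source = c("A10", "A32", "A10", "A25"))
write.table(dat, "test.table.csv", sep = ",", quote = FALSE, row.names = FALSE)

# Inside sql= the imported file is always referred to as "file".
res <- read.csv.sql("test.table.csv", sql = "select Source from file")

As in Example 13a, the filtering happens inside SQLite, so only the rows selected by the sql= statement are ever loaded into R.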
Jay
+1  A: 

If your data is not all that big, read.table() works great. If you have gigabytes of data, you may find read.table() or read.csv() to be a little slow. In that case you can read the data directly into SQLite from R using the sqldf package. Here's an example:

library(sqldf)
# Read the file straight into a temporary on-disk SQLite database,
# then pull the query result into R.
f <- file("test.table.1.0")
bigdf <- sqldf("select * from f", dbname = tempfile(),
   file.format = list(header = T, row.names = F))

A few months ago I wrote a personal anecdote about my experience using this method.

In my experience pulling data directly into SQLite is a LOT faster than reading it into R. But it's not worth the extra code if a simple read.csv() or read.table() works well for you.

JD Long