I'm used to Perl and new to R. I know you can read whole tables using read.table(), but I wonder how I can use R to parse a single line from an input file.

Specifically, what is the equivalent of the following Perl snippet:

open my $fh, '<', $filename or die "can't open file $filename";
my $line = <$fh>;
my ($first, $second, $third) = split ("\t", $line);
A: 

In general, you should use scan for this; in more complex cases, read the whole file with readLines and parse it manually with strsplit, grep, and similar functions.

In your case:

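# sep="\t" splits on tabs only (the default splits on any whitespace);
# nlines=1 stops after the first line, nmax=3 keeps at most three fields
d <- scan(filename, what = character(0), sep = "\t", nmax = 3, nlines = 1)
first <- d[1]; second <- d[2]; third <- d[3]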
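
For the more complex case, a minimal sketch of the readLines-and-parse approach might look like the following. The temporary file, its contents, and the convention that lines starting with '#' are comments are all assumptions made up for illustration:

```r
# Hypothetical input: a tab-separated file with a comment line on top
path <- tempfile(fileext = ".txt")
writeLines(c("# a comment line", "a\tb\tc", "d\te\tf"), path)

lines  <- readLines(path)                              # every line as one string
data   <- lines[!grepl("^#", lines)]                   # drop lines starting with '#'
fields <- strsplit(data[1], "\t", fixed = TRUE)[[1]]   # split first data line on tabs

first  <- fields[1]
second <- fields[2]
third  <- fields[3]
```

Note that this reads the whole file, so it only makes sense when the file is small or when you need most of it anyway.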
mbq
A: 

Similar to the above would be:

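filename <- 'your/file/name/here'
fh <- file(filename, open = 'rt')   # open a text-mode read connection
line <- readLines(fh, n = 1)        # read only the first line
tmp <- strsplit(line, "\\t")        # a list holding one character vector
first <- tmp[[1]][1]; second <- tmp[[1]][2]; third <- tmp[[1]][3]
close(fh)                           # close once no more reads are needed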

The file function creates a connection to the file and opens it. Opening is optional, but if you don't open the connection, each read opens the file, reads from the start, and closes it again. If you do open it, it remains open and the next read continues from where the previous one left off (the closest match to what Perl would be doing above).
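
To make the difference concrete, here is a small sketch comparing the two cases. The temporary file and its two lines are hypothetical stand-ins for your own input:

```r
# Hypothetical two-line input file
path <- tempfile(fileext = ".txt")
writeLines(c("a\tb\tc", "d\te\tf"), path)

# Unopened connection: each readLines() call opens, reads, and closes it,
# so both calls start from the top of the file
con <- file(path)
unopened_1 <- readLines(con, n = 1)
unopened_2 <- readLines(con, n = 1)
close(con)

# Opened connection: the read position is kept between calls
con <- file(path, open = "rt")
opened_1 <- readLines(con, n = 1)
opened_2 <- readLines(con, n = 1)
close(con)
```

Here unopened_1 and unopened_2 both hold the first line, while opened_2 holds the second line.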

The readLines function reads the specified number of lines (1 in this case), and strsplit then works basically the same as the Perl split function, except that it returns a list with one element per input string (hence the tmp[[1]]).

R does not have multiple assignment like Perl (it is often best to keep the results together anyway rather than splitting them into multiple global variables).
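
One way to keep the results together, sketched here with a made-up input line, is to name the elements of the split vector and look them up by name:

```r
line <- "1\t2\t3"   # hypothetical line, as readLines() would return it

fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
names(fields) <- c("first", "second", "third")

second <- fields[["second"]]   # look a field up by name when needed
```

The names are only labels on one character vector, so nothing extra is created in the global environment beyond what you explicitly assign.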

Greg Snow
One warning -- this would load the whole file into memory and split all its lines. If the file is huge and you need only the first three elements, this is certainly not a good idea.
mbq
For a small file it may read the whole thing, but for larger files it will only read part into memory; as you continue to read from the file, it will grab additional chunks.
Greg Snow
A: 

Just to show another way to do it (assuming your input is "temp/3.txt"):

> d <- read.csv("temp/3.txt", sep="\t", stringsAsFactors=FALSE, header=FALSE, nrows=1)
# Show the default column names:
> colnames(d)
[1] "V1" "V2" "V3"
# Assign the requested column names
> colnames(d) <- c("first", "second", "third")
# Show the current structure of d
> d
  first second third
1     1      2     3
# Probably not recommended: Add the columns of d to the search path
> attach(d)
> first
[1] 1
# Clean up:
> detach(d)

I guess the most important part above in terms of addressing your question is just

nrows=1

which tells it to parse one row of input. (Underneath read.csv eventually just calls down to scan.)
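
As a non-interactive variant of the same idea, the column names can be passed in directly through the col.names argument instead of being set afterwards. This is only a sketch: the temporary file and its contents are made up, and read.delim is used because it defaults to tab as the separator:

```r
# Hypothetical tab-separated input with two rows
path <- tempfile(fileext = ".txt")
writeLines(c("1\t2\t3", "4\t5\t6"), path)

# nrows = 1 limits parsing to the first row; col.names labels the fields
d <- read.delim(path, header = FALSE, nrows = 1,
                col.names = c("first", "second", "third"))

first <- d$first   # fields are reached through the data frame, no attach() needed
```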

David F