The software I am using produces log files with a variable number of lines of summary information followed by a large amount of tab-delimited data. I am trying to write a function that will read the data from these log files into a data frame, ignoring the summary information. The summary information never contains a tab, so the following function works:

read.parameters <- function(file.name, ...){
  # First pass: find the first line containing a tab
  lines <- scan(file.name, what="character", sep="\n")
  first.line <- min(grep("\\t", lines))
  # Second pass: re-read the file, skipping the summary lines
  return(read.delim(file.name, skip=first.line-1, ...))
}

However, these log files are quite big, so reading each file twice is very slow. Surely there is a better way?

Edited to add:

Marek suggested using a textConnection object. The way he suggested in the answer fails on a big file, but the following works:

read.parameters <- function(file.name, ...){
  conn <- file(file.name, "r")
  on.exit(close(conn))
  # Read lines one at a time until the first tab-containing line,
  # then push it back so read.delim sees it as the header.
  repeat {
    line <- readLines(conn, 1)
    if (length(grep("\\t", line))) {
      pushBack(line, conn)
      break
    }
  }
  df <- read.delim(conn, ...)
  return(df)
}
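
To see the final version in action, here is a self-contained sanity check. The fake log contents and the `tempfile()` are illustration only, mimicking the real files' shape (a few summary lines, then tab-delimited data):

```r
read.parameters <- function(file.name, ...){
  conn <- file(file.name, "r")
  on.exit(close(conn))
  # Skip lines one at a time until the first tab-containing line,
  # then push it back so read.delim sees it as the header.
  repeat {
    line <- readLines(conn, 1)
    if (length(grep("\\t", line))) {
      pushBack(line, conn)
      break
    }
  }
  read.delim(conn, ...)
}

# Fake log: three summary lines, then a tab-delimited header and two rows.
tmp <- tempfile(fileext = ".log")
cat(c("summary line one", "summary line two", "summary line three",
      "a\tb\tc", "1\t2\t3", "4\t5\t6"),
    file = tmp, sep = "\n")

df <- read.parameters(tmp)
dim(df)   # 2 rows, 3 columns
unlink(tmp)
```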

Edited again: Thanks Marek for further improvement to the above function.

A: 

You don't need to read the file twice. Use `textConnection` on the result of the first read.

read.parameters <- function(file.name, ...){
  lines <- scan(file.name, what="character", sep="\n") # the original had "tmp.log" hard-coded here; I suppose file.name is meant
  first.line <- min(grep("\\t", lines))
  return(read.delim(textConnection(lines), skip=first.line-1, ...))
}
Marek
I've fixed the typo. Thanks for suggesting `textConnection`, although the function as you've given it doesn't work for me. I think I need to make the textConnection first, then run scan on it, then use pushBack to rewind the connection.
Michael Dunn
Strange. I tested it and it works on my fake data. Did you get an error message or empty results?
Marek
Example: `cat(c("ds","sdds","sddfsd","a\tb\tc","1\t2\t3","1\t2\t3"),file="test.txt", sep="\n")` then `read.parameters("test.txt")` return `data.frame` with 3 cols and 2 rows.
Marek
Interesting. I can confirm it works with your fake data, but with my big data files R stops responding and I have to force-quit. Still, inspired by your suggestion I've produced a working version (added to the question in order to preserve the formatting).
Michael Dunn
A: 

If you can be sure that the header info won't be more than N lines, e.g. N = 200, then try:

scan(..., nlines = N)

That way you won't re-read more than N lines.
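
A sketch of this capped-header idea; the function name, the default `N = 200`, and the demo file are illustrative, not part of the original answer:

```r
read.parameters.capped <- function(file.name, N = 200, ...) {
  # Only the first N lines are scanned for the start of the data block,
  # so at most N lines are ever read twice.
  head.lines <- scan(file.name, what = "character", sep = "\n",
                     nlines = N, quiet = TRUE)
  first.line <- min(grep("\\t", head.lines))
  read.delim(file.name, skip = first.line - 1, ...)
}

# Demo: one summary line, then a tab-delimited header and one data row.
tmp <- tempfile()
cat(c("summary", "a\tb", "1\t2"), file = tmp, sep = "\n")
df2 <- read.parameters.capped(tmp)
unlink(tmp)
```

This fails with an error if no tab appears within the first N lines, which is exactly the case the questioner says he cannot rule out.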

G. Grothendieck
That's a decent approach, but I can't really guarantee anything about the header size. I'm quite pleased with my function using a file connection.
Michael Dunn