I'm using the following script to read .txt files into R. For some reason, even though the header has only 21 elements, R claims that it has 22. This makes read.table fail, because the remaining lines of the file have only 21 items.

When I use the scan function, I notice that my header actually starts at element 2 and element 1 is empty, i.e. something like " ".

It seems that something is being read into that first element when it shouldn't be. I don't understand why, because the first line is a string with no spaces before it.

library(foreign)

setwd("/Library/A_Intel/")

filelist <- list.files()

# assuming semicolon-separated values with a header
datalist <- lapply(filelist, function(xx) read.table(xx, header = TRUE, sep = ";"))

# assuming the same header/columns for all files
datafr <- do.call("rbind", datalist)

Thanks!

EDIT 1

TIME ;POWER SOURCE ;qty MONITORS ;NUM PROCESSORS ;freq of CPU Mhz ;SCREEN SIZE ;CPU LOAD ;BATTERY LEVEL ; KEYBOARD MVT ; MOUSE MVT ;BATTERY MWH ;HARD DISK SPACE ;NUMBER PROCESSES ;RAM   ;FOCUS APP ;
2010-08-09-14:57:29.423 ; AC ; 1 ; 2 ; 1600 ; 1280 : 800  ; 0.434570 ; 100 ; NO ; NO ; 38119596 ; 66.388687  ;  65    ; 1446.54296875   ; Xcode-#6294  ; 
2010-08-09-14:57:30.422 ; AC ; 1 ; 2 ; 1600 ; 1280 : 800  ; 0.399414 ; 100 ; NO ; NO ; 38119596 ; 66.388687  ;  65    ; 1446.55859375   ; Xcode-#6294  ; 
2010-08-09-14:57:31.421 ; AC ; 1 ; 2 ; 1600 ; 1280 : 800  ; 0.399414 ; 100 ; NO ; YES ; 38119596 ; 66.388687  ;  65    ; 1446.9375   ; Xcode-#6294  ; 
2010-08-09-14:57:32.421 ; AC ; 1 ; 2 ; 1600 ; 1280 : 800  ; 0.399414 ; 100 ; NO ; YES ; 38119596 ; 66.388687  ;  65    ; 1446.875   ; Xcode-#6294  ; 
2010-08-09-14:57:33.421 ; AC ; 1 ; 2 ; 1600 ; 1280 : 800  ; 0.399414 ; 100 ; NO ; YES ; 38119596 ; 66.388695  ;  65    ; 1445.7890625   ; Xcode-#6294  ; 
2010-08-09-14:57:34.421 ; AC ; 1 ; 2 ; 1600 ; 1280 : 800  ; 0.399414 ; 100 ; NO ; YES ; 38119596 ; 66.388695  ;  65    ; 1444.84765625   ; Xcode-#6294  ; 

I have been playing around with a single file and I still get the same problem. I did notice that, supposedly, when you set header=TRUE the header needs to have one column fewer than the data. I thought this would help, in that the first column would be treated as the index, but instead the data frame puts it under my first category, i.e. the last value in each line doesn't have a column name.

Thanks again!

EDIT 2

And here is a typical error:

 Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
   line 1 did not have 16 elements
A: 

Open your data file in a text editor.

Check that you really have 21 elements in the header line.

Have you got accidental leading (or trailing) spaces (or a separator character)?

Have you got the correct separator (sep =) in read.table?

If you have field names that need escaping, has that been done? For example, in a comma-separated file, a field name of one, two would need to be written as "one, two".
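The same checks can also be run from R instead of a text editor. This is only a rough sketch: it assumes a semicolon-separated file, and the two sample lines and the temporary file are made up for illustration rather than taken from your data.

```r
# Hypothetical sample file standing in for one of the real logs.
tmp <- tempfile(fileext = ".txt")
writeLines(c("TIME ;POWER SOURCE ;FOCUS APP ;",
             "2010-08-09-14:57:29.423 ; AC ; Xcode ; "), tmp)

# Look at the raw header exactly as it is stored on disk.
header <- readLines(tmp, n = 1)

# Split on the separator. strsplit() drops a trailing empty piece,
# so this counts the names, not the separators.
fields <- strsplit(header, ";", fixed = TRUE)[[1]]
n_names <- length(fields)

# Leading whitespace or a stray separator at either end of the line?
starts_badly <- grepl("^[[:space:];]", header)
ends_with_sep <- grepl(";[[:space:]]*$", header)

print(n_names)
print(c(starts_badly = starts_badly, ends_with_sep = ends_with_sep))
```

If ends_with_sep is TRUE, the line finishes with a separator, and read.table will count one extra, empty field after it.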

Richie Cotton
+4  A: 

The problem isn't your header; it is the # in the last field, which is being treated as a comment character. If you set comment.char = "" or something equivalent, it will work. I copied your data to a file called testdata.txt and ran:

read.table("testdata.txt", sep = ";", header = TRUE, comment.char = "")

This worked.
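Applied to the whole folder, as in the loop from the question, the same argument can be passed through lapply. The following is only a sketch: read_logs is a hypothetical helper name, and the "\\.txt$" pattern assumes the folder might contain files other than the logs.

```r
# Hypothetical helper: read every .txt file in `dir` and stack the rows.
read_logs <- function(dir) {
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  # Extra arguments to lapply() are forwarded to read.table().
  tables <- lapply(files, read.table,
                   header = TRUE, sep = ";",
                   comment.char = "")  # keep "#" as ordinary data
  # Assumes every file has the same header/columns.
  do.call(rbind, tables)
}

# Usage with the folder from the question (not run here):
# datafr <- read_logs("/Library/A_Intel/")
```

Using full.names = TRUE avoids the need for setwd(), since each file is opened by its full path.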

The error "line 1 did not have 16 elements" means that the header had 16 elements (it was read all the way to the final semicolon, which adds an empty 16th field) while the first data line only had 15 (everything from the # onwards, including the terminal semicolon, is being commented out).
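The mismatch can be checked directly with count.fields, which reports how many fields each line has for a given sep and comment.char. This is a sketch using the header and the first two data lines from the question, written to a temporary file (the temporary file is an assumption for illustration).

```r
# Header and first two data lines copied from the question.
sample_lines <- c(
  "TIME ;POWER SOURCE ;qty MONITORS ;NUM PROCESSORS ;freq of CPU Mhz ;SCREEN SIZE ;CPU LOAD ;BATTERY LEVEL ; KEYBOARD MVT ; MOUSE MVT ;BATTERY MWH ;HARD DISK SPACE ;NUMBER PROCESSES ;RAM   ;FOCUS APP ;",
  "2010-08-09-14:57:29.423 ; AC ; 1 ; 2 ; 1600 ; 1280 : 800  ; 0.434570 ; 100 ; NO ; NO ; 38119596 ; 66.388687  ;  65    ; 1446.54296875   ; Xcode-#6294  ; ",
  "2010-08-09-14:57:30.422 ; AC ; 1 ; 2 ; 1600 ; 1280 : 800  ; 0.399414 ; 100 ; NO ; NO ; 38119596 ; 66.388687  ;  65    ; 1446.55859375   ; Xcode-#6294  ; "
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# Fields per line with the default comment character "#".
with_comments <- count.fields(tmp, sep = ";")

# Fields per line with comment handling switched off.
without_comments <- count.fields(tmp, sep = ";", comment.char = "")

print(with_comments)
print(without_comments)

# With comments switched off, every line has the same number of fields.
df <- read.table(tmp, sep = ";", header = TRUE, comment.char = "")
```

Under the default settings the data lines come back one field short of the header (15 against 16); with comment.char = "" every line has 16 fields and read.table succeeds. The empty 16th column comes from the trailing semicolon on each line.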

Greg
Well done!!! I wish I could give multiple votes for those people who get to the heart of the problem so clearly and effectively. Thanks!
Eric Brotto