I am trying to read in some data from a text file that looks like this:

2009-08-09 - 2009-08-15 0   2   0
2009-08-16 - 2009-08-22 0   1   0
2009-08-23 - 2009-08-29 0   1   0
2009-08-30 - 2009-09-05 0   1   0
2009-09-06 - 2009-09-12 0   1   0
2009-09-13 - 2009-09-19 0   1   0
2009-09-20 - 2009-09-26 0   1   0
2009-09-27 - 2009-10-03 0   1   0 

I have tried using this command:

test <- read.table('test', sep ="\t")

as well as lots of different variations on that theme. But all I ever get back is this:

   V1
1  ÿþ2
2     
3     
4     
5     
6     
7     
8     
9     
10    
11    
12    
13    
14    
15    
16  

whereas I want a four-column data frame.

Any ideas where I am going wrong?

+3  A: 

The file you are reading is probably using some encoding other than ASCII. ?read.table shows

 read.table(file, header = FALSE, sep = "", quote = "\"'",
            ... 
            fileEncoding = "", encoding = "unknown")

fileEncoding: character string: if non-empty declares the encoding used
          on a file (not a connection) so the character data can be
          re-encoded.  See 'file'. 

So try setting the fileEncoding parameter. If you don't know the encoding, try "utf-8" or "cp-1252". If neither works, pastebin a snippet of your actual file and we may be able to identify the encoding.
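
A minimal sketch of the fileEncoding suggestion, assuming the file turns out to be UTF-16 with a byte-order mark. The sample rows, the temporary file, and the names txt, utf16, tmp and dat are made up for illustration; only read.table and its fileEncoding argument come from the documentation quoted above.

```r
# Build a small UTF-16LE file with a byte-order mark (BOM) to mimic the problem.
# The sample rows and the temporary file are illustrative assumptions.
txt <- paste0("2009-08-09 - 2009-08-15\t0\t2\t0\n",
              "2009-08-16 - 2009-08-22\t0\t1\t0\n")

# For pure ASCII text, UTF-16LE is each byte followed by a zero byte.
utf16 <- as.vector(rbind(charToRaw(txt), as.raw(0)))

tmp <- tempfile()
writeBin(c(as.raw(c(0xff, 0xfe)), utf16), tmp)

# Declare the file's encoding so read.table re-encodes it before parsing.
dat <- read.table(tmp, sep = "\t", fileEncoding = "UTF-16")
str(dat)
```

Without fileEncoding, the same file is read byte by byte in the native encoding, which is one plausible way to end up with a single garbled column.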

unutbu
Yes, I had thought that might be the problem and had tried utf-8 and cp-1252, which hadn't worked. But I investigated more and it was "utf-16". Now it works. Thanks!
Tom Liptrot
+1  A: 

Your separator could be spaces rather than tabs. If you leave the sep argument as "", it will use any kind of white space.

EDIT: Actually, the encoding does sound more likely as the source of the problem.

Read in the file with readLines, then check the encoding with Encoding.
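
A related check, sketched here as an alternative rather than as the readLines/Encoding approach itself: look at the first raw bytes of the file with readBin. The bytes FF FE are the UTF-16 little-endian byte-order mark, and when they are read as Latin-1 they display as "ÿþ", which matches the first cell in the question's output. The file below is a made-up stand-in, and bom_file, first_bytes and is_utf16le are illustrative names.

```r
# Create a stand-in file: BOM (FF FE) followed by the character "2" in UTF-16LE.
bom_file <- tempfile()
writeBin(as.raw(c(0xff, 0xfe, 0x32, 0x00)), bom_file)

# Read the first two bytes without any decoding.
first_bytes <- readBin(bom_file, what = "raw", n = 2)
print(first_bytes)

# TRUE suggests the file is UTF-16 little-endian.
is_utf16le <- identical(first_bytes, as.raw(c(0xff, 0xfe)))
print(is_utf16le)
```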

Richie Cotton