I am loading a table whose first column is a URL, reading it into R with read.table(). R seems to drop about 1/3 of the rows without returning any errors. The URLs do not contain any # characters or tabs (my field separator), which I understand could be an issue. If I convert the URLs to integer IDs first, the problem goes away. Is there something about this field that might be causing R to drop the rows?

A:

Without a sample of the data it's hard to say, but one small gotcha is that "#" is the default comment.char in read.table(), so everything after a "#" on a line is ignored. Try setting comment.char = "" and see if that fixes it.
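A minimal sketch of the comment.char effect, using a hypothetical two-line, tab-separated sample (the URLs and counts are made up for illustration). count.fields() reports how many fields R sees on each line, which makes it easy to spot a line that was cut short at "#".

```r
# Hypothetical sample: the first URL contains a "#" fragment
txt <- "http://example.com/page#top\t1\nhttp://example.com/other\t2"

# Default comment.char = "#": line 1 is cut at "#", leaving one field
count.fields(textConnection(txt), sep = "\t")

# With comment handling turned off, every line keeps both fields
count.fields(textConnection(txt), sep = "\t", comment.char = "")

df <- read.table(text = txt, sep = "\t", comment.char = "",
                 stringsAsFactors = FALSE)
df$V1[1]  # the "#top" fragment is kept as part of the URL
```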

kari
A:

Thanks for all your help,

Yes, initially there were some hashes, and I was able to handle them with comment.char = ''. The problem turned out to be that some of my URLs contained ' and " characters. The strangest part is that read.table() didn't return any errors. After I removed these characters using tr, I had no issues loading the data.
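A minimal sketch of why this can fail silently, assuming a hypothetical tab-separated sample. By default read.table() treats both ' and " as quote characters (quote = "\"'"), and a quoted field may run across tabs and line breaks until the next matching quote. Several physical lines can therefore collapse into one record. Setting quote = "" turns quoting off, which should let the characters stay in the data without preprocessing the file with tr.

```r
# Hypothetical sample: four URLs, two of which contain an apostrophe
lines <- c("http://example.com/it's\t1",
           "http://example.com/b\t2",
           "http://example.com/c'd\t3",
           "http://example.com/e\t4")
txt <- paste(lines, collapse = "\n")

# Default quoting: the apostrophe on line 1 opens a quoted field that only
# closes at the apostrophe on line 3, so lines 1-3 merge into one record
naive <- read.table(text = txt, sep = "\t", stringsAsFactors = FALSE)
nrow(naive)  # fewer rows than input lines

# Quote and comment handling disabled: one record per input line
fixed <- read.table(text = txt, sep = "\t", quote = "", comment.char = "",
                    stringsAsFactors = FALSE)
nrow(fixed)
fixed$V1[1]  # the apostrophe is preserved
```

Comparing nrow() of the result against the number of lines in the file (for example via length(readLines(...))) is a quick way to catch this kind of silent merging.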