I think I'm not asking the right question to begin with.
New Question: I have a 1.5 GB TSV file. It has 6 lines of junk at the top and one line of junk at the bottom, all of which I want to remove without having to open the file. Line 7 contains the headers. There are 13 columns. The number of rows is unknown.
How do I read the file into a data frame so that I can do basic descriptive stats, boxplots, etc.?
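For illustration, here is a minimal sketch of the kind of read being asked about. It assumes the decimals use a period and that the junk footer is a single short line. The temporary file, its three column names, and its values are made up as a small stand-in for the real 13-column file, so treat it as a sketch under those assumptions rather than a tested solution for a 1.5 GB input.

```r
# Build a small stand-in file: 6 junk lines, a header row, 2 data rows, 1 junk footer.
# (Hypothetical contents; the real file has 13 columns.)
path <- tempfile(fileext = ".tsv")
writeLines(c(
  paste("junk", 1:6),
  "Keyword\tClicks\tAvg.Position",
  "chagall pro\t1\t3",
  "matisse\t1\t3.1",
  "junk footer"
), path)

# skip = 6 drops the leading junk so line 7 becomes the header.
# read.delim() expects "." as the decimal mark; read.delim2() expects ",".
# fill = TRUE (the default) pads the short footer line with blanks.
raw <- read.delim(path, skip = 6, header = TRUE, stringsAsFactors = FALSE)

# Drop the last row, which holds the junk footer.
clean <- raw[-nrow(raw), ]

str(clean)
summary(clean$Avg.Position)
```

Because the footer only contributes blank fields to the numeric columns, those columns should still be read as numeric (with an NA in the footer row) before the last row is dropped.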
Original Question:
Hi
I have a feeling this one is really easy. I'm just missing something.
I have a tab-separated txt file with 6 lines of junk at the top and one junk line at the very bottom. In between the junk there is a header row of the form Label1 Label2 Label3 Label4....Label13, followed by data rows of the form text ID number percent....number
Here is what I enter in R:
datadump <- read.delim2("truncate.txt", header=TRUE, skip="6")
cleandata <- datadump[c(-dim(datadump)[1]),]
avgposition <- cleandata$Avg.Position
hist(avgposition)
Avg.Position is Label13, the last column, and holds numbers of the form #.#
Yet I get an error: Error in hist.default(avgposition) : 'x' must be numeric
Why is it not seeing the data as numeric?
Thanks!
As requested, here is some data:
> dput(cleandata)
structure(list(Account = structure(c(2L, 2L), .Label = c("Crap1",
"XXS"), class = "factor"), Campaign = structure(c(1L, 1L), .Label = c("3098012",
"Crap2"), class = "factor"), Customer.Id = structure(c(2L, 2L
), .Label = c("", "nontech broad (7)"), class = "factor"), Ad.Group = structure(c(2L,
2L), .Label = c("", "RR 236 (300)"), class = "factor"), Keyword = structure(2:3, .Label = c("",
"chagall pro", "matisse"), class = "factor"), Keyword.Matching = structure(c(2L,
2L), .Label = c("", "Broad"), class = "factor"), Impressions = c(4L,
16L), Clicks = c(1L, 1L), CTR = structure(2:3, .Label = c("",
"25.00%", "6.25%"), class = "factor"), Avg.CPC = structure(2:3, .Label = c("",
"$0.05 ", "$0.11 "), class = "factor"), Avg.CPM = structure(2:3, .Label = c("",
"$12.50 ", "$6.88 "), class = "factor"), Cost = structure(2:3, .Label = c("",
"$0.05 ", "$0.11 "), class = "factor"), Avg.Position = structure(2:3, .Label = c("",
"3", "3.1"), class = "factor")), .Names = c("Account", "Campaign",
"Customer.Id", "Ad.Group", "Keyword", "Keyword.Matching", "Impressions",
"Clicks", "CTR", "Avg.CPC", "Avg.CPM", "Cost", "Avg.Position"
), row.names = 1:2, class = "data.frame")
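The dput output shows Avg.Position stored as a factor with levels "", "3" and "3.1", which is why hist() complains that 'x' must be numeric. One likely contributor is that read.delim2() uses "," as its default decimal mark, so "3.1" is not parsed as a number. The sketch below rebuilds only that one column from the output above and shows a common way to convert it; it illustrates the symptom and is not necessarily the best fix for the full file.

```r
# Rebuild just the Avg.Position column as it appears in the dput output.
avgposition <- factor(c("3", "3.1"), levels = c("", "3", "3.1"))

is.numeric(avgposition)   # FALSE, so hist() refuses it

# Go through character first: as.numeric() applied directly to a factor
# returns the internal level codes (here 2 and 3), not the values.
avgposition_num <- as.numeric(as.character(avgposition))

# plot = FALSE only computes the bins; drop it to draw the histogram.
h <- hist(avgposition_num, plot = FALSE)
summary(avgposition_num)
```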