I'm having trouble with what I think is a basic R task.

Here's my sample dataframe named 'b'

Winner Color Size
Tom Yellow Med
Jerry Yellow Lar
Jane Blue Med

where items in the Winner column are factors.

I'm trying to change "Tom" in the dataframe to "Tom LLC" and I can't get it done.

Here's what I tried:

Simple way: b$Winner[b$Winner=='Tom'] = as.factor('Tom LLC')

but that failed with "invalid factor level, NAs generated"

Next I tried a more advanced route:

name_reset = function (x, y, z) {
if (x$Winner == y) {x$Winner = z}
}

b = adply(b,1,name_reset,'Tom','Tom LLC')

but that failed with "Error in list_to_dataframe(res, attr(.data, "split_labels")) : Results are not equal lengths"

I feel I'm missing something basic. Can someone redirect me or offer suggestions on the code I wrote above? Thank you very much

+6  A: 

What you want to do is change the values via levels(). levels() gives you access to the labels of a factor: calling it on a factor returns the labels, and assigning to it overwrites the labels for that factor.

Once you start working with the levels function you can change the values however you want. I think gsub is probably the easiest.

Try this:

levels(b$Winner) <- gsub("Tom", "Tom LLC", levels(b$Winner))
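A quick way to check this on the sample data. This is only a minimal sketch: the data frame is rebuilt by hand with Winner as a factor, and the anchored pattern "^Tom$" is an assumption added here (it is not part of the one-liner above) so that a longer name such as "Tommy" would be left alone.

```r
# Rebuild the sample data frame with Winner stored as a factor
b <- data.frame(
  Winner = factor(c("Tom", "Jerry", "Jane")),
  Color  = c("Yellow", "Yellow", "Blue"),
  Size   = c("Med", "Lar", "Med")
)

# Inspect the current labels before changing anything
levels(b$Winner)

# Relabel only the level that is exactly "Tom" (anchored pattern is an assumption)
levels(b$Winner) <- gsub("^Tom$", "Tom LLC", levels(b$Winner))

# Every row that held "Tom" now holds "Tom LLC"; other rows are unchanged
as.character(b$Winner)
```

Because only the labels are rewritten, the underlying integer codes of the factor stay the same, so no NAs can be introduced.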

-mcpeterson

mcpeterson
Thank you very much! I greatly appreciate it.
rhh
+3  A: 

I made your data frame and then used dput() to make it into a format that will let people easily copy/paste it from the web:

b <- structure(list(Winner = c("Tom", "Jerry", "Jane"), Color = c("Yellow", 
"Yellow", "Blue"), Size = c("Med", "Lar", "Med")), .Names = c("Winner", 
"Color", "Size"), row.names = c(NA, -3L), class = "data.frame")

I'm not sure what exactly the as.factor() in your code is meant to do. as.factor converts vectors of values into factors - it doesn't really do anything meaningful for a single value. If b$Winner is a character vector, this works:

b$Winner[b$Winner %in% "Tom"] <- "Tom LLC"

If b$Winner is a factor, then "Tom LLC" has to be one of the levels in order for you to insert it into the factor. If b$Winner is a factor, I'd probably then do this:

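levels(b$Winner)[levels(b$Winner) == "Tom"] <- "Tom LLC"

which is just telling R that the level "Tom" should be relabeled as "Tom LLC". Select the level by name rather than assigning a whole new vector of levels: factor levels are sorted alphabetically by default (here "Jane", "Jerry", "Tom"), so a positional replacement such as c("Tom LLC", "Jerry", "Jane") would swap the labels around. Some of the advanced R users here suggest setting your stringsAsFactors option to FALSE... and the more I use R, the more I agree. It's a lot easier to manipulate plain string vectors and then convert them to a factor as needed.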
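To make the factor and character routes concrete, here is a rough sketch on the Winner column alone. The names b_fac and b_chr are made up for illustration. Since R 4.0.0, data.frame() and read.csv() already default to stringsAsFactors = FALSE; the argument is spelled out below only to make the intent explicit.

```r
# Route 1: Winner is a factor. The new value must exist as a level
# before it can be assigned, otherwise R inserts NA with a warning.
b_fac <- data.frame(Winner = c("Tom", "Jerry", "Jane"),
                    stringsAsFactors = TRUE)
levels(b_fac$Winner) <- c(levels(b_fac$Winner), "Tom LLC")  # add the level
b_fac$Winner[b_fac$Winner == "Tom"] <- "Tom LLC"            # now a valid assignment
b_fac$Winner <- droplevels(b_fac$Winner)                    # drop the unused "Tom" level

# Route 2: keep strings as plain character vectors and convert at the end.
b_chr <- data.frame(Winner = c("Tom", "Jerry", "Jane"),
                    stringsAsFactors = FALSE)
b_chr$Winner[b_chr$Winner == "Tom"] <- "Tom LLC"            # ordinary string replacement
b_chr$Winner <- factor(b_chr$Winner)                        # only when a factor is needed
```

Both routes end with the same values and the same set of levels; the second simply avoids the level bookkeeping until the last step.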

Matt Parker
Seconded on setting stringsAsFactors.
mcpeterson
Thank you for the help. I marked mcpeterson's as the answer since it was directly what I needed, though I learned a bunch from reading your explanation. The suggestion of "stringsAsFactors = FALSE" is going to save me a ton of time. Thanks again for the help.
rhh