ansaurus

Question

Answer 1

A:

When you subset your data, the factor levels that existed in the original data set persist. Take the diamonds data set (from ggplot2) for example. Its cut column has 5 different levels.

unique(diamonds$cut) ## Ideal, Premium, Good, Very Good, Fair

If we subset diamonds, we get:

str(subset(diamonds, cut == "Ideal")) ## Look at structure

In the str() output, we see that cut keeps all the levels it had originally.

$ cut    : Factor w/ 5 levels "Fair","Good",..: 5 5 5 5 5 5 5 5 5 5 ...

Even though we've removed all the other categories of cut, the unused levels persist.

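You can remove the unused levels by re-creating the factor from the subsetted column, so that it keeps only the levels that are still present (here x is the subsetted data frame).

x$cut <- factor(x$cut) ## passing labels=unique(x$cut) can mislabel values when several levels remain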
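To make the behaviour concrete, here is a minimal base R sketch. The toy data frame `df` and its values are made up for illustration (they are not the diamonds data); `droplevels()` is a base R alternative, available in R 2.12.0 and later, that drops unused levels from every factor column at once.

```r
# Hypothetical toy data standing in for a larger data set
df <- data.frame(
  cut   = factor(c("Fair", "Good", "Ideal", "Ideal", "Good")),
  price = c(300, 400, 900, 950, 420)
)

# Subsetting keeps the full set of levels, even the ones no row uses
ideal <- subset(df, cut == "Ideal")
levels(ideal$cut)   # all three original levels are still listed
table(ideal$cut)    # unused levels appear with a count of 0

# Option 1: rebuild the factor from the values that remain
ideal$cut <- factor(ideal$cut)

# Option 2: drop unused levels from all factor columns of the subset
ideal2 <- droplevels(subset(df, cut == "Ideal"))

levels(ideal$cut)
levels(ideal2$cut)
```

Either option leaves only the levels that actually occur, which is what stops empty categories from showing up on a plot axis.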

Now looking more specifically at your example:

test <- ddply(big_sales_df, .(ITEM), "nrow") ## ddply() comes from the plyr package
test$ITEM <- factor(test$ITEM) ## keep only the levels still present

Now, try your plot again.

Brandon Bertelsen 2010-10-21 21:36:08

Many thanks Brandon, that worked just fine!

Atish 2010-10-21 22:23:23

I posted another answer that does the same thing, but in situ in the plot. In qplot(), replace ITEM with factor(ITEM).

Brandon Bertelsen 2010-10-21 22:25:49

Answer 2

+1 A:

You need to remove the factor levels that were dropped from your subset.

big_sales_df$ITEM <- factor(big_sales_df$ITEM)
big_sales_df$CUST <- factor(big_sales_df$CUST)

Or change how you read in the data, so that text columns stay character vectors and have no levels to begin with:

sales_df <- read.csv("ItemsSold.csv", header=TRUE, stringsAsFactors=FALSE)
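As a rough illustration of the difference between the two approaches, the sketch below reads the same small CSV twice. The column names ITEM and CUST come from the question, but the inline CSV text and its values are invented for the example, and `read.csv(text = ...)` is used only so the snippet runs without a file.

```r
# Hypothetical inline data standing in for ItemsSold.csv
csv_text <- "ITEM,CUST\nA,x\nB,y\nA,z\n"

# Same data read as character columns and as factor columns
sales_chr <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)
sales_fac <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)

sub_chr <- subset(sales_chr, ITEM == "A")
sub_fac <- subset(sales_fac, ITEM == "A")

class(sub_chr$ITEM)    # character: there are no levels to carry over
unique(sub_chr$ITEM)   # only the values present in the subset
levels(sub_fac$ITEM)   # the factor version still remembers "B"

# Re-creating the factor removes the leftover level
sub_fac$ITEM <- factor(sub_fac$ITEM)
levels(sub_fac$ITEM)
```

Reading strings as characters avoids the problem up front, while the factor() call fixes it after the fact. Which is preferable depends on whether the rest of the code relies on the columns being factors.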

Joshua Ulrich 2010-10-21 21:41:55

Thanks Joshua, your approach worked too!

Atish 2010-10-21 22:26:21

Answer 3

+1 A:

Or you can cheat by factoring item:

qplot(nrow, factor(ITEM), data = ddply(big_sales_df, .(ITEM), "nrow")

Brandon Bertelsen 2010-10-21 22:24:37

Thanks again, Brandon.

Atish 2010-10-21 23:48:15

Oops, I meant to add that there was a slight typo in your answer (a missing closing parenthesis). It should be: qplot(nrow, factor(ITEM), data = ddply(big_sales_df, .(ITEM), "nrow"))

Atish 2010-10-21 23:48:40

And I like this solution better as it gives me more control, in case I want to plot by several dimensions.

Atish 2010-10-21 23:49:36

@Atish: if you like this solution, you should make it the accepted answer.

Joshua Ulrich 2010-10-22 15:05:44

Unwanted items on a plot