I have a dataframe with millions of rows and three columns labeled Keywords, Impressions, Clicks. I'd like to add a column with values depending on the evaluation of this function:

isType <- function(Impressions, Clicks)
{ 
if (Impressions >= 1 & Clicks >= 1){return("HasClicks")} else if (Impressions >=1 & Clicks == 0){return("NoClicks")} else {return("ZeroImp")}
}

So far so good. I then tried this to create the column, but 1) it takes forever and 2) it marks all the rows as "HasClicks", even the ones where it shouldn't.

# Creates a dataframe
Type <- data.frame()
# Loops until last row and store it in data.frame
for (i in c(1:dim(Mydf)[1])) {Type <- rbind(Type,isType(Mydf$Impressions[i], Mydf$Clicks[i]))}
# Add the column to Mydf
Mydf <- transform(Mydf, Type = Type)

Input data:

Keywords,Impressions,Clicks
"Hello",0,0
"World",1,0
"R",34,23

Wanted output:

Keywords,Impressions,Clicks,Type
"Hello",0,0,"ZeroImp"
"World",1,0,"NoClicks"
"R",34,23,"HasClicks"

A: 

First, the if/else block in your function is not vectorized; given a condition with more than one element, it will return the warning:

Warning message:
In if (1:2 > 2:3) TRUE else FALSE :
the condition has length > 1 and only the first element will be used

which explains why all the rows are the same.
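
As a small illustration of the difference, here is a sketch using the three rows from the question (the variable names imp, clk, has_clicks and labels are made up for this example). The comparison produces one logical value per row, and ifelse() picks a result for each of them, whereas if () expects a single TRUE/FALSE:

```r
# Three rows from the question's input data (names imp/clk are illustrative).
imp <- c(0, 1, 34)
clk <- c(0, 0, 23)

# The comparison itself is vectorized and yields one logical per row.
has_clicks <- imp >= 1 & clk >= 1
print(has_clicks)

# ifelse() consumes the whole logical vector and chooses a value per element,
# whereas if () only accepts a single TRUE/FALSE condition.
labels <- ifelse(has_clicks, "HasClicks", "other")
print(labels)
```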

Second, you should allocate your data.frame and fill in the elements rather than repeatedly combining objects together. I imagine this is causing your long run-times.
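
A minimal sketch of what preallocating could look like, assuming the three-row input from the question and keeping the row-by-row function. The result vector is created once at full length and filled in place (the name n is introduced here only for illustration):

```r
# Same classification rule as in the question, for a single row.
isType <- function(Impressions, Clicks) {
  if (Impressions >= 1 && Clicks >= 1) {
    "HasClicks"
  } else if (Impressions >= 1 && Clicks == 0) {
    "NoClicks"
  } else {
    "ZeroImp"
  }
}

Mydf <- data.frame(
  Keywords = c("Hello", "World", "R"),
  Impressions = c(0, 1, 34),
  Clicks = c(0, 0, 23)
)

# Allocate the full result once, then fill it in place,
# instead of growing an object with rbind() on every iteration.
n <- nrow(Mydf)
Type <- character(n)
for (i in seq_len(n)) {
  Type[i] <- isType(Mydf$Impressions[i], Mydf$Clicks[i])
}
Mydf$Type <- Type
print(Mydf)
```

This still visits every row in an R-level loop, so on millions of rows a vectorized approach is likely to be faster.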

EDIT: My shared code. I'd love for someone to provide a more elegant solution.

Mydf <- data.frame(
  Keywords = sample(c("Hello","World","R"),20,TRUE),
  Impressions = sample(0:3,20,TRUE),
  Clicks = sample(0:3,20,TRUE) )

Mydf$Type <- "ZeroImp"
Mydf$Type <- ifelse(Mydf$Impressions >= 1 & Mydf$Clicks >= 1,
  "HasClicks", Mydf$Type)
Mydf$Type <- ifelse(Mydf$Impressions >= 1 & Mydf$Clicks == 0,
  "NoClicks", Mydf$Type)
Joshua Ulrich
Could you share some code?
datayoda
A: 

Building on Joshua's solution, I find it cleaner to generate Type in a single shot (note, however, that this presumes Clicks >= 0).

Mydf$Type = ifelse(Mydf$Impressions >= 1,
    ifelse(Mydf$Clicks >= 1, 'HasClicks', 'NoClicks'), 'ZeroImp')
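
To check the nested ifelse against the question's wanted output, and to show what the Clicks >= 0 assumption means in practice, here is a sketch on the three input rows plus one hypothetical row with a negative Clicks value (the "Odd" row is invented for illustration and is not in the question's data). The nested form labels that row "NoClicks", whereas the original isType would fall through to "ZeroImp":

```r
Mydf <- data.frame(
  Keywords = c("Hello", "World", "R", "Odd"),
  Impressions = c(0, 1, 34, 5),
  Clicks = c(0, 0, 23, -1)  # the -1 row is hypothetical, not from the question
)

# Outer test splits on Impressions; inner test splits on Clicks.
Mydf$Type <- ifelse(Mydf$Impressions >= 1,
    ifelse(Mydf$Clicks >= 1, "HasClicks", "NoClicks"), "ZeroImp")
print(Mydf)
```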
Charles
Thank you! I came to the same conclusion that ifelse needed to be used. I'm too new to R to understand why, but thanks!
datayoda
Gavin Simpson