ansaurus

Question

Answer 1

A:

Here's one way using split and lapply:

> tmp <- lapply(split(d, d$date), function(x) if (all(c('P', 'C') %in% x$cp_flag)) x)
> do.call(rbind, tmp)
             myIdx strike_price       date     exdate cp_flag strike_price    return
1996-05-18.1 8355342       605000 1996-04-02 1996-05-18       P       605000  0.002340
1996-05-18.2 8355433       605000 1996-04-02 1996-05-18       C       605000  0.002340
1996-05-18.3 8356541       605000 1996-04-09 1996-05-18       P       605000 -0.003182
1996-05-18.4 8356629       605000 1996-04-09 1996-05-18       C       605000 -0.003182
1996-05-18.5 8358033       605000 1996-04-16 1996-05-18       P       605000  0.003907
1996-05-18.6 8358119       605000 1996-04-16 1996-05-18       C       605000  0.003907
1996-05-18.7 8359391       605000 1996-04-23 1996-05-18       P       605000  0.005695

Edit: Here's the full version implied by my last answer. I tend to think in base functions rather than plyr or reshape... but these answers seem good too.

Vince 2010-08-06 04:22:48

I must be taking crazy pills... `lapply` + `split` is better done with just `tapply`. But wch's solution seems *much* cleaner.

Vince 2010-08-06 07:04:16
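
To make the mechanics of the split/lapply approach in Answer 1 easier to follow, here is a minimal sketch on a made-up data frame. The values in `d` and the names `groups`, `kept` and `result` are illustrative assumptions, not the asker's data. It relies on two base R behaviours: an `if` with no `else` yields `NULL` when the condition is false, and `rbind` skips `NULL` arguments.

```r
# Toy data (hypothetical values, not the asker's data set)
d <- data.frame(
  date    = c("1996-04-02", "1996-04-02", "1996-04-09", "1996-04-09", "1996-04-23"),
  cp_flag = c("P", "C", "P", "C", "P"),
  stringsAsFactors = FALSE
)

# One data frame per date
groups <- split(d, d$date)

# Keep a group only if it contains both flags; otherwise the function
# returns NULL because the `if` has no `else` branch
kept <- lapply(groups, function(g) if (all(c("P", "C") %in% g$cp_flag)) g)

# rbind() ignores the NULL entries, so unpaired dates disappear here
result <- do.call(rbind, kept)
print(result)
```

The row names of `result` are built from the group name plus the original row name, which is why the combined output carries a date prefix.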

Answer 2

+1 A:

Using the plyr package:

> ddply(chData, "date", function(x) if(all(c("P","C") %in% x$cp_flag)) x)
    myIdx strike_price       date     exdate cp_flag strike_price.1    return
1 8355342       605000 1996-04-02 1996-05-18       P         605000  0.002340
2 8355433       605000 1996-04-02 1996-05-18       C         605000  0.002340
3 8356541       605000 1996-04-09 1996-05-18       P         605000 -0.003182
4 8356629       605000 1996-04-09 1996-05-18       C         605000 -0.003182
5 8358033       605000 1996-04-16 1996-05-18       P         605000  0.003907
6 8358119       605000 1996-04-16 1996-05-18       C         605000  0.003907

Joshua Ulrich 2010-08-06 04:24:17

This language keeps getting core cryptic and non-intuitive the more I read about it. What's a ddply plyr?

Karl 2010-08-06 04:27:00

@Karl, that's a package, not the "core" language.

Vince 2010-08-06 04:27:43

It just looks cryptic because of the function in there. `plyr` and its functions really *are* wonderful.

JoFrhwld 2010-08-06 05:01:12

Answer 3

+7 A:

Get the dates that have P's and those that have C's, and use intersect to find the dates that have both.

keep_dates <- with(x, intersect(date[cp_flag=='P'], date[cp_flag=='C']) )
# "1996-04-02" "1996-04-09" "1996-04-16"

Keep only the rows that have dates appearing in keep_dates.

x[ x$date %in% keep_dates, ]
#   myIdx strike_price       date     exdate cp_flag strike_price.1
# 8355342       605000 1996-04-02 1996-05-18       P         605000
# 8355433       605000 1996-04-02 1996-05-18       C         605000
# 8356541       605000 1996-04-09 1996-05-18       P         605000
# 8356629       605000 1996-04-09 1996-05-18       C         605000
# 8358033       605000 1996-04-16 1996-05-18       P         605000
# 8358119       605000 1996-04-16 1996-05-18       C         605000

wch 2010-08-06 04:37:13

Elegant! I like this one a lot.

Vince 2010-08-06 05:48:24
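
For anyone who wants to try the intersect approach from Answer 3 without the original data, the following is a small self-contained sketch. The data frame `x` is filled with made-up rows, and the helper names `has_put`, `has_call` and `paired` are assumptions introduced only for this illustration.

```r
# Toy data (hypothetical values, not the asker's data set)
x <- data.frame(
  date    = c("1996-04-02", "1996-04-02", "1996-04-09", "1996-04-09", "1996-04-23"),
  cp_flag = c("P", "C", "P", "C", "P"),
  stringsAsFactors = FALSE
)

# Dates that have at least one put, and dates that have at least one call
has_put  <- x$date[x$cp_flag == "P"]
has_call <- x$date[x$cp_flag == "C"]

# Dates that appear in both sets
keep_dates <- intersect(has_put, has_call)

# Keep only the rows whose date is in keep_dates
paired <- x[x$date %in% keep_dates, ]
print(paired)
```

Because everything is done with vectorised subsetting, no per-group function is needed and the original row order and row names are preserved.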

Answer 4

A:

Here's a reshape approach.

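library(reshape)
# Add a dummy value for cast() to spread across the cp_flag columns
df$value <- 1
# Leave myIdx out of the formula: it is unique per row, so keeping it would
# put every P and C on its own row and no row would ever be complete
check <- cast(df, strike_price + date + exdate + return ~ cp_flag)

# take stock of what just happened
summary(check)

# use only complete cases. If you have NAs elsewhere, this will knock out those obs too
check <- check[complete.cases(check), ]

# back to long form (myIdx is not carried through the cast)
df.clean <- melt(check, id = 1:4)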

JoFrhwld 2010-08-06 04:59:33


R checking pairs of rows in a dataframe
