I have two small data frames, this_tx and last_tx. As far as I can tell, they are completely identical: this_tx == last_tx returns a frame of the same dimensions, all TRUE, and this_tx %in% last_tx returns two TRUEs. Inspected visually, they are clearly identical. But when I call

identical(this_tx, last_tx)

I get a FALSE. Hilariously, even

identical(str(this_tx), str(last_tx))

will return a TRUE. If I set this_tx <- last_tx, I'll get a TRUE.

What is going on? I don't have the deepest understanding of R's internal mechanics, but I can't find a single difference between the two data frames. If it's relevant, the two variables in the frames are both factors - same levels, same numeric coding for the levels, both just subsets of the same original data frame. Converting them to character vectors doesn't help.

Background (because I wouldn't mind help on this, either): I have records of drug treatments given to patients. Each treatment record essentially specifies a person and a date. A second table has a record for each drug and dose given during a particular treatment (usually, a few drugs are given each treatment). I'm trying to identify contiguous periods during which the person was taking the same combinations of drugs at the same doses.

The best plan I've come up with is to check the treatments chronologically. If the combination of drugs and doses for treatment[i] is identical to the combination at treatment[i-1], then treatment[i] is a part of the same phase as treatment[i-1]. Of course, if I can't compare drug/dose combinations, that's right out.
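A rough sketch of that plan, for illustration only. It swaps the data-frame comparison for a string key per treatment (sorted drug/dose pairs pasted together), so row names and other attributes never enter the comparison. The tables and every column name here (tx_id, person, date, drug, dose) are made up, not the real data.

```r
# Hypothetical tables; all names and values are assumptions for illustration
treatments <- data.frame(
  tx_id  = 1:4,
  person = c("p1", "p1", "p1", "p1"),
  date   = as.Date(c("2010-01-01", "2010-01-08", "2010-01-15", "2010-01-22"))
)
doses <- data.frame(
  tx_id = c(1, 1, 2, 2, 3, 4, 4),
  drug  = c("A", "B", "B", "A", "A", "A", "B"),
  dose  = c(10, 20, 20, 10, 10, 10, 20)
)

# One canonical key per treatment: sorted "drug:dose" pairs joined with "|"
combo_key <- function(d) {
  paste(sort(paste(d$drug, d$dose, sep = ":")), collapse = "|")
}
keys <- sapply(split(doses, doses$tx_id), combo_key)

# Walk the treatments chronologically within each person
treatments <- treatments[order(treatments$person, treatments$date), ]
treatments$key <- unname(keys[as.character(treatments$tx_id)])

# A new phase starts on the first row and whenever the person or key changes
n <- nrow(treatments)
changed <- c(TRUE,
             treatments$key[-1] != treatments$key[-n] |
               treatments$person[-1] != treatments$person[-n])
treatments$phase <- cumsum(changed)

print(treatments[, c("tx_id", "key", "phase")])
```

With this toy data the phases come out as 1, 1, 2, 3. The phase counter runs across all people rather than restarting per person, which is enough to mark contiguous runs.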

+5  A: 

Well, the jaded cry of "moar specifics plz!" may win in this case:

Check the output of dput() and post if possible. str() just summarizes the contents of an object whilst dput() dumps out all the gory details in a form that may be copied and pasted into another R interpreter to regenerate the object.
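A small sketch of the difference, using made-up frames in place of the ones in the question. It also shows why comparing str() results is uninformative: str() prints its summary as a side effect and returns NULL, so identical(str(a), str(b)) compares NULL with NULL.

```r
# Hypothetical stand-ins: two row subsets of one parent frame
full <- data.frame(drug = factor(c("A", "B", "A", "B")))
this_tx <- full[1:2, , drop = FALSE]
last_tx <- full[3:4, , drop = FALSE]

res <- str(this_tx)  # prints a summary as a side effect
is.null(res)         # TRUE: str() itself returns NULL

# dput() writes out the full structure, attributes included,
# so differences such as row names become visible
dput(this_tx)
dput(last_tx)
```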

Sharpie
+5  A: 

Generally, in this situation it's useful to try all.equal, which will give you some information about why two objects are not equal.
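For example, with two made-up frames that hold the same values but were cut from different rows of a parent (an assumption about how the frames in the question were built):

```r
# Hypothetical frames: same values, different source rows
full <- data.frame(drug = factor(c("A", "B", "A", "B")),
                   dose = c(10, 20, 10, 20))
this_tx <- full[1:2, ]
last_tx <- full[3:4, ]

identical(this_tx, last_tx)         # FALSE

# all.equal() returns TRUE or a character vector describing the differences
msg <- all.equal(this_tx, last_tx)
print(msg)
isTRUE(msg)                         # FALSE here, since msg is a description

# Comparing values only, ignoring attributes such as row names
all.equal(this_tx, last_tx, check.attributes = FALSE)
```

The exact wording of the message varies between R versions, so it is printed here rather than quoted.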

hadley
I did try this and got a rather cryptic `"Attributes: < Component 2: Mean relative difference: 0.01158301 >"`, which is what led me to str() the data frames - but "Attributes: Component 2" didn't really lead me to the row numbers.
Matt Parker
Well, different attributes would suggest looking at `attributes`...
hadley
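A sketch of that check on made-up frames, assuming (as the comment about row numbers suggests) that the mismatch is in the row names inherited from the parent frame. Resetting the row names is one possible fix, not necessarily the one used by the asker.

```r
# Hypothetical frames: two row subsets of one parent with matching values
full <- data.frame(drug = factor(c("A", "B", "A", "B")),
                   dose = factor(c("10", "20", "10", "20")))
this_tx <- full[1:2, ]
last_tx <- full[3:4, ]

# Compare the attributes one name at a time to find the mismatch
attr_names <- names(attributes(this_tx))
same <- sapply(attr_names, function(a) {
  identical(attr(this_tx, a), attr(last_tx, a))
})
print(same)  # row.names is the one that differs

# One possible fix: drop the inherited row names before comparing
rownames(this_tx) <- NULL
rownames(last_tx) <- NULL
identical(this_tx, last_tx)  # TRUE once the row names match
```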