Hello all, I have a rather vexing SAS problem and would like to ask for your help. Here's the problem:

I have two SAS data sets; let's call them setA and setB. Each row in setA has multiple attributes, one of which is a key value that is unique within the data set. setB has two attributes, both holding key values from setA. Each row of setB indicates that the setA row whose key matches attribute 1 is a duplicate of the setA row whose key matches attribute 2 (identical in every attribute except the key).

I need to remove all duplicate rows in setA.

I am rather new to SAS, and I believe the version I am using is 9.1. What would be the best way to solve this problem? Thank you.

A:

My interpretation of your question is that if setA contains

key   value
1        67
2         3
3         4
8        16
9        16
10        4

and setB contains

key1   key2
 8        9
10        3

then you want the new setA to look like this (because key=9 is a dupe of key=8 and key=10 is a dupe of key=3):

key   value
1        67
2         3
3         4
8        16

If I have interpreted your question correctly, you can do it with this SAS code:

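data dupes_to_remove (keep=larger_key rename=(larger_key=key));
  set setB;
  /* for each duplicate pair, mark the larger key for removal */
  if key1 > key2 then larger_key = key1;
  else larger_key = key2;
  output;
run;

/* a key can appear in several pairs, so drop repeated keys */
proc sort data=dupes_to_remove nodupkey;
  by key;
run;

/* MERGE with BY requires setA to be sorted by key as well */
proc sort data=setA;
  by key;
run;

data setA_new;
  merge setA dupes_to_remove (in=in_dupes);
  by key;
  /* keep only rows whose key was not marked for removal */
  if not in_dupes;
run;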

(Also note that the usual term in SAS is "variable" rather than "attribute".)

Simon Nickerson
Sir, thank you. This is exactly what I was looking for.
ChamaraG