tags:

views:

53

answers:

1

Say I've already sorted set1 and set2 by the variables 'sticks', 'stones', and 'bones' and then i do this:

data merged;
    merge set1(in=a) set2(in=b);
    by sticks stones bones;
    if a and b then output;
    *else we don't want to do anything;
run;

Is there an easy way to drop all the variables from set2 in the merged dataset without having to type them all? I keep running into this problem where I have two datasets - both with quite a few variables - and I only want to merge them by a few variables and then only keep the variables from one of the sets.

I usually just use proc sql for something like this, but there are a few situations (more complex than above) where where I think merge is better.

Also, I find it annoying that SAS requires you to "manually" sort datasets before merging them. If it will not let you merge datasets unless they are sorted correctly, why doesn't it just do it for you when you use merge? Thoughts? Maybe there is a way around this I don't know about.

+6  A: 

Greetings! The sorted requirements is there for the way the merge statement and the PDV works in it. There is really no way around it.

However here basically you're doing a lookup of set2 to make sure you have a match of the key variables (sticks stones bones) through the equivalent of an inner join, which you could likely do more efficienty through an hash table or set with keys (if you have an index of course).

The easiest and more convenient way for what you want here is having a keep statement in the set2 so you load into the PDV only the by variables. Something like this:

data merged;
merge set1(in=a) set2(in=b keep=sticks stones bones);
by sticks stones bones;
if a and b then output;
*else we don't want to do anything;

run;

In case hash tables don't scare you and want to learn more on how to implement them in this case feel free to contact me for more help.

EDIT:

Here is a good paper about using hash tables http://www.nesug.org/proceedings/nesug06/dm/da07.pdf Bear in mind that using hashes you should know what you're doing and they may yeld unexpected results if you dont know whats happening under the hood. Regardless here is the problem solved using a very simble and basic hash table

data merged2;
set set1;
if _N_ = 1 then do;
  declare hash h(dataset:"set2");
  h.defineKey('sticks','stones','bones');
  h.defineData('sticks','stones','bones');
  h.defineDone();
end;
rc = h.find();
if rc=0;
drop rc;

run;

This code has the main benefit of not requiring the sorting of the datasets which in case set2 is particularly big is a great time-saver

Regards

Fabio Prevedelli

Fabio Prevedelli
man that is so simple and easy. i feel pretty dumb! :) thanks! I wouldn't mind learning more about hash tables in SAS. if you have a good link, feel free to send it my way.
oob