Consider the following (admittedly long) example.
The sample code creates two data sets, data one with "key" variables i,j,k and data two with key variables j,k and a "value" variable x. I'd like to merge these two data sets as efficiently as possible. Both of the data sets are indexed with respect to j and k: the index for the first data should not be needed but it's there anyway.
Proc SQL does not use the index in the data two, which I suppose would be the case if the data were in a relational database. Is this just a limitation of the query optimizer I have to accept?
EDIT: The answer to this question is yes, SAS can use an index to optimize a PROC SQL join. In the following example, the relative sizes of the data sets matters: If you modify the code so that data two becomes relatively larger than data one, index will be used. Whether the data sets are sorted or not, does not matter.
* Just to control the size of the data;
%let j_max=10000;
* Create data sets;
data one;
do i=1 to 3;
do j=1 to &j_max;
do k=1 to 4;
if ranuni(0)<0.9 then output;
end;
end;
end;
run;
data two;
do j=1 to &j_max;
do k=1 to 4;
x=ranuni(0);
if ranuni(0)<0.9 then output;
end;
end;
run;
* Create indices;
proc datasets library=work nolist;
modify one;
index create idx_j_k=(j k);
modify two;
index create idx_j_k=(j k) / unique;
run;quit;
* Test the use of an index for the other data set:
* Log should display "INFO: Index idx_j_k selected for WHERE clause optimization.";
options msglevel=i;
data _null_;
set two(where=(j<100));
run;
* Merge the data sets with proc sql - no index is used;
proc sql;
create table onetwo as
select
one.*,
two.x
from one, two
where
one.j=two.j and
one.k=two.k;
quit;