tags:

views:

526

answers:

7

I'm stumped. I have a situation where for each unique observation of casenum I would like to run varies queries and arithmatic operations between various observations of 'code' for that 'casenum' (see below). For example for casenum 1234567 I would like subtract data for code 0200 - code 0234 or 531 - 53. Please keep in mind that there are thousands of observations in this dataset.Is there an easy way to do this or to do row comparisons with the particular .

/please note casenum and code are character variables and data is a numeric variable/

Here is an example of how the dataset is structured:

casenum code data

1234567 0123 4597
1234567 0234 53
1234567 0100 789
1234567 0200 531
1234567 0300 354
1111112 0123 79
1111112 0234 78
1111112 0100 77
1111112 0200 7954
1111112 0300 35

+1  A: 

I'm unclear on what your logic is for the subtraction part of the code, but for the selection of a group of rows I can suggest. At first glance I would obtain a list of distinct values for casenum.

proc sql;
select distinct casenum 
into :casenum_list separated by ' '
from dataset;
quit;

Now that you have a list of all distinct casenum values, I would iterate through the rows following whatever logic you need.

Possibly using another proc sql like:

%MACRO DOIT;
%LET COUNT=1;
%DO %UNTIL (%SCAN(&casenum_list,&COUNT) EQ);

%LET CASENUM_VAR=%SCAN(&casenum_list,&COUNT);

PROC SQL;
SELECT
<INSERT SOME SQL LOGIC HERE>
FROM 
DATASET
WHERE CASENUM=&CASENUM_VAR;
QUIT;

%LET COUNT=%EVAL(&COUNT+1);        

%MEND DOIT;

%DOIT;

I hope this helps. If you can provide more insight into what you are trying to accomplish within the rows, I can be more specific.

AFHood
A: 

I don't think I really have enough info from your question to help, but I will just throw this out.... If you want to do row comparison, you can also use the data step. Assuming you have your data sorted by casenum you can use first. and last. to determine when you have a new casenum and when you are on the last row of a casenum. If you want to sum up data values between rows or make decisions based on a previous row for a casenum listed multiple times.

Data work.temp ;
 retain casenum_data ;

 set lib.data ;
 by casenum ;

 if first.casenum then do ;
   /* <reset hold vars> */
   casenum_data = 0 ;
 end ;

 if code = "0200" or code = "234" then .....


 if last.casenum then do ;
   /* output casenum summary */
   output ;
 end ;

run ;

Post more info about need and more help can be given.

CarolinaJay65
A: 

Hey it's me again!

Thanks for your answers I am going to try these out and also i'll try to give a better explained example and description of what i'm trying to do if these don't work.

A: 

Here is the logic although likely syntactically incorrect of what I am trying to do.

casenum code data

1234567 0123 4597
1234567 0234 53
1234567 0100 789
1234567 0200 531
1234567 0300 354
1111112 0123 79
1111112 0234 78
1111112 0100 77
1111112 0200 7954
1111112 0300 35

For code observations where casenum is the same, within those casenums
I would like it to determine, if data for code 0234 + data for code 0100 - data for code 0123 ne data for code 0200 then newvariable = 'YES'

in other words I'd like it to test if 53 + 789 - 4597 ne 531. after that and other similar kinds of tests run within casenum 1234567, I'd like it to move onto the next casenum, and run those same tests for that casenum.

Keep in mind this dataset has hundreds of thousands of observations in it.

A: 

Hmmm as I am playing around with the code similar to what you said in your second reply, it is making more and more sense. I'm thinking as it goes through one loop i can basically create hold variables for the data and at once i get to the if last.id i can use all those values placed in the hold variables in my queries and comparisons.

I'll try it out and let you know the result.

Just so you know - you can edit your original question or add comments to the answers of others - rather than posting a new answer
cmjohns
+1  A: 

If the formula is fixed (as your example seems to suggest), then there shouldn't be any reason that you can't do a straightforward transpose and then declare the test explicitly.

/* Transpose the data by casenum */    
proc transpose data=so846572 out=transpose_ds;
    id  code;
    var data;
    by casenum;
run;

/* Now just explicitly write your conditional expression */  
data StackOverflow;
    set transpose_ds;

    if _0234 + _0100 - _0123 <> _0200 then newvariable="yes";
run;

Where so846572 = Your original dataset, transpose_ds = Transposed version, StackOverflow = final output.

Let us know if this expression needs to be dynamic for some reason. This should easily scale to the volume of data you've mentioned without any problems. You could conceivably do the same kind of thing with a hash as well in one pass of the data.

Jay Stevens
A: 

Hey its me again. Thank you so much the data step code you posted worked. I really appreciate it. I was surprised how simple it was too, and its very easy to understand. By looping things around like that, it basically creates a variable with the item code data. that I use to make queries. That is exactly what I needed.

The first code using the proc sql was a little bit more complex to me, but if it would do the same thing then I should take some time to analyze it since I like using proc sql for merging.