views:

103

answers:

2

I've got a list of transaction dates and the user id of the person who made the transaction on that date (just 1 Tx/day allowed). For example:

I'd like to create a matrix which shows, as of each date, the number of users who have made 1 transaction, 2-10 transactions, 10-20 transactions, etc. For example (note, the below data doesn't correspond to the transaction data above):

Is a pivot table my best mechanism here? If so (or not) how would I approach this?

+1  A: 

My vote use a pivot If you have 2007 something like this

1) Select the data you have above 2) Do Insert Pivot 3) Drag Date to Row Loabel 4) Drag User ID to Columns => you get one column per user ID 5) In Values yoiu should have Count of Users 6) Then you need to add new columns that calculates the number of users that are in segment 1-10 etc

salgo60
Thanks for the quick answer. Can you please clarify why I'd drag user ID to columns? I'd end up with thousands of columns. The other gap I'm having is how to create the segment column formulas to count the number of occurrences in that range.
Howiecamp
Maybe it is a bad approach if you have thousands of users....I thought about using a frequency function like http://www.meadinkent.co.uk/xlfreq.htm
salgo60
+1  A: 

I know what I am going to say is a bit "out of scope", but I had a problem like this and I used R to work around it instead. (If I hadn't use R, I think I would have tried sql but in no way I would choose excel)

I also have a 2-columns table named "trans_data", like yours. The column names are "trans_date" and "user_id". I also wanted a contingency table like yours with counts of users within specific transaction limits.

Here is the code

library(plyr)
adply(table(trans_date),1,function(x) {
     d = NULL
     d["1"] <- sum(x==1)
     d["2_to_5"] <- sum(x > 1 & x <= 5) 
     d["6_to_27"] <- sum(x > 5 & x <= 27)
     d["28_to_120"] <- sum(x > 27 & x <= 120)
     d["121_to_398"] <- sum(x > 120 & x <= 398)
     d[">_398"] <- sum(x > 398)
     return(d)
   }
)

and part of the result

  trans_date   1 2_to_5 6_to_27 28_to_120 121_to_398 >_398
1 2009-01-25 257    169      61         7          1     0
2 2009-01-26 145    125      53         3          1     0
3 2009-01-27 175    117      44        12          0     0
4 2009-01-28 171    138      49         7          4     0
5 2009-01-29 756    217      71         5          3     0
gd047