Hi,
I don't know whether the term sort is most appropriate but anyway I have a large datasets which consists of columns userA, userB, date, interactionDuration. In other words the dataset contains rows which describe which users were interacting, how long the interaction lasted and the date of interaction.
The goal is to divide these interaction into relevant and irrelevant or important and unimportant and by the following logic: if the interaction is long we assume it is important and if it is short it is accidental and "irrelevant". On the other hand if same two persons interact often (every day or every week) and event if the single interaction doesn't last long we assume that it is important because it is not accidental.
The data is stored in MS SQL Server 2008 and I am wondering whether it is possible to somehow calculate the interval (in days) between the interactions of distinct pairs of people by using SQL queries or it is necessary to write custom routine.
Anyway I would be very thankful to hear any suggestion about how to perform sorting that divides the pairs into groups by the number of interactions, interaction durations and the intervals between the interactions - is it possible to calculate timespans between two consecutive records (sorted by date) and then select the group of pairs which interacted commonly and/or for a long time by using solely SQL queries or is it neccesary to write custom algorithm?
edit1
These are example rows:
user_A user_B interaction_duration interaction_date_date
00002781A2ADA816CDB0D138146BD63323CCDAB2 0E3F0A58C6BD5202D1B0D4D4F6A0B05E7A156AC7 532 2010-09-04
00002781A2ADA816CDB0D138146BD63323CCDAB2 47DFCA60C7D2D9B32ACBE650BCBA53F3DA57CB7B 29 2010-09-15
00002781A2ADA816CDB0D138146BD63323CCDAB2 8F4ADF3CC26905AF69F930C546954D87DB536E24 83 2010-09-04
00002781A2ADA816CDB0D138146BD63323CCDAB2 8F4ADF3CC26905AF69F930C546954D87DB536E24 10 2010-09-01
00002D2354C7080C0868CB0E18C46157CA9F0FD4 372BD190F61BAACDFA2512B71B0C62BCD654D5D1 50 2010-09-23
00002D2354C7080C0868CB0E18C46157CA9F0FD4 372BD190F61BAACDFA2512B71B0C62BCD654D5D1 105 2010-09-18
00002D2354C7080C0868CB0E18C46157CA9F0FD4 372BD190F61BAACDFA2512B71B0C62BCD654D5D1 44 2010-09-08
00002D2354C7080C0868CB0E18C46157CA9F0FD4 372BD190F61BAACDFA2512B71B0C62BCD654D5D1 50 2010-09-04
00002D2354C7080C0868CB0E18C46157CA9F0FD4 372BD190F61BAACDFA2512B71B0C62BCD654D5D1 21 2010-09-02
What is the best way to sort the above records by frequency (rarity) of interactions? Is it possible to sort it by using sql queries?
Thank you!