I am trying to build a recommender system based on user clicks. so far, i am not very concerned with all the nodes. i just wanted a way to develop a mechanism to detect that when a user clicked on A, he clicked on B. and when the user clicks on B, he clicks on C. so basically, in this node:
a > b > c
Its not very important for me to be able to tell that clickers on A will most likely click on c. as long as I know that clicker on A clicked on B, and clicker on B clicked on C, and so on, I am jolly.
On the preface, we collect logs in the form of IIS which has viewstateuserkey (guid) as unique identifier. So basically, if 2 viewstateusers click on a and b, the result should return 2.
Here is what my table looks like currently.
CREATE TABLE [dbo].[ratingdeveloper]( [itemid1] [int] NULL, [itemid2] [int] NULL, [count] [int] NULL, [sum] [int] NULL, [userid] varchar NULL ) ON [PRIMARY]
GO
and a sample query looks like this:
SET NOCOUNT ON
SELECT a.itemid1
, b.itemid2
, sqrt(count(a.userid)) as occurance
from ratingdeveloper as a WITH (NOLOCK)
inner
join ratingdeveloper as b WITH (NOLOCK)
on a.itemid1 = b.itemid1
and a.itemid2 = b.itemid2
group
by a.itemid1
, b.itemid2
HAVING a.itemid1 = 1
order
by a.itemid1
, b.itemid2
GO
My Questions are:
1) Does this look good enough? Any better ideas? 2) Since IIS logs have viewstateuser keys, and click times, how do I loop through the log file, and import every instance of clicks on respective places. For instance, if viewstateuser(1) clicks on A, B, C
I want the data inserted in the form of:
ITEMID1 = A, ItemID2 = B ITEMID1 = B, ITEMID2 = c
and so on.
Any suggestion would be appreciated.
Thanks,
-Subhash