views:

204

answers:

6

In the home page of my app, I try to implement something like "most visited websites" that Chrome & Safari have in their home pages. The major diff being that instead of websites the app will show Objects from the DB (Persons,Organizations,Orders, Invoices etc).

It should work like automatic bookmarking.

There is already a "Recently Viewed" table in the DB (InstanceId, UserId, LastVisitDate).

I could expand it with (NumberOfVisits) attribute, but that would not permit me to weight more recent visits over old.

One idea I'm working on is to weight the NumberOfVisits based on the the LastVisitDate. When the user visits the instance again, I weight down the existing NumberOfVisits before adding the new Visit.

Has anyone implemented such a system before? Do you know how Safari & Chrome do it in the home page?

+1  A: 

What about plain old stupid "Number of visits in the last month"?

I would also consider the duration the user stays on one particular entity. For example, I'd weight more an entity the user visited 3 times during the last month where he worked on it for half an hour each time, than an entity he visited 10 times but only held open for maybe 15-20 seconds.

Developer Art
that would be easy to do if I had a table containing all visits for all users to all object.This is something I would really like to avoid. I keep *last* visit for all users to all object.That means I do not know the "Number of visits in the last month"
pkario
+1  A: 

Query

SELECT
    InstanceId,
    UserId,
    COUNT(InstanceId) AS [Count]
GROUP BY InstanceId, UserId
HAVING Date >= DATEADD(DAY, -7, GETDATE()) -- This will give the most visited instances in the last week. In the same way you can get the most visited instances in a day, month, year, etc.
ORDER BY [Count] DESC
Kirtan
A much more elegant and useful implementation of my suggestion. I would simply add that by doing a subquery for the oldest date (where you have the `-7` ) and finding the datediff of that and today, you could then get the most visited instances ever.
Anthony
+1  A: 

"Most visisted" is often just relevant for the last couple of weeks / days / years. So there's one "requirement" you need to take in mind. So imagine that you want to have the last 30 days most visited sites, well one way to do this is to have a table structure like this

Id

PageId

LastFlush

Visists

where the "LastFlush"-colum is a DateTime telling you when you last Flushed. However, if you do use this structure you have another problem, after "30 days" you are back at 0 and there's no MostVisistedSites to display.

So yet another aspect is to store each visist in a single row such as this

Id

PageId

Visisted

This would however fill up with more rows in your table but it would give you a better overview of your websites, you could of course remove all the past 30 days visists and just focus on the newly visisted if you want.

Filip Ekberg
+2  A: 

I think this is a similar dilemma some of my team was having when we were trying to figure out the best way to weight different groups based on the number of open cases those groups had open. The problem was that we wanted to be able to see which groups had the oldest cases (meaning they'd left the case open far too long) AND which groups had the MOST cases. But it wasn't useful to look at either variable separately, because some groups had one pretty old case, but lots of new open cases (not such a big deal) and some groups had tons of old cases and tons of new cases (because they get more traffic in general) while some groups (the ones we were after) had mostly old cases and a lot of them.

The reason it reminds me of your situation is that we were banging our heads against the wall trying to come up with complicated "oldest date acceptable" type systems to draw some kind of line around the culprits. What we finally realized was that we just needed to use the length of time between the current date (today) and the oldest date of any open case (whatever it might be) and then with that range, we could then find the average age of each group's cases (regardless of the number of cases) and make a pivot table. Sure enough, we found one group that had only one case, but it was over a year old, and then the next in line was our real culprits, the guys with almost a hundred cases with a average of over 9 months.

So perhaps you need to step back and forget about weighing your user's visits and simply track each of their visits. For each visit, give it a timestamp and some kind of identifier. The identifier could be the invoice number, userID, query keywords, or whatever else makes it group-able with similar or identical visits. Now you can do a query for their favorite data, their favorite data of the day, their favorite data of the month, etc etc.


I just noticed your comment saying that you wanted to avoid a table of all visits to all objects by all visitors. This blows my suggestion out of the water, and perhaps the other one that I have endorsed.

I think you will find it more trouble than it's worth to come up with a useful algorithim for tracking user's visits/behavior without tracking all of the data. For instance, let's say I go to the same page every day for 5 minutes. It may not raise any flags on your code, so it never get's caught as an "auto bookmark" but I really do want to see it every day. Just only for a moment. Unless you are keeping that data in a table so that it can be noticed, I don't think you will be able to really offer a system that "learns" the user's visiting habits.

Now, having said that, are you wanting to avoid such a table because you think it will become bloated and less useful over time? Perhaps after a month, we can give up on the idea that tracking my one visit to this page was relevant, right?

So maybe you can track all of the data over some amount of time, let's say two weeks (maybe a month). After each month, you run a clean up. Really it would be after the first two months, and then each month, because you'd always keep a month's worth of data.

So, when I visit my homepage, your code figures out the popular sites based on the last month's worth of data and shows those as top hits. Easy enough. But it is trying to learn about the user, so it also has decided "this page is a definite keeper", and it knows to add that one no matter what. So any data older than 30 days is thrown out, but the "keepers" are kept in a "known favorites for this user" table.

What would be really slick would be for it to have three other features/options:

1) It can learn when a "known favorite" has gone stale. It tracks how often the user has visited that site in the past 30 days, and if it is not every 3 days (10 hits) average, it gets tossed out (or moved to the bottom).

2) The user can be prompted prior to any favorites being thrown out "You don't seem to go here anymore, should I stop showing it on the front page?" and probably they will say "toss it" but maybe they will realize they lost track of that project and say "no, keep it."

3) The user could have the option of simply adding or removing these bookmarks to a permanent state on their own. Which isn't to undermine your whole idea, only to make it so that the user's have some amount of control. So when an "Auto Bookmark" shows up, they see a "+" or some sort next to the object that always them to make it permanent, because they know they only check it once a month, but want the reminder of seeing it daily. Or they know they will never want to see it again, they just were watching it a lot last week, so they can hit the "-" next to it and your code will blacklist it from the next two weeks worth of data (or just clear out those hits from the data). This will ensure that the object WILL come back up if they start visiting it a lot again in the future, just not based on the previous data.

Anthony
pkario
Sorry, I try to make up in personality what I lack in precision. As far as the 1k vs 1 month idea, I do like how it leaves room for the user that's an infrequent visitor, but will it work for the user's that are heavy-hitters? For instance, if I manage 200 people and visit their pages 3 times each a day (for maybe 30 seconds or less, just to keep tabs), but always visit this OTHER page once a day, but spend an hour on it, won't that one get bumped down or off after two days? Maybe object types need to be considered?
Anthony
A: 

Here is the solution as implemented: Added two columns in my LastView table and now it is:

  • ObjectType
  • ObjectID
  • UserID
  • DateTimeLastView
  • NumberOfVisits (new)
  • NumberOfVisitsNormalized (new-Calculated)

Description:

  • NumberOfVisitsNormalized is NumberOfVisits multiplied by a factor that gets smaller for older entries. This way new visits are more valuable than old ones.
  • When increasing number of Visits I value an INSERT as 10 SELECTS, an UPDATE as 5 SELECTS.
  • NumberOfVisits=1+NumberOfVisitsNormalized.
  • If a User visits the same page within 10 minutes of last visit, it gets no bonus.
  • If a User visits the same page within 24 hours of last visit, it gets 50% bonus.

That way we have:

  • Pages last visited
  • Users last seen
  • Favorite pages

with just 6 (5 real+1 calculated) columns, no triggers,UDFs, SPs and fast queries.

pkario
A: 

You could use a leaky bucket abstraction.

Store the number of visits, and the date that this was last updated. When you update it, set number of visits to
(old number - (days since last update * aging factor) + 1)

When you query it, always normalize with
(number of visits - (days since last update * factor))
And then you can just order by the result of that.

The larger the aging factor, the faster the number would decline.

You could also use the square (or cube) of "days since last update" to decay faster the older the last visit gets.

Joeri Sebrechts