I'm working on something incredibly unique... a property listings website. ;)

It displays a list of properties. For each property, a teaser image and some caption data are displayed. If the teaser image and caption take a site visitor's interest, they can click through to a full property profile. All very standard.

The customer wants to allow property owners to add multiple teaser images, and to track which teaser images get the most click-throughs. No worries there.

But they also want to allow the property owner to weight each teaser image to control how often it is shown. So for 3 images with weightings of 2, 6, 2, the 2nd image would be shown 6 out of every 10 times. This needs to be balanced: if the 2nd image is shown for the first 6 requests, it can't be shown again until the 1st and 3rd images have been shown twice each.

So I need to both increment how often an image has been retrieved and retrieve images in a balanced way. Forget about actual image handling; I'm really just talking about URLs.

Note that incrementing how often an image has been retrieved is a different animal from incrementing how often it has captured a click-through.

I can think of a few different ways to approach the problem using database triggers or maybe some LINQ2SQL, etc., but it strikes me that someone out there will know of a solution that could be orders of magnitude faster than what I might come up with.

My first rough idea is to have a schema like so:

TeaseImage(PropId, ImageId, ImageUrl, Weighting, RetrievedCount, PropTotalRetrievedCount)

and then

select ImageRanks.*
from (select t.PropId,
             t.ImageId,
             t.ImageUrl,
             rank() over (partition by t.PropId
                          order by t.RetrievedCount asc) as IMG_Rank
      from TeaseImage t
      where t.RetrievedCount < t.Weighting) ImageRanks
where ImageRanks.IMG_Rank <= 1

And then

 1. for each ImageId in the result set, increment RetrievedCount by 1, then
 2. for each PropId in the result set, increment PropTotalRetrievedCount by 1, then 
 3. for each PropId in the result set, check if PropTotalRetrievedCount == 10 and, if so, reset PropTotalRetrievedCount to 0 and RetrievedCount to 0 for each associated ImageId (a rough sketch follows)
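In rough T-SQL those updates might look something like the following (an untested sketch; @PropId and @ImageId stand in for one row of the result set, and 10 is just the 2+6+2 weighting total from the example):

update TeaseImage                         -- step 1: bump the image just served
set    RetrievedCount = RetrievedCount + 1
where  ImageId = @ImageId

update TeaseImage                         -- step 2: bump the property counter (stored redundantly on every row for the property in this schema)
set    PropTotalRetrievedCount = PropTotalRetrievedCount + 1
where  PropId = @PropId

update TeaseImage                         -- step 3: cycle complete, start the rotation over
set    RetrievedCount = 0,
       PropTotalRetrievedCount = 0
where  PropId = @PropId
  and  PropTotalRetrievedCount >= 10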

Which frankly sounds awful :(

So, any ideas? Note: if I have to step out of the data layer I'd be using C# / .NET. Thanks.

+1  A: 

If you want to do this entirely in your database, you could split your table in two:

Image(ImageId, ImageUrl)
TeaseImage(TeaseImageId, PropId, ImageId, DateLastAccessed)

The TeaseImage table manages weightings by storing additional (redundant) copies of each property-image pair. So an image with a weight of six would get six records.
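For example, with the 2/6/2 weightings from the question, the seed rows for one property might look like this (PropId 42 and ImageIds 1-3 are made up for illustration, and TeaseImageId is assumed to be an identity column):

-- one TeaseImage row per unit of weight, so image 2 (weight 6) gets six rows
insert into TeaseImage (PropId, ImageId, DateLastAccessed)
select 42, 1, getdate() union all
select 42, 1, getdate() union all
select 42, 2, getdate() union all
select 42, 2, getdate() union all
select 42, 2, getdate() union all
select 42, 2, getdate() union all
select 42, 2, getdate() union all
select 42, 2, getdate() union all
select 42, 3, getdate() union all
select 42, 3, getdate()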

Then the following query gives you the least-recently used record.

select top 1 ti.TeaseImageId, i.ImageUrl
from         TeaseImage ti
join         Image i
on           i.ImageId = ti.ImageId
where        ti.PropId = @PropId
order by     ti.DateLastAccessed

Following the select, just update the record's DateLastAccessed. (Or even update it as part of the select procedure, depending on how fault-tolerant you need to be.)
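If you did want to fold the stamp into the select, one possible shape (a sketch only, assuming SQL Server 2005 or later for the OUTPUT clause) is:

-- stamp the least-recently-used teaser for a property and return it in one statement
update ti
set    ti.DateLastAccessed = getdate()
output inserted.TeaseImageId, i.ImageUrl
from   TeaseImage ti
join   Image i
on     i.ImageId = ti.ImageId
where  ti.TeaseImageId =
       (select top 1 t2.TeaseImageId
        from   TeaseImage t2
        where  t2.PropId = @PropId
        order by t2.DateLastAccessed)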

Using this technique would give you fine-grained control over the order of image delivery (by seeding the DateLastAccessed values appropriately), and you could easily modify the ratios if need be.

Of course, as the table grows, the additional records would degrade query performance earlier than other approaches, but depending on the cost of the query relative to everything else that's going on that may not be significant.

Jeff Sternal
That's a very interesting idea. I wonder how performant it would be though. If I have 100 property listings divided into 5 pages, that's 20 properties per page. I'd have to join the query that gets the 20 properties for the page display to your query shown above, so your query would have to be run *page length* number of times per page request. Any idea how performant that might be? I guess it's all in the testing?
rism
The more I think about it though, the more I like it. Replacing counters and incrementers with timestamps and some redundancy seems like a highly manageable solution, especially given the paltry size of the current and expected future dataset.
rism
Exactly as you say - it's all in the testing! It will depend on the hardware, the number of records in the table, indexes, etc. It's definitely a lot of (relatively) expensive sorting, semi-extraneous records, and so forth. But in the end, it may well just be insignificant.
Jeff Sternal