Let's say I have a database table called "Scrape", set up something like this:
UserID (int)
UserName (varchar)
Wins (int)
Losses (int)
ScrapeDate (datetime)
I'm trying to rank my users by their win/loss ratio. However, each week I'll be scraping new data for the users and adding another entry per user to the Scrape table.
How can I query a list of users sorted by wins/losses, taking into account only each user's most recent entry (by ScrapeDate)?
Also, do you think it matters that people will be hitting the site while a scrape may still be in progress?
For example I could have:
1 - Bob - Wins: 320 - Losses: 110 - ScrapeDate: 7/8/09
1 - Bob - Wins: 360 - Losses: 122 - ScrapeDate: 7/17/09
2 - Frank - Wins: 115 - Losses: 20 - ScrapeDate: 7/8/09
This represents a scrape that has updated only Bob so far; it is still processing Frank, whose new row has not been inserted yet. How would you handle this situation as well?
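To make the example concrete, here is a minimal sketch of the kind of query I have in mind, using Python's built-in sqlite3 module. The choice of SQLite, the ISO-formatted date strings, and the helper names make_demo_db and latest_rankings are assumptions for illustration only; my real database may differ. It joins each user to their own latest ScrapeDate and sorts by wins divided by losses:

```python
import sqlite3

# Hypothetical helper: builds an in-memory copy of the example rows above.
def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE Scrape (
               UserID INTEGER,
               UserName TEXT,
               Wins INTEGER,
               Losses INTEGER,
               ScrapeDate TEXT  -- ISO dates (assumption) so MAX() sorts chronologically
           )"""
    )
    conn.executemany(
        "INSERT INTO Scrape VALUES (?, ?, ?, ?, ?)",
        [
            (1, "Bob", 320, 110, "2009-07-08"),
            (1, "Bob", 360, 122, "2009-07-17"),
            (2, "Frank", 115, 20, "2009-07-08"),
        ],
    )
    return conn

# Pick each user's newest row, then rank by wins/losses.
# NULLIF avoids a division by zero; such users get a NULL ratio.
RANKING_SQL = """
SELECT s.UserID, s.UserName, s.Wins, s.Losses,
       CAST(s.Wins AS REAL) / NULLIF(s.Losses, 0) AS Ratio
FROM Scrape AS s
JOIN (
    SELECT UserID, MAX(ScrapeDate) AS LatestDate
    FROM Scrape
    GROUP BY UserID
) AS latest
  ON latest.UserID = s.UserID
 AND latest.LatestDate = s.ScrapeDate
ORDER BY Ratio DESC
"""

# Hypothetical helper: returns one row per user, best ratio first.
def latest_rankings(conn):
    return conn.execute(RANKING_SQL).fetchall()
```

With the three rows above, this would rank Frank using his 7/8/09 row and Bob using his 7/17/09 row, which is exactly the mixed state I'm asking about.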
So, my questions are:
- How would you query only the most recent scrape of each user to determine the rankings?
- Does it matter that the database may be mid-update (a scrape could take up to a day to complete), so that not all users have been refreshed yet? If so, how would you handle this?
Thank you, and thanks for the responses you gave me on my related question: