Hi,
Let's say I have a MySQL database with 3 tables:
table 1: Persons, with 1 column ID (int)
table 2: Newsletters, with 1 column ID (int)
table 3: Subscriptions, with columns Person_ID (int), Newsletter_ID (int), Subscribed (bool), Updated (Datetime)
Subscriptions.Person_ID points to a Person, and Subscription.Newsletter_ID points to a Newsletter. Thus, each person may have 0 or more subscriptions to 0 or more magazines at once. The table Subscriptions will also store the entire history of each person's subscriptions to each newsletter. If a particular Person_ID-Newsletter_ID pair doesn't have a row in the Subscriptions table, then it's equivalent to that pair having a subscription status of 'false'.
Here is a sample dataset
Persons
ID
1
2
3
Newsletters
ID
1
2
3
Subscriptions
Person_ID Newsletter_ID Subscribed Updated
2 1 true 2010-05-01
3 1 true 2010-05-01
3 2 true 2010-05-10
3 1 false 2010-05-15
Thus, as of 2010-05-16, Person 1 has no subscription, Person 2 has a subscription to Newsletter 1, and Person 3 has a subscription to Newsletter 2. Person 3 had a subscription to Newsletter 1 for a while, but not anymore.
I'm trying to do 2 kinds of query.
A query that shows everyone's active subscriptions as of query time (we can assume that updated will never be in the future -- thus, this means returning the record with the latest 'updated' value for each Person_ID-Newsletter_ID pair, as long as Subscribed is true (if the latest record for a Person_ID-Newsletter_ID pair has a Subscribed status of false, then I don't want that record returned)).
A query that returns all active subscriptions for a specific newsletter - same qualification as in 1. regarding records with 'false' in the Subscribed column.
I don't use SQL/databases often enough to tell if this design is good, or if the SQL queries needed would be slow on a database with, say, 1M records in the Subscriptions table.
I was using the Visual query builder tool in Visual Studio 2010 but I can't even get the query to return the latest updated record for each Person_ID-Newsletter_ID pair.
Is it possible to come up with SQL queries that don't involve using subqueries (presumably because they would become too slow with a larger data set)? If not, would it be a better design to have a separate Subscriptions_History table, and every time a subscription status for a Person_ID-Newsletter-ID pair is added to Subscriptions, any existing record for that pair is moved to Subscriptions_History (that way the Subscriptions table only ever contains the latest status update for any Person_ID-Newsletter_ID pair)?
I'm using .net on Windows, so would it be easier (or the same, or harder) to do this kind of queries using Linq? Entity Framework?
Edit: Here's what happens if I use this query:
SELECT Person_ID, Newsletter_ID, Allocation, Updated, MAX(Updated) AS Expr1
FROM subscriptions
GROUP BY Person_ID, Newsletter_ID
I get rows 2 and 4 from the Subscriptions table mishmashed together (in row 2 of the results set below):
Person_ID Newsletter_ID Subscribed Updated Expr1
2 1 true 2010-05-01 2010-05-01
3 1 true 2010-05-01 2010-05-15
3 2 true 2010-05-10 2010-05-10
Thanks!