I'm struggling to find an efficient and flexible representation for my data. We have a many-to-many relationship between two entities which have arbitrary lifetimes. Let's call these Voter
and Candidate
. Each relationship has a measurement which we'd like to summarize in various ways. These are timestamped and are guaranteed to be within the lifetime of the two related entities. Let's say the measure is approval rating, or just Rating
.
One unusual requirement is that if I'm summarizing a period which has no measurement, I should substitute the latest valid measurement, rather than giving NULL.
Our current solution is to compile a list of valid voters and candidates for each day, then formulate a many-to-many table which records the latest valid measure.
What would your solution be?
This allows me to do a single query to get a daily summary:
select
avg(rating), valid_date, candidate_SSN, candidate_DOB
from
daily_rating natural join rating
group by
valid_date, candidate_SSN, candidate_DOB
This might work ok, but It seems inefficient to me. We're repeating a lot of data, especially if nothing happens for a given day. It also is unclear how to do weekly/monthly summaries without compiling even more tables. Since we're dealing with millions of rows (we're not really talking about voter polls...) I'm looking for a more efficient solution.