views:

431

answers:

2

Standard EAV schema : One column for Entity ID, one for Attribute ID, one for Value ID.

Historical EAV schema : Add an additional column(s) for times/date-ranges

At run time, certain rows will be excluded. There may be 0, 1, or many rows returned per entity, per attribute. We only want the most recent value for each attribute remaining.

Our current solution is using the SQL Server Rank() function to mark each row with a rank, and then in the where clause we have "and rank = 1".

However, performance is not satisfactory. During analysis we find that assigning the ranks is quite fast, however doing the where clause against the rank requires a second scan of the data, and keeps the entire data set in RAM.

What is the fastest way to rank the remaining attribute rows, and return only the latest?

+1  A: 

The general idea would be to extract the latest + key first, then join back to get value which is not part of the aggregate. The fact it's EAV does not matter.

SELECT
    *
FROM
    table t
    JOIN
    (SELECT MAX(dt) AS mdt, eID, aID FROM table GROUP BY eID, aID) mt
                     ON t.eID = mt.eID AND t.aID = mt.aID AND t.dt = mt.mdt
WHERE
    ...
gbn
+1  A: 

While I think gbn's answer is probably sufficient, I'm wondering whether use of an OVER clause to establish a MAX date per id/attribute with which to reduce the SELECT in a WHERE clause wouldn't be faster than a RANK? No time to test performance, but here's the query:

select * 
from (
  select *, max(dt) over (partition by eID, aID) maxdt 
    from table
) t
where t.dt = t.maxdt and ...

Good luck!

ewbi