Problem statement: A table contains an item_id, a category_id and a date range (begin_date and end_date).
No item may be in more than one category on any given date (in general; during daily rebuilding it can be in an invalid state temporarily).
By default, all items are added (and re-added if removed) to a category (derived from outside data) automatically on a daily basis, and their membership in that category matches the lifespan of the item (items have their own begin and end date, and usually spend their entire lives in the same category, which is why this matches).
For items in category X, it is occasionally desirable to override the default category by adding them to category Y. Membership in category Y could entirely replace membership in category X (that is, the begin and end dates for membership in category Y would match the begin and end dates of the item itself), or it could override it for an arbitrary period of time (at the beginning, middle or end the item's lifespan, possibly overriding for short periods at multiple times). Membership in category Y is not renewed automatically and additions to that category is done by manual data entry.
Every day, when category X is rebuilt, we get an overlap, where any item in category Y will now be in category X as well (which is forbidden, as noted previously).
Goal: After each repopulation of category X (which is done in a rather complicated and fragile manner, and ideally would be left alone), I'm trying to find an efficient means of writing a stored procedure that:
Identifies the overlaps
Changes existing entries, adds new ones where necessary (such as in the case where an item starts in category X, switches to category Y, then eventually switches back to category X before ending), or removes entries (when an item is in category Y for its entire life) such that every item remains in category Y (and only Y) where specified, while category X membership is maintained when not overridden by category Y.
Does not affect memberships of categories A, B, C, Z, etc., which do not have override categories and are built separately, by completely different rules.
Note: It can be assumed that X membership covers the entire lifespan of the item before this procedure is called, so it is unnecessary to query any data outside this table.
Bonus credit: If for some reason there are two adjacent or overlapping memberships in for the same item in category Y, stitching them together into a single entry is appreciated, but not necessary.
Example:
- item_id category_id begin_date end_date
- 1 X 20080101 20090628
- 1 Y 20090101 20090131
- 1 Y 20090601 20090628
- 2 X 20080201 20080731
- 2 Y 20080201 20080731
Should become:
- item_id category_id begin_date end_date
- 1 X 20080101 20081231
- 1 Y 20090101 20090131
- 1 X 20090201 20090531
- 1 Y 20090601 20090628
- 2 Y 20080201 20080731
If it matters, this needs to work on SQL Server 2005 and SQL Server 2008