I have a large SQL Server table that looks something like this:
    ImageId     int
    Page        int
    FSPath      varchar(256)
    ImageFrame  int
    ...
The table stores one row for each page of a number of image files. This lets the same table represent both images where each page is stored in a separate file and multi-page image files that contain all of their pages in a single file. In the multi-page case, the FSPath value is duplicated exactly for every page of the same document, which eats up a lot of space (this table alone is currently ~5 GB). Duplicating the data this way seems very wasteful, but I haven't found an alternative I'm satisfied with.
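A minimal sketch of the current layout, using Python's built-in sqlite3 module as a runnable stand-in for SQL Server (so the DDL is SQLite, not T-SQL). The file paths are made up, the "..." columns are omitted, and treating ImageFrame as the frame index inside a multi-page file is an assumption. It puts the two cases side by side: one file per page, where every FSPath is distinct, and a multi-page file, where the same FSPath repeats on every row.

```python
import sqlite3

# Sketch only: SQLite stands in for SQL Server; paths are hypothetical and
# ImageFrame is assumed to be the frame index within a multi-page file.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Pages (
        ImageId    INTEGER NOT NULL,
        Page       INTEGER NOT NULL,
        FSPath     VARCHAR(256) NOT NULL,
        ImageFrame INTEGER NOT NULL,
        PRIMARY KEY (ImageId, Page)
    )
""")

rows = [
    # Image 1: one file per page, so every FSPath is distinct.
    (1, 1, "/images/img1/page1.tif", 0),
    (1, 2, "/images/img1/page2.tif", 0),
    # Image 2: one multi-page file, so the same FSPath repeats on every page.
    (2, 1, "/images/img2/doc.tif", 0),
    (2, 2, "/images/img2/doc.tif", 1),
    (2, 3, "/images/img2/doc.tif", 2),
]
conn.executemany("INSERT INTO Pages VALUES (?, ?, ?, ?)", rows)

total_rows, distinct_paths = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT FSPath) FROM Pages"
).fetchone()
print(total_rows, distinct_paths)  # 5 3
```

Five rows carry only three distinct path strings; the gap grows with the page count of each multi-page file.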
The usage pattern for this table is dominated by lookups on the primary key (ImageId, Page) to retrieve the path (and other columns), but I also need to handle insertion of new data and occasional deletion efficiently.
If I create a lookup table for the path elements and store a path element id in the pages table instead, I would need to index the lookup table both by id and by path. That would hurt the scenario where each page has its own distinct path element (nothing is deduplicated, but I still pay for the extra table and index), and it would complicate inserting new data, since the path may or may not already exist in the lookup table. Furthermore, deleting any row from the main pages table would require me to clean up the associated path entry if it is no longer used.
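The lookup-table design described above, with its get-or-insert step and orphan cleanup on delete, can be sketched as follows. This again uses Python's sqlite3 module as a stand-in for SQL Server; the Paths table, the PathId column, and the helper functions are hypothetical names chosen for illustration. The extra index on Pages.PathId is included because, without it, the "is this path still used?" check on delete has to scan the pages table.

```python
import sqlite3

# Sketch only: SQLite stands in for SQL Server, and the Paths table, the PathId
# column and the helper functions below are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Paths (
        PathId INTEGER PRIMARY KEY,          -- index 1: lookup by id
        FSPath VARCHAR(256) NOT NULL UNIQUE  -- index 2: lookup by path
    );
    CREATE TABLE Pages (
        ImageId    INTEGER NOT NULL,
        Page       INTEGER NOT NULL,
        PathId     INTEGER NOT NULL REFERENCES Paths(PathId),
        ImageFrame INTEGER NOT NULL,
        PRIMARY KEY (ImageId, Page)
    );
    -- Lets the "is this path still used?" check on delete avoid a scan.
    CREATE INDEX IX_Pages_PathId ON Pages(PathId);
""")


def get_or_create_path_id(path):
    """Return the id for path, inserting it first if it is not there yet."""
    conn.execute("INSERT OR IGNORE INTO Paths (FSPath) VALUES (?)", (path,))
    row = conn.execute("SELECT PathId FROM Paths WHERE FSPath = ?", (path,)).fetchone()
    return row[0]


def insert_page(image_id, page, path, frame):
    path_id = get_or_create_path_id(path)
    conn.execute("INSERT INTO Pages VALUES (?, ?, ?, ?)", (image_id, page, path_id, frame))


def delete_image(image_id):
    """Delete all pages of an image, then drop any paths left unreferenced."""
    affected = [r[0] for r in conn.execute(
        "SELECT DISTINCT PathId FROM Pages WHERE ImageId = ?", (image_id,))]
    conn.execute("DELETE FROM Pages WHERE ImageId = ?", (image_id,))
    for path_id in affected:
        conn.execute(
            "DELETE FROM Paths WHERE PathId = ? "
            "AND NOT EXISTS (SELECT 1 FROM Pages WHERE PathId = ?)",
            (path_id, path_id),
        )


insert_page(1, 1, "/images/img1/page1.tif", 0)
insert_page(1, 2, "/images/img1/page2.tif", 0)
for frame in range(3):
    insert_page(2, frame + 1, "/images/img2/doc.tif", frame)
print(conn.execute("SELECT COUNT(*) FROM Paths").fetchone()[0])  # 3
delete_image(2)
print(conn.execute("SELECT COUNT(*) FROM Paths").fetchone()[0])  # 2
```

INSERT OR IGNORE is SQLite syntax; in T-SQL the get-or-insert step would typically be written with IF NOT EXISTS or MERGE instead, and with concurrent writers it would also need locking or a retry on the unique-key violation.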
I had been hoping to create an updatable view over the joined tables and let SQL Server do the magic for me, but when I try to perform an insert through the view I get this error:

    View or function 'Scrap.dbo.PageView' is not updatable because the modification affects multiple base tables.
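The joined view can be reproduced in a minimal sketch, again with Python's sqlite3 module standing in for SQL Server and with hypothetical Paths/PathId names. Primary-key reads go through the join exactly as before, and the insert is rejected. The reason differs slightly between the two engines: SQLite views are read-only unless they have INSTEAD OF triggers, whereas SQL Server accepts a modification through a view only when it affects a single base table, which is why its error mentions multiple base tables.

```python
import sqlite3

# Sketch only: SQLite stands in for SQL Server; Paths/PathId are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Paths (
        PathId INTEGER PRIMARY KEY,
        FSPath VARCHAR(256) NOT NULL UNIQUE
    );
    CREATE TABLE Pages (
        ImageId    INTEGER NOT NULL,
        Page       INTEGER NOT NULL,
        PathId     INTEGER NOT NULL REFERENCES Paths(PathId),
        ImageFrame INTEGER NOT NULL,
        PRIMARY KEY (ImageId, Page)
    );
    -- The view presents the original single-table shape over the join.
    CREATE VIEW PageView AS
        SELECT pg.ImageId, pg.Page, p.FSPath, pg.ImageFrame
        FROM Pages AS pg
        JOIN Paths AS p ON p.PathId = pg.PathId;

    INSERT INTO Paths (PathId, FSPath) VALUES (1, '/images/img2/doc.tif');
    INSERT INTO Pages VALUES (2, 1, 1, 0), (2, 2, 1, 1);
""")

# Primary-key reads through the view work exactly as before.
row = conn.execute(
    "SELECT FSPath, ImageFrame FROM PageView WHERE ImageId = ? AND Page = ?",
    (2, 2),
).fetchone()
print(row)  # ('/images/img2/doc.tif', 1)

# An INSERT through the joined view is rejected.
try:
    conn.execute("INSERT INTO PageView VALUES (3, 1, '/images/img3/a.tif', 0)")
    insert_failed = False
except sqlite3.Error as exc:
    insert_failed = True
    print("insert rejected:", exc)
```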
Is there a reasonable way to do this that I'm just missing, or am I out of luck?