I'm working on integrating some data from a 3rd-Party system into one of my applications (legacy ASP Classic-based web application/SQL 2000) - they have made some poor decisions (IMHO) when it comes to their approach and data structure, though perhaps we might get a chance to refactor at some point... but until then, I have to work with what is in front of me.
The main table holds inspection data, with one of the fields being used to record if certain characteristics were observed. The characteristics are stored in a Table called Categories
, but unfortunately, the main inspection table (Test
) links to the categories by concatenating relevant CategoryID
s into a single field (SelectedCategories
). So, for example, if characteristics 01 and 02 were observed, the SelectedCategories
column for that row in Test
would have a value of '01C02C'.
Trimmed DDL:
CREATE TABLE [dbo].[Test](
[ItemID] [varchar](255) NOT NULL,
[Result] [varchar](255) NULL,
[Comments] [varchar](255) NULL,
[ResultReason] [varchar](255) NULL,
[ImageLocation] [varchar](255) NULL,
[TestDateTime] [smalldatetime] NOT NULL,
[SelectedCategories] [varchar](255) NULL)
The question is, given the situation, how best can I appropriately extract the data from Test
with a breakdown of the characteristics observed?
The output on the client I want is a table with the following columns: Test.PK, Test.Field2... Test.Fieldn, Categories.ID1, Categories.ID2, Categories.IDn
That's probably not quite clear enough - the first fields would be the usual suspects from Test
, followed by a tick or cross (or some other visual indicator) for each of the categories in Categories
.
Obviously, if this can be achieved in one query, so much the better in terms of efficiency and performance. However, I'm not sure how this would be achieved - how would you join the Categories
table via SelectedCategories
?
I could obviously simply report the SelectedCategories
value and have the application parse the value. This could be hard-coded or more likely we would re-query Categories
for each row in Test - though this would have performance implications. TBH, performance probably isn't an issue in this case, but just because you can get away with something, doesn't mean you should make a habit of it.
Equally, if I have an opportunity to refactor the 3rd party application, would I remove SelectedCategories
column and add in a TestCategories
table? Or would I hard-code each category as a series Bit columns. In all likelihood, the Categories
won't change throughout the life of the system, but if they did, it means changes (albeit very minor) to both the DB and the application.
I hope I've explained it sufficiently clearly. In essence, I'm saying what is the best approach if I stick with the current system? And if I were to refactor, what different approach could I take?
Progress Update:
Largely thanks to Lieven, I've got thus far:
DML:
SELECT c.ID, c.Category, t.FilterID, t.OperatorResult, t.SelectedCategories
FROM dbo.Categories c
inner JOIN dbo.Test t ON CHARINDEX(Cast(c.ID as varchar), t.SelectedCategories, 1) <> 0
order by FilterID, ID
Output:
ID Category FilterID OperatorResult SelectedCategories
4 Cracked Ceramic 137667 FAILED 04C
4 Cracked Ceramic 284821 FAILED 04C
4 Cracked Ceramic 287617 FAILED 04C05C
5 Damaged Case 287617 FAILED 04C05C
4 Cracked Ceramic 310112 FAILED 04C05C
5 Damaged Case 310112 FAILED 04C05C
This would suffice, except that in order to reach my desired screen output...
Filter ID Operator Result Cat Matl Crack Damage High Soot
137667 FAILED X X
178643 FAILED
284821 FAILED X
287617 FAILED X X
310112 FAILED X X
... I either need further work on the SQL (so that I can achieve the desired output in just one query) or I need to do some extra work in the application itself.
Conclusion:
If we look at Lieven's latest example (below), we can see that the the problem can be tackled in TSQL but that the Categories are hard-coded.
The alternative is stick to raw data and make IIS/ASP do more work. It will complicate the source code for certain, but will remove the potential overhead of updating the TSQL if a category is added or removed. I could certainly live with this very occasional need to update the TSQL, but I'm thinking ahead to different problems where the the Categories table would be actively changing on a regular basis.