views:

78

answers:

1

I'm working on integrating some data from a 3rd-Party system into one of my applications (legacy ASP Classic-based web application/SQL 2000) - they have made some poor decisions (IMHO) when it comes to their approach and data structure, though perhaps we might get a chance to refactor at some point... but until then, I have to work with what is in front of me.

The main table holds inspection data, with one of the fields being used to record if certain characteristics were observed. The characteristics are stored in a Table called Categories, but unfortunately, the main inspection table (Test) links to the categories by concatenating relevant CategoryIDs into a single field (SelectedCategories). So, for example, if characteristics 01 and 02 were observed, the SelectedCategories column for that row in Test would have a value of '01C02C'.

Trimmed DDL:

CREATE TABLE [dbo].[Test](
[ItemID] [varchar](255) NOT NULL,
[Result] [varchar](255) NULL,
[Comments] [varchar](255) NULL,
[ResultReason] [varchar](255) NULL,
[ImageLocation] [varchar](255) NULL,
[TestDateTime] [smalldatetime] NOT NULL,
[SelectedCategories] [varchar](255) NULL)

The question is, given the situation, how best can I appropriately extract the data from Test with a breakdown of the characteristics observed?

The output on the client I want is a table with the following columns: Test.PK, Test.Field2... Test.Fieldn, Categories.ID1, Categories.ID2, Categories.IDn

That's probably not quite clear enough - the first fields would be the usual suspects from Test, followed by a tick or cross (or some other visual indicator) for each of the categories in Categories.

Obviously, if this can be achieved in one query, so much the better in terms of efficiency and performance. However, I'm not sure how this would be achieved - how would you join the Categories table via SelectedCategories?

I could obviously simply report the SelectedCategories value and have the application parse the value. This could be hard-coded or more likely we would re-query Categories for each row in Test - though this would have performance implications. TBH, performance probably isn't an issue in this case, but just because you can get away with something, doesn't mean you should make a habit of it.

Equally, if I have an opportunity to refactor the 3rd party application, would I remove SelectedCategories column and add in a TestCategories table? Or would I hard-code each category as a series Bit columns. In all likelihood, the Categories won't change throughout the life of the system, but if they did, it means changes (albeit very minor) to both the DB and the application.

I hope I've explained it sufficiently clearly. In essence, I'm saying what is the best approach if I stick with the current system? And if I were to refactor, what different approach could I take?

Progress Update:

Largely thanks to Lieven, I've got thus far:

DML:

SELECT  c.ID, c.Category, t.FilterID, t.OperatorResult, t.SelectedCategories
FROM    dbo.Categories c
        inner JOIN dbo.Test t ON CHARINDEX(Cast(c.ID as varchar), t.SelectedCategories, 1) <> 0
order by FilterID, ID

Output:

ID   Category             FilterID   OperatorResult   SelectedCategories
4    Cracked Ceramic      137667     FAILED           04C
4    Cracked Ceramic      284821     FAILED           04C
4    Cracked Ceramic      287617     FAILED           04C05C
5    Damaged Case         287617     FAILED           04C05C
4    Cracked Ceramic      310112     FAILED           04C05C
5    Damaged Case         310112     FAILED           04C05C

This would suffice, except that in order to reach my desired screen output...

Filter ID  Operator Result    Cat Matl   Crack    Damage   High Soot   
137667     FAILED             X          X        
178643     FAILED           
284821     FAILED                        X        
287617     FAILED                                 X        X     
310112     FAILED                        X        X    

... I either need further work on the SQL (so that I can achieve the desired output in just one query) or I need to do some extra work in the application itself.

Conclusion:

If we look at Lieven's latest example (below), we can see that the the problem can be tackled in TSQL but that the Categories are hard-coded.

The alternative is stick to raw data and make IIS/ASP do more work. It will complicate the source code for certain, but will remove the potential overhead of updating the TSQL if a category is added or removed. I could certainly live with this very occasional need to update the TSQL, but I'm thinking ahead to different problems where the the Categories table would be actively changing on a regular basis.

+3  A: 

To join the Categories table via SelectedCategories could work like this

Edit

Some things to think about

  • While the crosstab functionality is possible with the group by and all, perhaps it would be better to handle this in a client application.
  • The inputs you've provided in your question do not match the outputs you've provided. I have used your inputs and assumed some things about the requested output. I assume Cat Matl and High Soot are just other possible values for categories.

Let us know if it worked for you.

BEGIN TRAN

CREATE TABLE [dbo].[Categories](
[CategorieID] INTEGER NOT NULL)

CREATE TABLE [dbo].[Test](
[ItemID] [varchar](255) NOT NULL,
[Result] [varchar](255) NULL,
[Comments] [varchar](255) NULL,
[ResultReason] [varchar](255) NULL,
[ImageLocation] [varchar](255) NULL,
[TestDateTime] [smalldatetime] NOT NULL,
[SelectedCategories] [varchar](255) NULL)

INSERT INTO dbo.Categories VALUES (4)
INSERT INTO dbo.Categories VALUES (5)


INSERT INTO dbo.Test VALUES (137667, 'FAILED', NULL, 'Cracked Ceramic', NULL, GetDate(), '04C')
INSERT INTO dbo.Test VALUES (284821, 'FAILED', NULL, 'Cracked Ceramic', NULL, GetDate(), '04C')
INSERT INTO dbo.Test VALUES (287617, 'FAILED', NULL, 'Cracked Ceramic', NULL, GetDate(), '04C05C')
INSERT INTO dbo.Test VALUES (287617, 'FAILED', NULL, 'Damaged Case'   , NULL, GetDate(), '04C05C')
INSERT INTO dbo.Test VALUES (310112, 'FAILED', NULL, 'Cracked Ceramic', NULL, GetDate(), '04C05C')
INSERT INTO dbo.Test VALUES (310112, 'FAILED', NULL, 'Damaged Case'   , NULL, GetDate(), '04C05C')

SELECT  [Filter ID] = t.ItemID
        , [Operator Result] = t.Result
        , [Reason] = t.ResultReason
INTO    #Output
FROM    dbo.Categories c                  
        LEFT OUTER JOIN dbo.Test t ON 
          /* Search for "C<{00}CategorieID>C" */
          CHARINDEX('C'                                                       -- Prefix CategorieID & SelectedCategories with 'C'
                      + REPLICATE('0', 2 - LEN(CAST(CategorieID AS VARCHAR))) -- Left Pad CategorieID with '0'
                      + CAST(CategorieID AS VARCHAR)                          -- Add CategorieID itself
                      + 'C'                                                   -- Suffix search string with 'C'.
                    , 'C' + t.SelectedCategories                              -- Prefix CategorieID & SelectedCategories with 'C'
                    , 1) <> 0

SELECT    [Filter ID]
          , [Operator Result]
          , [Cat Matl] = CASE WHEN [Cat Matl] = 1 THEN 'X' ELSE '' END
          , [Crack] = CASE WHEN [Crack] = 1 THEN 'X' ELSE '' END
          , [Damage] = CASE WHEN [Damage] = 1 THEN 'X' ELSE '' END
          , [High Soot] = CASE WHEN [High Soot] = 1 THEN 'X' ELSE '' END
FROM      (
            SELECT    [Filter ID]
                      , [Operator Result]
                      , [Cat Matl] = MAX(CASE WHEN Reason = 'Cat Matl' THEN 1 ELSE 0 END) 
                      , [Crack] = MAX(CASE WHEN Reason = 'Cracked Ceramic' THEN 1 ELSE 0 END) 
                      , [Damage] = MAX(CASE WHEN Reason = 'Damaged Case' THEN 1 ELSE 0 END) 
                      , [High Soot] = MAX(CASE WHEN Reason = 'High Soot' THEN 1 ELSE 0 END) 
            FROM      #Output
            GROUP BY  [Filter ID]
                      , [Operator Result]
          ) o

DROP TABLE #Output

ROLLBACK TRAN
Lieven
Complication 1: The ID in Categories is 1, 2, 3 etc. In SelectedCategories it is 01, 02 etc. Probably not an issue in practice, but if there were to be more than 10 categories it could be problematic.
CJM
Complication 2: Ideally, I'd like to return a row for every category, not just for the categories that were selected. Otherwise I'm still doing a lot of work in the application to get the desired output.
CJM
Change the inner join to a left outer join to tackle complication2. Complication1 is harder to tackle. I'll see what I can come up with.
Lieven
I've updated my post to reflect progress so far. Thanks for your efforts so far by the way!
CJM
I have updated the answer. I think it is pretty close to what you want.
Lieven
Apologies for the confusion, I swapped across to more realistic outputs, to better demonstrate my objective. I've actually done a (simpler) crosstab in TSQL before but it was a simpler scenario and the options were fixed. I think your solution takes me as far as I can go in SQL - I just need to decide if this is the best approach.
CJM
Thanks for your efforts, Lieven. 25 rep is the least you've deserved.
CJM