views:

345

answers:

2

Hey there,

I have a CATEGORIES table that links to an ITEMS table (one to many). Each item table could have multiple barcodes in the BARCODES table (one to many).

The idea is that a new barcode may be added against an item at some point, but the old data is stored in the BARCODES table so that if a search is done with an order with last year's data, the item and category of the old barcode can still be found quickly.

Hope that makes sense - it's a bit of an oversimplification.

Example of my tables:

CATEGORIES TABLE
ID   NAME        CODE    ACCOUNTING_REFERENCE
1    Beverages   BEV     Stock_Beverages
2    Pies        PIE     Stock_Pies
3    Chips       CHP     Stock_Chips

ITEMS TABLE
ID   CATEGORY_ID  DESCRIPTION                 BASE_COST
1    1            Red Bull (single)           $4.50
2    2            Ponsonby Pie - Mince Cheese $2.99
3    1            Coke Can (single)           $3.50
4    2            Big Ben - Steak Pepper      $1.99

BARCODES TABLE
ID  ITEM_ID  BARCODE   ACTIVE_FROM
1   1        XSD123    2009/10/11
2   2        AXF123    2009/10/12
3   3        XYZ234    2009/10/11
4   1        NEW001    2010/01/05
5   1        NEW002    2010/01/05*
  • I know it doesn't make sense in this scenario to have two barcodes entered on the same day. Ignore that this looks like bad data. This example is just an analogy for what I'm actually doing - in the real world application that mirrors this structure, it makes sense to have two 'BARCODES' entered on the same day. I'd like to discuss the real data openly, but I'm not free to do so - I'm sorry if it's confusing.

I am attempting to display item information to the user. I want to return from SQL one record per item, for all items. The returned information should include category name, code, and accounting references for the lines - duplicates of these columns are fine. The query needs to return item description and base cost.

This much I can do.

I also want the query to return the most recent barcode for the item. In the cases where there are two barcodes entered for an item on the same day, it isn't hugely important which one I display - although displaying the higher of the two (or more) BARCODE.ID fields would be a nice touch.

Also, it is possible that items may exist without any barcodes against them, in which case I want to return a null entry.

My desired result set would look something like this:

ITEM_ID   DESCRIPTION                  BASE_COST  CATEGORY_NAME CATEGORY_CODE  ACCOUNTING_REFERENCE  BARCODE
1         Red Bull (single)            $4.50      Beverages     BEV            Stock_Beverages       NEW002
2         Ponsonby Pie - Mince Cheese  $2.99      Pies          PIE            Stock_Pies            AXF123
3         Coke Can (single)            $3.50      Beverages     BEV            Stock_Beverages       XYZ234
4         Big Ben - Steak Pepper       $1.99      Pies          PIE            Stock_Pies            <null>

I can't work out how to add the BARCODE column to this result set, given the tables above.

In case I haven't been clear: I need to know how to structure a SQL query that will give me exactly the result above, given the data I've presented.

It's a requirement that the ITEM_ID column in my result set be distinct. I can't just restrict the stale entries in memory, and if I bubble up multiple ITEM_ID's it breaks a PK/FK relationship used elsewhere in the application I'm touching up.

For the record, I've had a look around the site on anything related to distinct sql columns to answer this question. I found a lot of entries, but couldn't seem to get any of the suggested solutions to work for me. Perhaps I'm just dense - it's getting late, and I may not be thinking straight. Apologies if I've missed something obvious.

EDIT

Gabe gave a good answer below, and I realized I should have been clearer.

Gabe's answer doesn't work for me, because occasionally in my data I have two barcodes against the same item with the same timestamp. When I tried his code, I wound up with multiple items being returned whenever an item had two barcodes placed against it on the same day.

This is really counter-intuitive, so it's my fault for not communicating properly. Essentially, the scenario described above isn't actually the data I'm working from. I'm not at liberty to discuss the database I'm working on publicly, so I have to rename everything in the examples I use.

I've adjusted the problem above. I know it seems ridiculous to have two barcodes entered with the same timestamp - rest assured that it makes sense that this could happen in the actual database.

EDIT (again)

The answers of both Gabe and simon work. I chose Gabe's as the answer, simply because I found his statement the most legible.

That said, I also like simon's because it shows a syntax for using SQL that I'm not familiar with - it's good to see an unfamiliar syntax in action.

At some point I need to benchmark the two methods to see which is faster. With my sample data, they're currently equal - although I expect I could change that with a bit of work populating my database with a few thousand more records and revising my indexes.

Thanks for the help.

+2  A: 

It sounds like you want a correlated subquery. Roughly, this:

 select *
from ITEMS
join BARCODES as B on ITEMS.ID = B.ITEM_ID
join CATEGORIES on CATEGORIES.ID = ITEMS.CATEGORY_ID
where B.ACTIVE_FROM =
    (select max(ACTIVE_FROM)
     from BARCODES as B2
     where B2.ITEM_ID = B.ITEM_ID)

To handle the situation where there is no barcode for a given item you need an outer join and to give only one barcode when there are many you need an additional subquery. In ANSI SQL, it might look something like this:

 select *
from ITEMS
LEFT OUTER join BARCODES as B on ITEMS.ID = B.ITEM_ID
join CATEGORIES on CATEGORIES.ID = ITEMS.CATEGORY_ID
where B.ACTIVE_FROM =
    (select max(ACTIVE_FROM)
     from BARCODES as B2
     where B2.ITEM_ID = B.ITEM_ID)
    AND B.ID =
    (SELECT MAX(ID)
     FROM BARCODES AS B3
     WHERE B3.ITEM_ID = B.ITEM_ID)
    OR ACTIVE_FROM IS NULL
Gabe
Sorry Gabe - doesn't quite work, and I've updated the example above to explain why. I should've been clearer. With the updated example, this query gives me two hits for Red Bull, and I need to restrict it to one. It doesn't really matter which barcode I return if they're entered on the same day, but if I have the option then the higher ITEM ID would be a nice touch. Thanks for the time you took to answer all the same.
Ubiquitous Che
I changed the second query to accomodate the new requirement.
Gabe
Perfect. Thanks heaps.
Ubiquitous Che
+1  A: 

Try something like this:

use tempdb
go
if exists (select 1 from sys.objects where name = 'barcodes')
    drop table barcodes
if exists (select 1 from sys.objects where name = 'items')
    drop table items
if exists (select 1 from sys.objects where name = 'categories')
    drop table categories
go
create table categories (
    id int primary key,
    [name] nvarchar(30),
    code char(3),
    accounting_reference nvarchar(30)
)
create table items (
    id int primary key,
    category_id int foreign key references categories (id),
    description nvarchar(50),
    base_cost money
)
create table barcodes (
    id int primary key,
    item_id int foreign key references items (id),
    barcode varchar(10),
    active_from datetime
)
go
insert into categories (id, [name], code, accounting_reference)
select 1, 'Beverages', 'BEV', 'Stock_Beverages' union all
select 2, 'Pies', 'PIE', 'Stock_Pies' union all
select 3, 'Chips', 'CHP', 'Stock_Chips'

insert into items (id, category_id, description, base_cost)
select 1, 1, 'Red Bull (single)', 4.5 union all
select 2, 2, 'Ponsonby Pie - Mince Cheese', 2.99 union all
select 3, 1, 'Coke Can (single)', 3.50 union all
select 4, 2, 'Big Ben - Steak Pepper', 1.99

insert into barcodes (id, item_id, barcode, active_from)
select 1, 1, 'XSD123', '2009/10/11' union all
select 2, 2, 'AXF123', '2009/10/12' union all
select 3, 3, 'XYZ234', '2009/10/11' union all
select 4, 1, 'NEW001', '2010/01/05' union all
select 5, 1, 'NEW002', '2010/01/05'

;with x as (
    select item_id, max(active_from) active_from, max(id) id
    from barcodes
    group by item_id
),
y as (
    select item_id, barcode
    from barcodes
    where exists (select 1 from x where item_id = barcodes.item_id and id = barcodes.id and active_from = barcodes.active_from)
)
select t1.id item_id, t1.description, t1.base_cost, t2.name category_name, t2.code category_code, t2.accounting_reference, t3.barcode
from items t1 left join categories t2 on (t1.category_id = t2.id)
    left join y t3 on (t1.id = t3.item_id)
_simon_
This also works - the SQL syntax melts my brain, but that's just because I'm unfamiliar with it. Thanks for showing me this syntax - it's something I'm going to have to study a bit so that I can understand it. Thanks muchly.
Ubiquitous Che