tags:

views:

443

answers:

5

I have a table where I store customer sales (on periodicals, like newspaper) data. The product is stored by issue. Example

custid  prodid  issue   qty     datesold
1       123     2       12      01052008
2       234     1       5       01022008
1       123     1       5       01012008
2       444     2       3       02052008

How can I retrieve (whats a faster way) the get last issue for all products, for a specific customer? Can I have samples for both SQL Server 2000 and 2005? Please note, the table is over 500k rows.

Thanks

+3  A: 

Generic SQL; SQL Server's syntax shouldn't be much different:

SELECT prodid, max(issue) FROM sales WHERE custid = ? GROUP BY prodid;
Ant P.
Just change the ? to the @CustID format and you've got it.
Joel Coehoorn
If it is too slow then add an index on custid, prodid and issue. In that order I think.
pipTheGeek
A: 

To query on existing growing historical table is way too slow!

Strongly suggest you create a new table tblCustomerSalesLatest which stores the last issue data of each customer. and select from there.

codemeit
This "optimization" will result in a database schema that is no longer normalized. Premature optimization is the root of all evils. With appropriate indexes the "max/group by" answer should be fine.
Diomidis Spinellis
It looks like an olap analytical not an oltp transactional database, in which case it must be denormalized.
le dorfier
By use case! That is what customer paid for. The clean solution is not 100% normalized but fast and practical.
codemeit
And what if tblCustomerSales has 100 million records.??
codemeit
I was considering this as an alternative, the problem is that this will take a while to implement.
Saif Khan
+2  A: 

Is this a new project? If so, I would be wary of setting up your database like this and read up a bit on normalization, so that you might end up with something like this:

CustID LastName FirstName
------ -------- ---------
1      Woman    Test
2      Man      Test

ProdID ProdName
------ --------
123    NY Times
234    Boston Globe

ProdID IssueID PublishDate
------ ------- -----------
123    1       12/05/2008
123    2       12/06/2008

CustID OrderID OrderDate
------ ------- ---------
1      1       12/04/2008

OrderID ProdID IssueID Quantity
------- ------ ------- --------
1       123    1       5
2       123    2       12

I'd have to know your database better to come up with a better schema, but it sound like you're building too many things into a flat table, which will cause lots of issues down the road.

Patrick Harrington
This is 95% to my my schema. Your last table is where I need to get the last issue for every product, for a customer
Saif Khan
I would appreciate your feedback.
Saif Khan
+1  A: 

If you're looking for most recent sale by date maybe that's what you need:

SELECT prodid, issue
  FROM Sales 
WHERE custid = @custid 
      AND datesold = SELECT MAX(datesold) 
                       FROM Sales s 
                      WHERE s.prodid = Sales.prodid
                         AND s.issue = Sales.issue
                        AND s.custid = @custid
Mr. Brownstone
+3  A: 

Assuming that "latest" is determined by date (rather than by issue number), this method is usually pretty fast, assuming decent indexes:

SELECT
     T1.prodid,
     T1.issue
FROM
     Sales T1
LEFT OUTER JOIN dbo.Sales T2 ON
     T2.custid = T1.custid AND
     T2.prodid = T1.prodid AND
     T2.datesold > T1.datesold
WHERE
     T1.custid = @custid AND
     T2.custid IS NULL

Handling 500k rows is something that a laptop can probably handle without trouble, let alone a real server, so I'd stay clear of denormalizing your database for "performance". Don't add extra maintenance, inaccuracy, and most of all headaches by tracking a "last sold" somewhere else.

EDIT: I forgot to mention... this doesn't specifically handle cases where two issues have the same exact datesold. You might need to tweak it based on your business rules for that situation.

Tom H.