views:

349

answers:

4

I have stock quantity information in my database.
1 table, "stock", holds the productid (sku) along with the quantity and the filename from where it came.

The other table, "stockfile", contains all the processed filenames along with dates.

Now I need to get all the products with their latest stock quantity values.

This gives me ALL the products multiple times with all their stock quantity (resulting in 300.000 records)

SELECT stock.stockid, stock.sku, stock.quantity, stockfile.filename, stockfile.date
FROM stock
INNER JOIN stockfile ON stock.stockfileid = stockfile.stockfileid
ORDER BY stock.sku ASC

I already tried this:

SELECT * FROM stock
INNER JOIN stockfile ON stock.stockfileid = stockfile.stockfileid
GROUP BY sku
HAVING stockfile.date = MAX( stockfile.date )
ORDER BY stock.sku ASC

But it did not work

SHOW CREATE TABLE stock:

CREATE TABLE stock (
stockid bigint(20) NOT NULL AUTO_INCREMENT,
sku char(25) NOT NULL,
quantity int(5) NOT NULL,
creationdate datetime NOT NULL,
stockfileid smallint(5) unsigned NOT NULL,
touchdate datetime NOT NULL,
PRIMARY KEY (stockid)
) ENGINE=MyISAM AUTO_INCREMENT=315169 DEFAULT CHARSET=latin1

SHOW CREATE TABLE stockfile:

CREATE TABLE stockfile (
stockfileid smallint(5) unsigned NOT NULL AUTO_INCREMENT,
filename varchar(25) NOT NULL,
creationdate datetime DEFAULT NULL,
touchdate datetime DEFAULT NULL,
date datetime DEFAULT NULL,
begindate datetime DEFAULT NULL,
enddate datetime DEFAULT NULL,
PRIMARY KEY (stockfileid)
) ENGINE=MyISAM AUTO_INCREMENT=265 DEFAULT CHARSET=latin1

A: 

There are two common ways to accomplish this: a sub query or a self-join.

See this example of selecting the group-wise maximum at the MySQL site.

Edit, an example using a subquery:

SELECT stock.stockid, stock.sku, stock.quantity,
       stockfile.filename, stockfile.date
FROM stock
INNER JOIN stockfile ON stock.stockfileid = stockfile.stockfileid
WHERE stockfile.date = (SELECT MAX(date) FROM stockfile);
charstar
A: 
select *
from   stock
where  stockfileid in (
           select top 1 stockfileid
           from   stockfile
           order by date desc
       )
Rich
That won't work as is - you need to correlate the subquery. Currently the subquery will return only one row, and you're getting the stockfileid, not the date which is what you're really after.
OMG Ponies
This is not valid SQL for MySQL: http://dev.mysql.com/doc/refman/5.4/en/select.html
charstar
That too - need to use `LIMIT`, not `TOP`
OMG Ponies
I suppose that's what I get for being a SQL Server user...
Rich
So is the problem that each stock file doesn't contain a complete stock check so the latest stock number for a particular product might be in any of the files? The question doesn't seem entirely clear.
Rich
Indeed, not everything is in the newest stockfile. But even if it were I'd still need to get the latest stock quantity like this.
skerit
+2  A: 

Use:

SELECT DISTINCT s.stockid, 
       s.sku, 
       s.quantity, 
       sf.filename, 
       sf.date
  FROM STOCK s
  JOIN STOCKFILE sf ON sf.stockfileid = s.stockfileid
  JOIN (SELECT t.stockfileid,
               MAX(t.date) 'max_date'
          FROM STOCKFILE t
      GROUP BY t.stockfileid) x ON x.stockfileid = sf.stockfileid
                               AND x.max_date = sf.date
OMG Ponies
I'm afraid this still gives me all 315.000 records.
skerit
I updated to add the `DISTINCT`, because the most likely reason is the rows are being duplicated because of the JOINs.
OMG Ponies
Hmm, even with the distinct it still gives me all the duplicate records.
skerit
@Jelle: That's impossible - DISTINCT is a synonym for GROUP BY - all the rows returned will be unique in some way, even if it is one column value. The most likely reason is that you've got ties based on the maximum date.
OMG Ponies
I haven't defined any relations or ties in the database.
skerit
@skerit: ties == similar, have two or more matched records. not tie as in link
Hao
+1  A: 

This is an example of the frequently-asked "greatest-n-per-group" question that we see every week on StackOverflow. Follow that tag to see other similar solutions.

SELECT s.*, f1.*
FROM stock s
INNER JOIN stockfile f1
  ON (s.stockfileid = f1.stockfileid)
LEFT OUTER JOIN stockfile f2
  ON (s.stockfileid = f2.stockfileid AND f1.date < f2.date)
WHERE f2.stockfileid IS NULL;

If there are multiple rows in stockfile that have the max date, you'll get them both in the result set. To resolve this, you'd have to add some tie-breaker conditions into the join on f2.


Thanks for adding the CREATE TABLE info. That's very helpful when you're asking SQL questions.

I see from the AUTO_INCREMENT table options that you have 315k rows in stock and only 265 rows in stockfile. Your stockfile table is the parent in the relationship, and the stock table is the child, with a column stockfileid that references the primary key of stockfile.

So your original question was misleading. You want the latest row from stock, not the latest row from stockfile.

SELECT f.*, s1.*
FROM stockfile f
INNER JOIN stock s1
  ON (f.stockfileid = s1.stockfileid)
LEFT OUTER JOIN stock s2
  ON (f.stockfileid = s2.stockfileid AND (s1.touchdate < s2.touchdate
      OR s1.touchdate = s2.touchdate AND s1.stockid < s2.stockid))
WHERE s2.stockid IS NULL;

I'm assuming you want "latest" to be relative to touchdate, so if you want to use creationdate instead, you can do the edit.

I've added a term to the join so that it resolves ties. I know you said the dates are "practically unique" but as the saying goes, "one in a million is next Tuesday."


Okay, I think I understand what you're trying to do now. You want the most recent row per sku, but the date by which to compare them is in the referenced table stockfile.

SELECT s1.*, f1.*
FROM stock s1
JOIN stockfile f1 ON (s1.stockfileid = f1.stockfileid)
LEFT OUTER JOIN (stock s2 JOIN stockfile f2 ON (s2.stockfileid = f2.stockfileid))
  ON (s1.sku = s2.sku AND (f1.date < f2.date OR f1.date = f2.date AND f1.stockfileid < f2.stockfileid))
WHERE s2.sku IS NULL;

This does a self-join of stock to itself, looking for a row with the same sku and a more recent date. When none is found, then s1 contains the most recent row for its sku. And each instance of stock has to join to its stockfile to get the date.


Re comment about optimization: It's hard for me to test because I don't have tables populated with data matching yours, but I'd guess you should have the following indexes:

CREATE INDEX stock_sku ON stock(sku);
CREATE INDEX stock_stockfileid ON stock(stockfileid);
CREATE INDEX stockfile_date ON stockfile(date);

I'd suggest using EXPLAIN to analyze the query without the indexes, and then create one index at a time and re-analyze with EXPLAIN to see which one gives the most direct benefit.

Bill Karwin
I know, I've actually solved a similar problem thanks to stackoverflow. But your query still gives me 315.000 results :/
skerit
Then I would guess there are a lot of ties for the max date per stock.
Bill Karwin
A lot of ties? The "date" field is practically unique in the stockfile table.Per date there can be about 100 records in the stock table
skerit
Alright, SHOW CREATE TABLE output has been added.
skerit
Hmm, but I can't use the touchdate from the "stock" table, the only date that matters is the "date" field in stockfile.(touchdate and creationdate are fields to let me know when it was originally created and last edited.A stockfile from last year can be re-imported today, which would results in old records having the highest touchdate)
skerit
See my third try above.
Bill Karwin
skerit
Ah yes, unfortunately I forgot I already made those indexes. So without them it takes ages, with them it takes "only" 250 seconds. Maybe I'll better create a new question to not go off-topic too much?
skerit