



I'm looking for a way to select until a sum is reached.

My "documents" table has "tag_id" and "size" fields.

I want to select all of the documents with tag_id = 26 but I know I can only handle 600 units of size. So, there's no point in selecting 100 documents and discarding 90 of them when I could have known that the first 10 already added up to > 600 units.

So, the goal is: don't bring back a ton of data to parse through when I'm going to discard most of it.

...but I'd also really like to avoid introducing cursors into this app.

I'm using mysql.

A:

You need some way to order which records get priority over others when adding up to your max units. Otherwise, how do you know which set of records that totals up to 600 do you keep?

SELECT d.id, d.size, d.date_created
FROM documents d
INNER JOIN documents d2 ON d2.tag_id=d.tag_id AND d2.date_created >= d.date_created
WHERE d.tag_id=26
GROUP BY d.id, d.size, d.date_created
HAVING sum(d2.size) <= 600
ORDER BY d.date_created DESC
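To see what this returns, here is a small sketch that runs the same query against SQLite (standing in for MySQL so the example is self-contained) on a few made-up rows. The `id` column, the sample data and the `make_db` helper are assumptions for illustration. With the newest documents taking priority, the running totals are 150, 450, 700 and 900, so only the two newest documents come back:

```python
import sqlite3

# Hypothetical sample rows: (id, tag_id, size, date_created)
ROWS = [
    (1, 26, 200, "2008-11-01"),
    (2, 26, 250, "2008-11-02"),
    (3, 26, 300, "2008-11-03"),
    (4, 26, 150, "2008-11-04"),
    (5, 7, 100, "2008-11-05"),  # another tag; must not be counted
]


def make_db():
    """Build an in-memory documents table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (id INTEGER PRIMARY KEY, tag_id INTEGER,"
        " size INTEGER, date_created TEXT)"
    )
    conn.executemany("INSERT INTO documents VALUES (?, ?, ?, ?)", ROWS)
    return conn


# For each document d, d2 ranges over d itself and every same-tag document
# created at or after it, so SUM(d2.size) is d's running total, newest first.
QUERY = """
SELECT d.id, d.size, d.date_created
FROM documents d
INNER JOIN documents d2
    ON d2.tag_id = d.tag_id AND d2.date_created >= d.date_created
WHERE d.tag_id = 26
GROUP BY d.id, d.size, d.date_created
HAVING SUM(d2.size) <= 600
ORDER BY d.date_created DESC
"""

conn = make_db()
selected = [row[0] for row in conn.execute(QUERY)]
print(selected)  # [4, 3]
```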

This is just a basic query to get you started, and there are a number of problems still to solve:

  • It stops at <= 600, so in most cases you won't fill up your size limit exactly. This means you might want to tweak it to allow one more record. For example, if the first record is > 600 the query will return nothing, and that could be a problem.
  • It won't do anything to check for additional smaller records later on that might still fit under the cap.
  • Records with identical date_created values are counted in each other's running totals (the join uses >=), so tied records are kept or dropped together rather than one at a time.
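A possible variant addressing the first and third points, sketched under the same assumptions (SQLite stand-in, made-up rows, hypothetical `select_until` helper): ties on `date_created` are broken by `id`, and the `HAVING` clause tests the running total before each record, so the record that crosses the cap is included and an oversized first record is still returned. It does nothing about the second point (smaller records further down that would still fit).

```python
import sqlite3

# Hypothetical sample rows: (id, tag_id, size, date_created)
ROWS = [
    (1, 26, 200, "2008-11-01"),
    (2, 26, 250, "2008-11-02"),
    (3, 26, 300, "2008-11-03"),
    (4, 26, 150, "2008-11-04"),
    (5, 7, 100, "2008-11-05"),
]


def make_db():
    """Build an in-memory documents table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (id INTEGER PRIMARY KEY, tag_id INTEGER,"
        " size INTEGER, date_created TEXT)"
    )
    conn.executemany("INSERT INTO documents VALUES (?, ?, ?, ?)", ROWS)
    return conn


# d2 covers d and every document ahead of it in newest-first order, with
# ties on date_created ordered by id so each row has a distinct position.
# SUM(d2.size) - d.size is the running total *before* d.
QUERY = """
SELECT d.id
FROM documents d
INNER JOIN documents d2
    ON d2.tag_id = d.tag_id
   AND (d2.date_created > d.date_created
        OR (d2.date_created = d.date_created AND d2.id >= d.id))
WHERE d.tag_id = ?
GROUP BY d.id, d.size, d.date_created
HAVING SUM(d2.size) - d.size < ?
ORDER BY d.date_created DESC, d.id DESC
"""


def select_until(conn, tag_id, cap):
    """Ids newest first, stopping after the one that brings the total to or past cap."""
    return [row[0] for row in conn.execute(QUERY, (tag_id, cap))]


conn = make_db()
print(select_until(conn, 26, 600))  # [4, 3, 2]  (running totals 150, 450, 700)
print(select_until(conn, 26, 100))  # [4]  (the first record alone exceeds the cap)
```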

Updated since he added information that he's sorting by date.

Joel Coehoorn
I was starting to post something very similar, though using an auxiliary view. Yours is better.
Joe Pineda
That's more clever than my answer too. :)

This is much less efficient, but it does avoid a cursor (assuming your documents table also has a serial id column):

select a.id, (select sum(b.size) from documents b where b.id <= a.id and b.tag_id = 26)
from documents a
where a.tag_id = 26
order by a.id

Also, this was done in pgsql, so I'm not sure if this exact syntax would work in mysql.
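Correlated subqueries of this form are standard SQL, so MySQL accepts them too. Running it against SQLite with made-up rows shows the running total produced for each id; the `running_size` alias, the sample data and `make_db` are assumptions for illustration:

```python
import sqlite3

# Hypothetical sample rows: (id, tag_id, size, date_created)
ROWS = [
    (1, 26, 200, "2008-11-01"),
    (2, 26, 250, "2008-11-02"),
    (3, 26, 300, "2008-11-03"),
    (4, 26, 150, "2008-11-04"),
    (5, 7, 100, "2008-11-05"),
]


def make_db():
    """Build an in-memory documents table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (id INTEGER PRIMARY KEY, tag_id INTEGER,"
        " size INTEGER, date_created TEXT)"
    )
    conn.executemany("INSERT INTO documents VALUES (?, ?, ?, ?)", ROWS)
    return conn


# For each document a, add up every same-tag document whose id is less
# than or equal to a's id: an oldest-first running total.
QUERY = """
SELECT a.id,
       (SELECT SUM(b.size) FROM documents b
         WHERE b.id <= a.id AND b.tag_id = 26) AS running_size
FROM documents a
WHERE a.tag_id = 26
ORDER BY a.id
"""

conn = make_db()
running = conn.execute(QUERY).fetchall()
print(running)  # [(1, 200), (2, 450), (3, 750), (4, 900)]
```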

Then you can wrap this in another query that looks for those having a sum > 600 (you'll have to name the sum column) and take the first id. Then process all ids below and including that one.

Er, if there's no id, then use the created timestamp.
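One way to write that wrapping query, sketched in SQLite under the same assumptions (made-up rows, hypothetical `ids_to_process` helper and `running_size` alias). The `COALESCE` covers the case where no running total ever exceeds the cap, in which case every document qualifies:

```python
import sqlite3

# Hypothetical sample rows: (id, tag_id, size, date_created)
ROWS = [
    (1, 26, 200, "2008-11-01"),
    (2, 26, 250, "2008-11-02"),
    (3, 26, 300, "2008-11-03"),
    (4, 26, 150, "2008-11-04"),
    (5, 7, 100, "2008-11-05"),
]


def make_db():
    """Build an in-memory documents table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (id INTEGER PRIMARY KEY, tag_id INTEGER,"
        " size INTEGER, date_created TEXT)"
    )
    conn.executemany("INSERT INTO documents VALUES (?, ?, ?, ?)", ROWS)
    return conn


def ids_to_process(conn, tag_id, cap):
    """Ids up to and including the first whose running total exceeds cap."""
    sql = """
    SELECT id FROM documents
    WHERE tag_id = :tag
      AND id <= COALESCE(
            (SELECT MIN(t.id)
               FROM (SELECT a.id,
                            (SELECT SUM(b.size) FROM documents b
                              WHERE b.id <= a.id AND b.tag_id = :tag) AS running_size
                       FROM documents a
                      WHERE a.tag_id = :tag) AS t
              WHERE t.running_size > :cap),
            id)  -- no running total exceeds the cap: keep every row
    ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, {"tag": tag_id, "cap": cap})]


conn = make_db()
print(ids_to_process(conn, 26, 600))  # [1, 2, 3]  (id 3 is where 750 > 600)
```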

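You would have to first store the documents in a table variable, sort them in the order you want to retrieve them, then update each row with a cumulative value so that you can select on it. (This is SQL Server syntax; MySQL has no table variables, so you would use a temporary table there.)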

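-- assumes documents has id and date_created columns
declare @documents_temp table (
    seq int identity(1,1) primary key,  -- retrieval order, assigned by the ORDER BY below
    doc_id int,
    size int,
    cumulative_size int null)

insert into @documents_temp (doc_id, size)
select id, size from documents where tag_id = 26 order by date_created desc

-- running total: this row's size plus every row before it in retrieval order
update d set cumulative_size =
    (select sum(t.size) from @documents_temp t where t.seq <= d.seq)
from @documents_temp d

select doc_id, size from @documents_temp where cumulative_size <= 600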
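Since MySQL has no table variables, the closest equivalent is a temporary table. Here is a rough sketch of that approach, run in SQLite for self-containment, with a hypothetical `documents_temp` table, `select_via_temp_table` helper and made-up rows. Rather than chaining each row to the previous row's `cumulative_size`, which depends on the order rows get updated in, each row sums the sizes of all rows at or before it in newest-first order (ties broken by id):

```python
import sqlite3

# Hypothetical sample rows: (id, tag_id, size, date_created)
ROWS = [
    (1, 26, 200, "2008-11-01"),
    (2, 26, 250, "2008-11-02"),
    (3, 26, 300, "2008-11-03"),
    (4, 26, 150, "2008-11-04"),
    (5, 7, 100, "2008-11-05"),
]


def make_db():
    """Build an in-memory documents table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (id INTEGER PRIMARY KEY, tag_id INTEGER,"
        " size INTEGER, date_created TEXT)"
    )
    conn.executemany("INSERT INTO documents VALUES (?, ?, ?, ?)", ROWS)
    return conn


def select_via_temp_table(conn, tag_id, cap):
    """Copy one tag's documents, store a running total, then filter on it."""
    conn.execute("DROP TABLE IF EXISTS documents_temp")
    conn.execute(
        "CREATE TEMP TABLE documents_temp ("
        " doc_id INTEGER PRIMARY KEY, size INTEGER,"
        " date_created TEXT, cumulative_size INTEGER)"
    )
    # Only this tag's rows go into the temporary table.
    conn.execute(
        "INSERT INTO documents_temp (doc_id, size, date_created)"
        " SELECT id, size, date_created FROM documents WHERE tag_id = ?",
        (tag_id,),
    )
    # Sums the unchanging size column, so update order does not matter.
    conn.execute(
        "UPDATE documents_temp SET cumulative_size = ("
        " SELECT SUM(t.size) FROM documents_temp t"
        " WHERE t.date_created > documents_temp.date_created"
        "    OR (t.date_created = documents_temp.date_created"
        "        AND t.doc_id >= documents_temp.doc_id))"
    )
    rows = conn.execute(
        "SELECT doc_id FROM documents_temp WHERE cumulative_size <= ?"
        " ORDER BY date_created DESC, doc_id DESC",
        (cap,),
    )
    return [row[0] for row in rows]


conn = make_db()
print(select_via_temp_table(conn, 26, 600))  # [4, 3]
```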

Don't know if it is worth it.

Patrick Szalapski