views:

96

answers:

3

Let's suppose that I have these tables:

[ properties ]
 id (INT, PK)
 name (VARCHAR)

[ properties_prices ]
 id (INT, PK)
 property_id (INT, FK)
 date_begin (DATE)
 date_end (DATE)
 price_per_day (DECIMAL)
 price_per_week (DECIMAL)
 price_per_month (DECIMAL)

And my visitor runs a search like: List the first 10 (pagination) properties where the price per day (price_per_day field) is between 10 and 100 on the period for 1st may until 31 december

I know thats a huge query, and I need to paginate the results, so I must do all the calculation and login in only one query... that's why i'm here! :)

Questions about the problem

If there are gaps, would that be an acceptable property?

There're no gaps. All the possible dates are in the database.

If the price is between 10 and 100 in some sup-periods, but not in others, do you want to get that property?

In the perfect world, no... We'll need to calculate the "sum" of that type of price in that period considering all the variations/periods.

Also, what are the "first 10"? How are they ordered? Lowest price first? But there could be more than one price.

This is just an example of pagination with 10 results per page... Can be ordered by the FULLTEXT search that I'll add with keywords and these things... As I said, it's a pretty big query.

+1  A: 

What you tell us is not precise enough. From your data structure and your question I assume:

  • the price of a property can change in that period, and there would be a properties_price entry for each sub-period
  • there should be no overlaps in the sub-periods, but the data structure does not guarantee that
  • there can be gaps in the sub-periods

But there are still questions:

  • If there are gaps, would that be an acceptable property?
  • If the price is between 10 and 100 in some sup-periods, but not in others, do you want to get that property?
  • Also, what are the "first 10"? How are they ordered? Lowest price first? But there could be more than one price.

Depending on the answers, there might be no single query doing the trick. But if you accept the gaps, that could return what you want:

SELECT *
FROM properties AS p
WHERE EXISTS          -- property is available in the price range
     (SELECT * FROM properties_prices AS pp1 
      WHERE p.id = pp1.property_id AND
            pp1.price_per_day between 10 and 100 AND
            (pp1.date_begin <= "2010-12-31" OR pp1.date_end >= "2010-05-01")) AND
      NOT EXISTS      -- property is in the price range in all sup-periods, but there might be gaps
     (SELECT * FROM properties_prices AS pp2 
      WHERE p.id = pp2.property_id AND
            pp2.price_per_day not between 10 and 100 AND
            (pp2.date_begin <= "2010-12-31" OR pp2.date_end >= "2010-05-01"))
ORDER BY name  --- ???
LIMIT 10  

That query doesn't give you the prices or other details. That would need to go in an extra query. But perhaps my assumptions are all wrong anyway.

Sam
We're getting there.. Your query's great. I answered your questions on the main post.
TiuTalk
+1  A: 

This can also be done as a GROUP BY, which I think will be quite efficient, and we get some aggregates as part of the package:

SELECT 
   prperty_id, MIN(price_per_day), MAX(price_per_day)
FROM 
   properties_prices 
WHERE 
   date_begin <= "2010-12-31" AND date_end >= "2010-05-01"
GROUP BY 
   property_id
HAVING MIN(IF( (price_per_day BETWEEN 10 AND 100), 1, 0))=1
ORDER BY ...
LIMIT 10

(I don't have MySQL to hand so I haven't tested. I was unsure about the MIN(IF ...) but a mock-up using a CASE worked on SQLServer.)

mdma
date_begin <= "2010-12-31" OR date_end >= "2010-05-01" s/b "AND"
jaywon
thanks for spotting this. I paraphrased from the sql server version I wrote - I must have paraphrased incorrectly!
mdma
+2  A: 

This is similar to the answer given by @mdma, but I use a condition in the join clause for the price range, instead of the HAVING trick.

SELECT p.id, MAX(p.name), 
  MIN(v.price_per_day) AS price_low,
  MAX(v.price_per_day) AS price_high
FROM properties p
JOIN properties_prices v ON p.id = v.property_id
  AND v.price_per_day BETWEEN 10 AND 100  
  AND v.date_begin < '2010-12-31' AND v.date_end > '2010-05-01'
GROUP BY p.id
ORDER BY ...
LIMIT 10;

I would also recommend creating a covering index:

CREATE INDEX prices_covering ON properties_prices
  (property_id, price_per_day, date_begin, date_end);

This allows your query to run as optimally as possible, because it can read the values directly from the index. It won't have to read the rows of data from the table at all.

+----+-------------+-------+-------+-----------------+-----------------+---------+-----------+------+--------------------------+
| id | select_type | table | type  | possible_keys   | key             | key_len | ref       | rows | Extra                    |
+----+-------------+-------+-------+-----------------+-----------------+---------+-----------+------+--------------------------+
|  1 | SIMPLE      | p     | index | PRIMARY         | PRIMARY         | 4       | NULL      |    1 |                          |
|  1 | SIMPLE      | v     | ref   | prices_covering | prices_covering | 4       | test.p.id |    6 | Using where; Using index |
+----+-------------+-------+-------+-----------------+-----------------+---------+-----------+------+--------------------------+
Bill Karwin