views:

113

answers:

3

My goal is to return a start and end date having same value in a column. Here is my table. The (*) have been marked to give you the idea of how I want to get "EndDate" for every similar sequence value of A & B columns

ID | DayDate   |  A  |  B
-----------------------------------------------
1  | 2010/07/1 | 200 |  300
2  | 2010/07/2 | 200 |  300 *
3  | 2010/07/3 | 150 |  250
4  | 2010/07/4 | 150 |  250 *
8  | 2010/07/5 | 150 |  350 *
9  | 2010/07/6 | 200 |  300
10 | 2010/07/7 | 200 |  300 *
11 | 2010/07/8 | 100 |  200
12 | 2010/07/9 | 100 |  200 *

and I want to get the following result table from the above table

| DayDate   |EndDate   |  A  |  B
-----------------------------------------------
| 2010/07/1 |2010/07/2 | 200 |  300
| 2010/07/3 |2010/07/4 | 150 |  250
| 2010/07/5 |2010/07/5 | 150 |  350
| 2010/07/6 |2010/07/7 | 200 |  300
| 2010/07/8 |2010/07/9 | 100 |  200

UPDATE:

Thanks Mike, The approach of yours seems to work in your perspective of considering the following row as a mistake.

8  | 2010/07/5 | 150 |  350 * 

However it is not a mistake. The challenge I am faced with this type of data is like a scenario of logging a market price change with date. The real problem in mycase is to select all rows with the beginning and ending date if both A & B matches in all these rows. Also to select the rows which are next to previously selected, and so on like that no data is left out in the table.

I can explain a real world scenario. A Hotel with Room A and B has room rates for each day entered in to table as explained in my question. Now the hotel needs to get a report to show the price calendar in a shorter way using start and end date, instead of listing all the dates entered. For example, on 2010/07/01 to 2010/07/02 the price of A is 200 and B is 300. This price is changed from 3rd to 4th and on 5th there is a different price only for that day where the Room B is price is changed to 350. So this is considered as a single day difference, thats why start and end dates are same.

I hope this explained the scenario of the problem. Also note that this hotel may be closed for a specific time period, lets say this is an additional problem to my first question. The problem is what if the rate is not entered on specific dates, for example on Sundays the hotel do not sell these two rooms so they entered no price, meaning the row will not exist in the table.

+1  A: 
  • My general approach is to join the table onto itself based on DayDate = DayDate+1 and the A or B values not being equal
  • This will find the end dates for each period (where the value is going to be different on the following day)
  • The only problem is, that won't find an end date for the final period. To get around this, I selct the max date from the table and union that into my list of end dates
  • Once you have the list of end dates defined, you can join them to the original table based on the end date being greater than or equal to the original date
  • From this final list, select the minimum daydate grouped by the other fields

    select
    min(DayDate) as DayDate,EndDate,A,B from
    (SELECT DayDate, A, B, min(ends.EndDate) as EndDate
    FROM yourtable
    LEFT JOIN
    (SELECT max(DayDate) as EndDate FROM yourtable UNION
    SELECT t1.DayDate as EndDate 
    FROM yourtable t1
    JOIN yourtable t2
    ON date_add(t1.DayDate, INTERVAL 1 DAY) = t2.DayDate 
    AND (t1.A<>t2.A OR t1.B<>t2.B)) ends
    ON ends.EndDate>=DayDate
    GROUP BY DayDate, A, B) x
    GROUP BY EndDate,A,B
    
Rob Cooney
I tested this in MySQL this morning and fixed a couple things. This now produces the exact result as the example in the original problem.
Rob Cooney
A: 

I think I have found a solution which does produce the table desired.

SELECT  
  a.DayDate AS StartDate,  

  ( SELECT b.DayDate  
    FROM Dates AS b  
    WHERE b.DayDate > a.DayDate AND (b.B = a.B OR b.B IS NULL)  
    ORDER BY b.DayDate ASC LIMIT 1 
  ) AS StopDate,
a.A as A,
    a.B AS B

FROM Dates AS a 
WHERE Coalesce( 
               (SELECT c.B  
                FROM Dates AS c  
                WHERE c.DayDate <= a.DayDate  
                ORDER BY c.DayDate DESC LIMIT 1,1  
               ), -99999  
              ) <> a.B 
  AND a.B IS NOT NULL 
ORDER BY a.DayDate ASC; 

is able to generate the following table result

StartDate   StopDate    A   B
2010-07-01  2010-07-02  200 300
2010-07-03  2010-07-04  150 250
2010-07-05  NULL        150 350
2010-07-06  2010-07-07  200 300
2010-07-08  2010-07-09  100 200

But I need a way to replace the NULL with the same date of the start date.

Raf
you could wrap the final stopdate field in an IF check, something like:IF(stopdate is null, startdate, stopdate) as stopdate
Rob Cooney
+2  A: 

Creating related tables allows you much greater freedom to query and extract relevant information. Here's a few links that you might find useful:

You could start with these tutorials:
http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html
http://net.tutsplus.com/tutorials/databases/sql-for-beginners/

There are also a couple of questions here on stackoverflow that might be useful:
http://stackoverflow.com/questions/2331838/normalization-in-plain-english
http://stackoverflow.com/questions/1102590/what-exactly-does-normalization-do

Anyway, on to a possible solution. The following examples use your hotel rooms analogy.

First, create a table to hold information about the hotel rooms. This table just contains the room ID and its name, but you could store other information in here, such as the room type (single, double, twin), its view (ocean front, ocean view, city view, pool view), and so on:

CREATE TABLE `room` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(45) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE INDEX `name_UNIQUE` (`name` ASC) )
ENGINE = InnoDB;

Now create a table to hold the changing room rates. This table links to the room table through the room_id column. The foreign key constraint prevents records being inserted into the rate table which refer to rooms that do not exist:

CREATE TABLE `rate` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
  `room_id` INT UNSIGNED NOT NULL,
  `date` DATE NOT NULL,
  `rate` DECIMAL(6,2) UNSIGNED NOT NULL,
  PRIMARY KEY (`id`),
  INDEX `fk_room_rate` (`room_id` ASC),
  CONSTRAINT `fk_room_rate`
    FOREIGN KEY (`room_id` )
    REFERENCES `room` (`id` )
    ON DELETE CASCADE
    ON UPDATE CASCADE)
ENGINE = InnoDB;

Create two rooms, and add some daily rate information about each room:

INSERT INTO `room` (`id`, `name`) VALUES (1, 'A'), (2, 'B');

INSERT INTO `rate` (`id`, `room_id`, `date`, `rate`) VALUES
( 1, 1, '2010-07-01', 200),
( 2, 1, '2010-07-02', 200),
( 3, 1, '2010-07-03', 150),
( 4, 1, '2010-07-04', 150),
( 5, 1, '2010-07-05', 150),
( 6, 1, '2010-07-06', 200),
( 7, 1, '2010-07-07', 200),
( 8, 1, '2010-07-08', 100),
( 9, 1, '2010-07-09', 100),
(10, 2, '2010-07-01', 300),
(11, 2, '2010-07-02', 300),
(12, 2, '2010-07-03', 250),
(13, 2, '2010-07-04', 250),
(14, 2, '2010-07-05', 350),
(15, 2, '2010-07-06', 300),
(16, 2, '2010-07-07', 300),
(17, 2, '2010-07-08', 200),
(18, 2, '2010-07-09', 200);

With that information stored, a simple SELECT query with a JOIN will show you the all the daily room rates:

SELECT
    room.name,
    rate.date,
    rate.rate
FROM room
JOIN rate
ON rate.room_id = room.id;

+------+------------+--------+
| A    | 2010-07-01 | 200.00 |
| A    | 2010-07-02 | 200.00 |
| A    | 2010-07-03 | 150.00 |
| A    | 2010-07-04 | 150.00 |
| A    | 2010-07-05 | 150.00 |
| A    | 2010-07-06 | 200.00 |
| A    | 2010-07-07 | 200.00 |
| A    | 2010-07-08 | 100.00 |
| A    | 2010-07-09 | 100.00 |
| B    | 2010-07-01 | 300.00 |
| B    | 2010-07-02 | 300.00 |
| B    | 2010-07-03 | 250.00 |
| B    | 2010-07-04 | 250.00 |
| B    | 2010-07-05 | 350.00 |
| B    | 2010-07-06 | 300.00 |
| B    | 2010-07-07 | 300.00 |
| B    | 2010-07-08 | 200.00 |
| B    | 2010-07-09 | 200.00 |
+------+------------+--------+

To find the start and end dates for each room rate, you need a more complex query:

SELECT 
    id,
    room_id,
    MIN(date) AS start_date,
    MAX(date) AS end_date,
    COUNT(*) AS days,
    rate
FROM (
    SELECT
        id,
        room_id,
        date,
        rate, 
        (
            SELECT COUNT(*)
            FROM rate AS b
            WHERE b.rate <> a.rate
            AND b.date <= a.date
            AND b.room_id = a.room_id
        ) AS grouping
    FROM rate AS a
    ORDER BY a.room_id, a.date
) c
GROUP BY rate, grouping
ORDER BY room_id, MIN(date);

+----+---------+------------+------------+------+--------+
| id | room_id | start_date | end_date   | days | rate   |
+----+---------+------------+------------+------+--------+
|  1 |       1 | 2010-07-01 | 2010-07-02 |    2 | 200.00 |
|  3 |       1 | 2010-07-03 | 2010-07-05 |    3 | 150.00 |
|  6 |       1 | 2010-07-06 | 2010-07-07 |    2 | 200.00 |
|  8 |       1 | 2010-07-08 | 2010-07-09 |    2 | 100.00 |
| 10 |       2 | 2010-07-01 | 2010-07-02 |    2 | 300.00 |
| 12 |       2 | 2010-07-03 | 2010-07-04 |    2 | 250.00 |
| 14 |       2 | 2010-07-05 | 2010-07-05 |    1 | 350.00 |
| 15 |       2 | 2010-07-06 | 2010-07-07 |    2 | 300.00 |
| 17 |       2 | 2010-07-08 | 2010-07-09 |    2 | 200.00 |
+----+---------+------------+------------+------+--------+

You can find a good explanation of the technique used in the above query here:
http://www.sqlteam.com/article/detecting-runs-or-streaks-in-your-data

Mike
Raf
and thanks for the great links, they are very useful.
Raf
I'd be interested to know why this was downvoted.
Mike
@Raf: You could add a `rate_type` column to the `rate` table, and use this to differentiate between the single, double and triple rates.
Mike
This really have worked! I have managed to bring few changes to your approach to work with my database table and it works great. Thank you so much.
Raf
I'm glad you've got it working. Just a note for any future questions: rather than attempting to simplify things by reducing the amount of information you provide, you should be sure to provide as much relevant information as possible. In this particular case, your question makes it appear as though all your data is stored in one table, and that's what I and Rob Cooney based our answers on. You should update your question to reflect this, so that others looking for similar topics will be able to more easily find your question.
Mike
@Raf: Don't worry about the vote - a working solution is what counts. I was only curious before about why it had been downvoted, as this usually suggests that there is something wrong with the answer.
Mike