views:

500

answers:

2

In my question about searching for date ranges I tried simplifying the problem and inadvertently posed a different and simpler problem.

Rather than complicate that question with an edit I am going to ask the problem I actually intended.

I have two tables Property and Booking. Bookings have a foreign key to Properties and also start and end date.

The user is searching for free slots and supplies a desired duration in days. They also supply a range of start dates that they are interested in. Therefore a search will be along the lines of: "Find me all properties I want a 3 day slot which starts anytime in May."

Now I can do this by: 1. Running 31 queries for each potential start day 2. Finding all bookings in May, condense them into one array of 31 booleans representing days and loop through looking for slots.

I assume (2) is more efficient in most cases. Is there any better algorithms? Is there a pure SQL solution.

I will be using Django and my dataset is small so I will probably be OK with a 'dumb' approuch but I am curious to know what the best algorithm looks like.

+3  A: 

Probably overkill for your application - but:

A relatively simple way of improving your searches at the expense of making the 'write' process more complicated, would be to change the Booking table to make it an 'Availability' table.

Add in a boolean column to indicate if the slot is free or booked (or better still put in the id of the customer who's booked it, and use 0 if the slot is free).

Start off with a single free slot, 1st Jan 2009 -> 31st Dec 20??

When you get a booking split the free slot into 3 (two inserts and one update), the booked slot and the two available slots.

Keep doing that and as the time frame gets more fragmented the booking process will consist of one of the following:

  • Assigning an entire 'available slot' to someone (one update)
  • Splitting an 'available slot' into two (one update and one insert)
  • Splitting a slot into 3 (as above) if someone books the middle section out of an available slot.

That's not incredibly complicated to manage and the search process becomes a simple query: finding any slots in the required time frame that are available (booked=false or customerid=0, whichever way you go with it) where enddate - startdate >= the number of days you want.

It doubles the size of the booking/availability table, and makes bookings less simple, but the trade off is that the search process is about as easy as it gets.

Stringent Software
Very clever approach but I need the booking data for many other purposes too. I could have an availability table for each property in addition to my current booking schema. This in a sense denormalizes my data to make one type of search easier but in this case may count as a premature optimization.
andybak
Your original booking data is still there - it just has an extra column with either a 'booked' boolean or customerid in it. If you search the Availability table with where booked=true or customerid>0 you have the same record set as in your original table. That's why it doubles the size of the table, it includes both booked and available slots.
Stringent Software
+1 Probably not the solution the OP is looking for but very neat.
Lieven
The more I think about this problem the more I realise that this is cleaner and simpler than every other solution I can think of!
andybak
+1  A: 

Table definitions would have helped, but here goes. This should work for MS SQL Server, but it should be a trivial task to convert it to MySQL once you understand the idea behind it.

The Calendar table is just a standard utility table with all of the dates in it that is useful to have in your database. If you don't already have one, I suggest that you create one and populate it.

CREATE TABLE Calendar
(
     date        DATETIME     NOT NULL,
     is_holiday  BIT          NOT NULL,
     -- any other columns that might be relevant for your business
     CONSTRAINT PK_Calendar PRIMARY KEY CLUSTERED (date)
)

You would then need to populate the table with any dates that might be meaningful for your business. Even if you go back 100 years and forward 100 years, that's still less than 75K rows in the table and it's clustered on the date, so it should be fast and easy to work with. It makes many date-based queries much simpler.

SELECT
     P.property_id,
     C.date
FROM
     Calendar C
JOIN Properties P ON 1=1
WHERE
     C.date BETWEEN @search_start_date AND @search_end_date AND
     NOT EXISTS
     (
          SELECT
               *
          FROM
               Bookings B
          WHERE
               B.property_id = P.property_id AND
               B.start_date <= DATEADD(dy, @slot_length, C.date) AND  -- You would use MySQLs date function
               B.end_date   >= C.date
     )

Or alternatively:

SELECT
     P.property_id,
     C.date
FROM
     Calendar C
JOIN Properties P ON 1=1
LEFT OUTER JOIN Bookings B ON
               B.property_id = P.property_id AND
               B.start_date <= DATEADD(dy, @slot_length, C.date) AND  -- You would use MySQLs date function
               B.end_date   >= C.date
WHERE
     C.date BETWEEN @search_start_date AND @search_end_date AND
     B.booking_id IS NULL
Tom H.
I'm not clear exactly what the Calendar table contains here?
andybak
It's just a table of dates. It's there so that you can treat dates in a set-based manner. I'll add some example DDL for it to the answer.
Tom H.
I have a horrible feeling I'm just not smart enough to understand this answer! What does the 'JOIN Properties P ON 1=1' line do?
andybak
You're getting a Cartesian product for the two tables. Basically, starting with one of every product for every possible date, since if a row DOESN'T exist in the Bookings then you want to include it. Once we have every date possible with every property possible, we take out all of the ones that are already booked.
Tom H.