views:

44

answers:

3

Hi,

I'm trying to write a query to get a courses information and the number of bookings and attendees. Each course can have many bookings and each booking can have many attendees.

We already have a working report, but it uses multiple queries to get the required information. One to get the courses, one to get the bookings, and one to get the number of attendees. This is very slow because of the size that the database has grown to.

There are a number of extra conditions for the reports:

  • Bookings must be made more than 5 minutes ago, or have been confirmed
  • The booking must not be canceled
  • The course must not be marked as deleted
  • The courses venue and location must be LIKE a search string
  • Courses with no bookings must appear in the results

This is the table structure: (Ive omitted the unneeded information. All fields are not null and have no default)

mysql> DESCRIBE first_aid_courses;

+------------------+--------------+-----+----------------+
| Field            | Type         | Key | Extra          |
+------------------+--------------+-----+----------------+
| id               | int(11)      | PRI | auto_increment |
| course_date      | date         |     |                |
| region_id        | int(11)      |     |                |
| location         | varchar(255) |     |                |
| venue            | varchar(255) |     |                |
| number_of_spaces | int(11)      |     |                |
| deleted          | tinyint(1)   |     |                |
+------------------+--------------+-----+----------------+

mysql> DESCRIBE first_aid_bookings;

+-----------------------+--------------+-----+----------------+
| Field                 | Type         | Key | Extra          |
+-----------------------+--------------+-----+----------------+
| id                    | int(11)      | PRI | auto_increment |
| first_aid_course_id   | int(11)      |     |                |
| placed                | datetime     |     |                |
| confirmed             | tinyint(1)   |     |                |
| cancelled             | tinyint(1)   |     |                |
+-----------------------+--------------+-----+----------------+

mysql> DESCRIBE first_aid_attendees;

+----------------------+--------------+-----+----------------+
| Field                | Type         | Key | Extra          |
+----------------------+--------------+-----+----------------+
| id                   | int(11)      | PRI | auto_increment |
| first_aid_booking_id | int(11)      |     |                |
+----------------------+--------------+-----+----------------+

mysql> DESCRIBE regions;

+----------+--------------+-----+----------------+
| Field    | Type         | Key | Extra          |
+----------+--------------+-----+----------------+
| id       | int(11)      | PRI | auto_increment |
| name     | varchar(255) |     |                |
+----------+--------------+-----+----------------+

I need to select the following:

Course ID:        first_aid_courses.id
Date:             first_aid_courses.course_date
Region            regions.name
Location:         first_aid_courses.location
Bookings:         COUNT(first_aid_bookings)
Attendees:        COUNT(first_aid_attendees)
Spaces Remaining: COUNT(first_aid_bookings) - COUNT(first_aid_attendees)

This is what I have so far:

SELECT `first_aid_courses`.*,
       COUNT(`first_aid_bookings`.`id`)  AS `bookings`,
       COUNT(`first_aid_attendees`.`id`) AS `attendees`
FROM   `first_aid_courses`
       LEFT JOIN `first_aid_bookings`
         ON `first_aid_courses`.`id` =
            `first_aid_bookings`.`first_aid_course_id`
       LEFT JOIN `first_aid_attendees`
         ON `first_aid_bookings`.`id` =
            `first_aid_attendees`.`first_aid_booking_id`
WHERE  ( `first_aid_courses`.`location` LIKE '%$search_string%'
          OR `first_aid_courses`.`venue` LIKE '%$search_string%' )
       AND `first_aid_courses`.`deleted` = 0
       AND ( `first_aid_bookings`.`placed` > '$five_minutes_ago'
             AND `first_aid_bookings`.`cancelled` = 0
              OR `first_aid_bookings`.`confirmed` = 1 )
GROUP  BY `first_aid_courses`.`id`
ORDER  BY `course_date` DESC  

Its not quite working, can any one help me with writing the correct query? Also there are 1000s of rows in this database, so any help on making it fast is appreciated (like which fields to index).

Thanks in advanced

A: 

This is completely untested, but maybe try selecting a count of non-null rows for bookings and attendees, like this:

SUM(IF(`first_aid_bookings`.`id` IS NOT NULL, 1, 0))  AS `bookings`,
COUNT(IF(`first_aid_attendees`.`id` IS NOT NULL, 1, 0)) AS `attendees`
Ben Lee
I couldn't get that to work. It returns the same as the my query.
Petah
A: 

Unless you have it but just do not show it, have a good look on indexes, without them you loose an order of magnitude on performance on any query that references anything but primary key.

Another major performance hit are the LIKE '%nnn%'.

Would it be possible to do something with those?

But with some good indexes, this query should be fine if you have the hardware to back it up.

I have queries doing LIKE on tables with millions of rows. its not a problem if the rest of the query can eliminate any unnecessary matchings.

You could go for subqueries to lessen the scope for the LIKE queries.

David Mårtensson
Yes I have indexes on a lot of the fields (including the 2 fields used in the `LIKE` statement), I'm not 100% sure they are on the right fields though. Also the `LIKE` statement is omitted unless the user types in a search query.
Petah
look up the EXPLAIN command to have the sql engine itself analyze the query -- it will tell you (in a somewhat cryptic manner) whether there are any missing indexes
Ben Lee
A: 

Ok, Ive answered my own question. Sometimes it helps to ask a question for you to figure out the answer.

SELECT `first_aid_courses`.*,
       `regions`.`name`                          AS `region_name`,
       COUNT(DISTINCT `first_aid_bookings`.`id`) AS `bookings`,
       COUNT(`first_aid_attendees`.`id`)         AS `attendees`
FROM   `first_aid_courses`
       JOIN `regions`
         ON `first_aid_courses`.`region_id` = `regions`.`id`
       LEFT JOIN `first_aid_bookings`
         ON `first_aid_courses`.`id` =
            `first_aid_bookings`.`first_aid_course_id`
       LEFT JOIN `first_aid_attendees`
         ON `first_aid_bookings`.`id` =
            `first_aid_attendees`.`first_aid_booking_id`
WHERE  ( `first_aid_courses`.`location` LIKE '%$search_string%'
          OR `first_aid_courses`.`venue` LIKE '%$search_string%' )
       AND `first_aid_courses`.`deleted` = 0
       AND ( `first_aid_bookings`.`cancelled` = 0
             AND `first_aid_bookings`.`confirmed` = 1 )
GROUP  BY `first_aid_courses`.`id`
ORDER  BY `course_date` ASC  
Petah