views:

205

answers:

2

I am wondering how Google does it. I have a lot of slow queries when it comes to page count and total number of results. Google returns a count value like 250,000,000 in a fraction of a second.

I am dealing with grid views. I have built a custom pager for a gridview that requires an SQL query to return a page count based on the filters set by the user. There are at least five filters, which include a keyword, a category and subcategory, a date range filter, and a sort expression for ordering. The query contains about 10 massive table LEFT JOINs.

This query is executed every time a search is performed, and a query execution lasts 30 seconds on average, be it a count or a select. I believe what's making it slow is my query string of inclusive and exclusive date range filters. I have replaced the (<=, >=) comparisons with BETWEEN ... AND, but I still experience the same problem.

See the query here: http://friendpaste.com/4G2uZexRfhd3sSVROqjZEc

I have problems with a long date range parameter.

Check my table that contains the dates: http://friendpaste.com/1HrC0L62hFR4DghE6ypIRp

UPDATE [9/17/2010] I minimized my date query and removed the time part. I tried reducing the joins for my count query (I am actually having a problem with my filter count, which takes too long to return a result of 60k rows).

    SELECT COUNT(DISTINCT esched.course_id)
      FROM courses c
           LEFT JOIN events_schedule esched
              ON c.course_id = esched.course_id
           LEFT JOIN course_categories cc
              ON cc.course_id = c.course_id
           LEFT JOIN categories cat
              ON cat.category_id = cc.category_id
     WHERE 1 = 1
           AND c.course_type = 1
           AND active = 1
           AND c.country_id = 52
           AND c.course_title LIKE '%cook%'
           AND cat.main_category_id = 40
           AND cat.category_id = 360
           AND (
                ('2010-09-01' <= esched.date_start OR '2010-09-01' <= esched.date_end)
                AND
                ('2010-09-25' >= esched.date_start OR '2010-09-25' >= esched.date_end)
               )

I just noticed that my query is quite fast when I have a filter on my main or sub category fields. However, when I only have a date filter and the range is a month or a week, it needs to count a lot of rows and takes 30 seconds on average.

These are the static fields:

AND c.course_type = 1
AND active = 1
AND c.country_id = 52

UPDATE [9/17/2010] If I create a hash of these three fields and store it in one field, will it make a difference in speed?

These are my dynamic fields:

AND c.course_title LIKE '%cook%'
AND cat.main_category_id = 40
AND cat.category_id = 360
-- ?DateStart and ?DateEnd

UPDATE [9/17/2010] Now my problem is the leading % in the LIKE query.

I will post an updated EXPLAIN.

+3  A: 

Search engines like Google use very complex behind-the-scenes algorithms to index searches. Essentially, they have already determined which words occur on each page, as well as the relative importance of those words and the relative importance of the pages (relative to other pages). These indexes are very quick because they are based on bitwise indexing.

Consider the following google searches:

custom: 542 million Google hits
pager: 10.8 million
custom pager: 1.26 million

Essentially what they have done is created a record for the word custom and in that record they have placed a 1 for every page that contains it and a 0 for every page that doesn't contain it. Then they zip it up because there are a lot more 0s than 1s. They do the same for pager.

When the search custom pager comes in, they unzip both records and perform a bitwise AND on them. This results in an array of bits whose length is the total number of pages they have indexed, and in which the number of 1s represents the hit count for the search. The position of each bit corresponds to a particular result, which is known in advance, so they only have to look up the full details of the first 10 to display on the first page.

This is oversimplified, but that is the general principle.
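
To make the principle concrete, here is a toy sketch in MySQL (not Google's actual implementation; the bit patterns are invented). Each integer stands in for a compressed bitmap in which bit n means "the word occurs on page n", and the hit count for a two-word search is just the population count of the bitwise AND:

    -- 11 = 1011: pages 1, 2 and 4 contain "custom" (hypothetical bitmap)
    -- 13 = 1101: pages 1, 3 and 4 contain "pager"  (hypothetical bitmap)
    -- 11 & 13 = 1001, i.e. pages 1 and 4 contain both words
    SELECT BIT_COUNT(11 & 13) AS hit_count;  -- returns 2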

Oh yes, they also have huge banks of servers performing the indexing and huge banks of servers responding to search requests. HUGE banks of servers!

This makes them a lot quicker than anything that could be done in a relational database.

Now, to your question: Could you paste some sample SQL for us to look at?

One thing you could try is changing the order in which the tables and joins appear in your SQL statement. I know it seems that it shouldn't make a difference, but it certainly can. If you put the most restrictive joins earlier in the statement, you could well end up with fewer overall join operations performed within the database.

A real world example. Say you wanted to find all of the entries in the phonebook under the name 'Johnson', with the number beginning with '7'. One way would be to look for all the numbers beginning with 7 and then join that with the numbers belonging to people called 'Johnson'. In fact it would be far quicker to perform the filtering the other way around even if you had indexing on both names and numbers. This is because the name 'Johnson' is more restrictive than the number 7.

So order does count, and database software is not always good at determining in advance which joins to perform first. I'm not sure about MySQL, as my experience is mostly with SQL Server, which uses index statistics to calculate the order in which to perform joins. These statistics get out of date after a number of inserts, updates and deletes, so they have to be recomputed periodically. If MySQL has something similar, you could try this.
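
For what it's worth, MySQL does keep index statistics and can be told to refresh them. A minimal sketch, using the table names from the posted count query:

    -- Recompute key distribution statistics so the optimizer has
    -- up-to-date information when choosing a join order.
    ANALYZE TABLE courses, events_schedule, course_categories, categories;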

UPDATE I have looked at the query that you posted. Ten left joins is not unusual and should perform fine as long as you have the right indexes in place. Yours is not a complicated query.

What you need to do is break this query down to its fundamentals. Comment out the lookup joins such as those to currency, course_stats, countries, states and cities along with the corresponding fields in the select statement. Does it still run as slowly? Probably not. But it is probably still not ideal.

So comment out all of the rest until you just have the courses table, the GROUP BY course_id and the ORDER BY course_id. Then experiment with adding the left joins back in to see which one has the greatest impact. Then, focusing on the ones with the greatest impact on performance, change the order of the joins. This is the trial-and-error approach. It would be a lot better for you to take a look at the indexes on the columns that you are joining on.

For example, the line cm.method_id = c.method_id would require a primary key on course_methodologies.method_id and a foreign key index on courses.method_id, and so on. Also, all of the fields in the WHERE, GROUP BY and ORDER BY clauses need indexes.
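
As a rough sketch of the kind of indexes meant here (the index names are made up, only the columns visible in the posted count query are covered, and it is assumed that active lives on courses; adjust to your actual schema):

    -- Join columns
    CREATE INDEX idx_esched_course  ON events_schedule (course_id);
    CREATE INDEX idx_cc_course      ON course_categories (course_id);
    CREATE INDEX idx_cc_category    ON course_categories (category_id);

    -- Filter columns from the WHERE clause
    CREATE INDEX idx_courses_filter ON courses (course_type, active, country_id);
    CREATE INDEX idx_cat_main_sub   ON categories (main_category_id, category_id);
    CREATE INDEX idx_esched_dates   ON events_schedule (date_start, date_end);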

Good luck

UPDATE 2 You seriously need to look at the date filtering on this query. What are you trying to do?

   AND ((('2010-09-01 00:00:00' <= esched.date_start
          AND esched.date_start <= '2010-09-25 00:00:00')
         OR ('2010-09-01 00:00:00' <= esched.date_end
             AND esched.date_end <= '2010-09-25 00:00:00'))
        OR ((esched.date_start <= '2010-09-01 00:00:00'
             AND '2010-09-01 00:00:00' <= esched.date_end)
            OR (esched.date_start <= '2010-09-25 00:00:00'
                AND '2010-09-25 00:00:00' <= esched.date_end)))

Can be re-written as:

AND (

    -- date_start is within the range - fine
    (esched.date_start BETWEEN '2010-09-01 00:00:00' AND '2010-09-25 00:00:00')

    -- date_end is within the range - fine
    OR (esched.date_end BETWEEN '2010-09-01 00:00:00' AND '2010-09-25 00:00:00')

    OR (esched.date_start <= '2010-09-01 00:00:00' AND esched.date_end >= '2010-09-01 00:00:00')

    OR (esched.date_start <= '2010-09-25 00:00:00' AND esched.date_end >= '2010-09-25 00:00:00')
  )
Daniel Dyson
I am definitely using indexes, will get back here later.
geocine
I am trying to filter for events that fall within a date range, e.g. the event is Sept 3 to Sept 5. Queries for Sept 3 to 4, Sept 4 to 6 and Sept 1 to 3 should all return the event.
geocine
How about a Sept 1 to 8 query? It should return the event, as should September 4 and September 2.
geocine
Yes, very good point. I have rearranged the 3rd and 4th OR statements to make them clearer. How are you getting on with the indexing?
Daniel Dyson
How many courses are there? The check c.course_title LIKE '%cook%' will take a long time because of the leading %. With a leading wildcard, an index on course_title cannot be used, so a table scan is needed to find any record containing 'cook'. Does this field have a full text index? Incidentally, a search such as c.course_title LIKE 'cook%' would be quick, because the index holds the records in alphabetical order, so it would just look up courses starting with 'cook'. As soon as you add a leading %, the index becomes useless. I know this is not much help, but it is something to be aware of.
Daniel Dyson
What if I want to retain the same logic? Will a full text index work?
geocine
@Daniel that might be right on the money. I don't know about MySQL, but in MS SQL that could easily be the main cause of the issue. Specifically, the DB engine could decide to use an execution plan that doesn't use the date indexes because it already has to do a table scan; it depends on the metrics, but I've seen it happen. Some ways I've worked around it are forcing the use of specific indexes or rewriting the query so that part of it explicitly hits only indexes.
eglasius
I have updated my post.
geocine
+2  A: 

On your update: you mention that you suspect the problem is in the date filters.

All those date checks can be summed up in a single check:

esched.date_ends >= '2010-09-01 00:00:00' and esched.date_start <= '2010-09-25 00:00:00'

If it behaves the same with the above, check whether the following returns quickly / is picking up your indexes:

SELECT COUNT(DISTINCT esched.course_id) FROM events_schedule esched WHERE esched.date_ends >= '2010-09-01 00:00:00' and esched.date_start <= '2010-09-25 00:00:00'

P.S. I think that when using the join, you can do SELECT COUNT(c.course_id) to count the main course records in the query directly, i.e. you might not need the DISTINCT that way.


Re your update, where most of the time now goes to the wildcard search after the change:

Use a MySQL full text search. Make sure to check the full-text restrictions; one important one is that it is only supported on MyISAM tables. I must say that I haven't really used MySQL full text search, and I'm not sure how it impacts the use of other indexes in the query.
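
A minimal sketch of what that could look like (assuming course_title is the column to search and that the courses table is, or can be converted to, MyISAM; note that MATCH ... AGAINST matches whole words rather than arbitrary substrings like '%cook%'):

    -- Add a full-text index on the title column
    ALTER TABLE courses ADD FULLTEXT INDEX ft_course_title (course_title);

    -- Then replace the LIKE '%cook%' filter with a full-text match
    SELECT COUNT(*)
      FROM courses c
     WHERE MATCH (c.course_title) AGAINST ('cook');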

If you can't use a full text search, IMHO you are out of luck with your current approach, since a regular index can't be used to check whether a word is contained in any part of the text.

If that's the case, you might want to switch that specific part of the approach and introduce a tag/keyword based approach. Unlike categories, you can assign multiple tags to each item, so it's flexible yet doesn't have the free-text issue.
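
A rough sketch of that idea (the course_tags table and its columns are hypothetical, just to show the shape of the approach):

    -- One row per (course, tag); an exact match on tag can use a normal index
    CREATE TABLE course_tags (
        course_id INT NOT NULL,
        tag       VARCHAR(50) NOT NULL,
        PRIMARY KEY (course_id, tag),
        KEY idx_tag (tag)
    );

    -- Counting courses tagged 'cook' then becomes an indexed equality lookup
    SELECT COUNT(DISTINCT ct.course_id)
      FROM course_tags ct
     WHERE ct.tag = 'cook';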

eglasius
I updated my post.
geocine
@geocine if I followed that right, it took the problem off the date filters and onto the LIKE '%...%', so it's an improvement, right?
eglasius