views:

60

answers:

4

I am using MS SQL server 2005

I have a table with 3 columns where I store user-message mapping like:

msg_for msg_from msg_id 
bob     bob      1 
bob     john     1 
bob     steve    1 
bob     bob      2 
bob     john     2 
bob     bob      3 
bob     john     3 
bob     steve    3

The PK is on 3 columns and msg_id is FK to messages table that stores the messages

The above is the physical storage I see according to the PK on 3 columns

Now my query MUST return the messages for a given user having latest msg at top (order by msg_id DESC)

bob john  3
bob steve 3
bob john  2
bob steve 2
bob john  1
bob steve 1

This mapping table has millions of rows. I see 95% of the cost is to SORT the result.

Is it possible to have the PK or some other way store data physically like this (avoid SORT)?

msg_for msg_from msg_id
bob     bob      3
bob     john     3
bob     steve    3
bob     bob      2
bob     john     2
bob     bob      1
bob     john     1
bob     steve    1

Thanks

A: 

Depending on your DBMS, you can use a clustered index to achieve that.

Frank
As I said I have a PK on 3 columns. So I have the clusted index already
Projapati
But the clusted index stores rows in msg_id in 1,2,3..not 3,2,1
Projapati
When creating the index, you can in most DBMS add a DESC after a column to note descending instead of ascending order, and you can state the column order as well.
Frank
+1  A: 

Yes.

When you set up the Primary Key (or any index) you can define this

ALTER TABLE dbo.[Messages] ADD CONSTRAINT [PK_Messages] PRIMARY KEY CLUSTERED 
(
    msg_for ASC, msg_from ASC, msg_id DESC
)

SQL Server can scan in either direction so it only makes sense if you want to control the sort order combination for multiple columns.

Edit: You say in the comments that the problem query is

select top 10 msg_id 
from message_user 
where msg_for = @user_name 
and msg_from <> @user_name 
order by msg_id DESC

The issue here isn't one of Ascending, Descending.

To give an analogy. Phone books are listed in surname, forename order but if you needed to know the lexicographically last 10 forenames in the directory you would need to scan the whole book. This would be unavoidable regardless of whether or not within each section forenames were listed in ascending or descending order.

Similarly the composite index keys would need to be msg_for, msg_id, msg_from to satisfy this query optimally not msg_for, msg_from, msg_id With this latter order it will still need to scan the whole section of the index satisfying the msg_for = @user_name criteria as it cannot know if there will be a later msg_id still to come belonging to a later msg_from Additionally regardless in which direction msg_id is sorted in their individual sub sections a sequential scan of the msg_for = @user_name part of the index will still require a sort as it they are fragmented by being in subsections according to msg_from.

Martin Smith
Are there any cons of this alteration, if any?
Projapati
If you are going to alter the clustered PK on a large table you will need a maintenance window to do this in. You'd have to consider adverse effects on other queries that rely on the existing order. Can you post the actual query with this 'sort' operator?
Martin Smith
This is the query that causes 95% cost on sorting:select top 10 msg_id from message_user where msg_for = @user_name and msg_from <> @user_nameorder by msg_id DESCOther queries:All of my queries on this table MUST return rows with msg_id in DESC order
Projapati
@Projapati - The issue here isn't one of Ascending, Descending. It is that the composite index keys would need to be `msg_for, msg_id, msg_from` to satisfy this query optimally not `msg_for, msg_from, msg_id`
Martin Smith
Not sure if msg_for, msg_id, msg_from order of composite key will make any difference. It will still sort.I will give it a try and post here within 1 hour. I am going outside for an hour.
Projapati
Even so, the fact that the key columns are of varchar type has a large impact on performance. Even if the index would have the columns in the right order it's very resource intensive for the DBMS engine to keep it sorted with every table change because it needs to sort strings.
CyberDude
@projapati - If this is your PK then you would also need to consider that these values will be repeated in any non clustered indexes that you may have on your table. Using numeric columns would keep the overhead of that down as well.
Martin Smith
I have another query that returns msg_id on this criteria:where msg_for = @user_name and msg_from = @user_name order by msg_id DESCNote that the 2nd condition is now msg_from=msg_for instead of <>
Projapati
@Projapati Well your existing index is perfect for that other query. Assuming it's a simple query you likely shouldn't see any sorts on that even though it is ordering by `msg_id DESC` it should just scan the relevant part of the index in reverse.
Martin Smith
Last question Martin, I am seeing 2 PK index choices from your responses: msg_for ASC, msg_from ASC, msg_id DESCORmsg_for ASC, msg_id ASC, msg_from ASCNot right? Please clarify. Thank you
Projapati
+2  A: 

The only way to guarantee order in the result set is by using an ORDER BY.

In SQL Server, a clustered index can help... assuming the optimizer sees the index as being useful.

OMG Ponies
+1  A: 

Well no wonder sorting takes forever. Varchar/string types are usually types that are very heavy when it comes to sorting, whether it's SQL or any programming language for that matter. Whenever possible use integral types for such things.

I suggest you use integral values to identify members. Have a Members table (MemberId INT, MemberName VARCHAR, etc), then a Messages table (MessageId INT, MessageBody VARCHAR, etc) and then have a join table, say Correspondence with (SenderMemberId INT, RecipientMemberId INT, MessageId INT). Sorting on integral values will be way faster this way.

I think you can easily refactor your data to fit on such a new structure.

CyberDude
That will force me to join another table to see who is the user for that given id. I want to avoid join.
Projapati
Joins are not really something to run away from. Of course having too many of them can have downsides but in the majority of cases they help a lot. Storing names as keys is an awful waste of storage space, not to mention redundancy (what if a member wishes to change his name?)Do a benchmark, see how the performance of the two methods compare.
CyberDude