views:

127

answers:

1

Does an SQL Server "join" preserve any kind of row order consistently (i.e. that of the left table or that of the right table)?

Psuedocode:

create table #p (personid bigint);
foreach (id in personid_list)
    insert into #p (personid) values (id)
select id from users inner join #p on users.personid = #p.id

Suppose I have a list of IDs that correspond to person entries. Each of those IDs may correspond to zero or more user accounts (since each person can have multiple accounts).

To quickly select columns from the users table, I populate a temp table with person ids, then inner join it with the users table.

I'm looking for an efficient way to ensure that the order of the results in the join matches the order of the ids as they were inserted into the temp table, so that the user list that's returned is in the same order as the person list as it was entered.

I've considered the following alternatives:

  1. using "#p inner join users", in case the left table's order is preserved
  2. using "#p left join users where id is not null", in case a left join preserves order and the inner join doesn't
  3. using "create table (rownum int, personid bigint)", inserting an incrementing row number as the temp table is populated, so the results can be ordered by rownum in the join
  4. using an SQL Server equivalent of the "order by order of [tablename]" clause available in DB2

I'm currently using option 3, and it works... but I hate the idea of using an order by clause for something that's already ordered. I just don't know if the temp table preserves the order in which the rows were inserted or how the join operates and what order the results come out in.

EDIT:

Assuming I go with option 3, so there IS a field to order on... is there any form of the join that will help SQL Server to do the least amount of work in maintaining the order. I mean, is it smart enough, for example, to look at what table's fields are in the order by clause and work off that table first while doing the join, so that the result set's order roughly or completely coincides with that table's order, just in case it's already in the desired order?

A: 

SQL sets are never ordered unless you explicitly order them with an order by clause.

Do this:

create table #p (personid bigint);

insert into #p (personid) values (id)
select id from users 
ORDER BY <something like users.name>;

select * from #p 
ORDER BY <something like users.name>;

Note that while you can insert in order, that doesn't mean the subsequent select will be ordered, because SQL sets are never ordered unless you explicitly order them with an order by clause.

You write:

To quickly select columns from the users table, I populate a temp table with person ids, then inner join it with the users table.

Note that in most cases, it'll be faster to just select directly from users, using an in list:

select * form users where users.id in (1, 2, 3, 6, 9, ... );

You're probably prematurely "optimizing" something that doesn't need optimizing. RDBMSes are (usually) written to be efficient, and will probably do little extra work sorting something that's already sorted by chance. Concentrate on functionality until you have a demonstrated need to optimize. (I say this as someone who has been spending the last several months almost solely optimizing SQL on very large (~ half billion row OLTP) datasets, because most of the time, that's true.)

tpdi
There are thousands of IDs involved in this operation (involving search results), so I cannot concatenate a select string of the form you mentioned. Also, there's no such thing as premature optimization. You're either making the code run faster or you're not, and you're either too lazy to improve it or you're smart enough to do it. I'm fine with populating the temp table with a reusable parameterized insert statement that can be cached and reused.
Triynko
The other thing is... the IDs have AN ORDER when inserted (ordered by person's name components), but such an order has nothing to do with the numeric IDs and therefore they're not in an order that's obvious to SQL Server, so it cannot order them with "little extra work" (it would have to join with the people table again and re-sort on name components). I still think my third option with using a rownum column and ordering on it is the best way to preserve a seemingly arbitrary order (at least from SQL Server's perspective).
Triynko
Another thing... the decision to use a loop to populate the temp table, rather than one huge query string is more for correctness than performance (even though the simple cached query performs best and streams the inserts one by one), because with a single string there's a risk that with a large enough set, some limit will be reached for the command text or batch (e.g. 65536 * packet size (4096bytes)?), whereas there's no practical limit to the number of times I can issue an insert statement on a temp table before deleting it.
Triynko

related questions