views: 202

answers: 9

I have an MS SQL database with about 2,600 records (each one holding information on a computer). I need to write a SELECT statement that selects about 400 of those records.

What's the best way to do that when they don't have any common criteria? They're all just different random numbers so I can't use wildcards or anything like that. Will I just have to manually include all 400 numbers in the query?

A: 

You could build an XML list (or something of the sort) to keep track of the IDs you need to query, and then write a query that works through that list and brings back all of the matching rows.

Here is a website that has examples of a number of different methods for doing what you are looking for (#4 is the XML method).
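
As a rough sketch of the XML approach (the table and column names here are hypothetical, and untyped XML is just one way to pass the list in):

-- pass the IDs in as an XML fragment, then shred it with nodes()/value()
DECLARE @ids xml
SET @ids = N'<id>12</id><id>13</id><id>93</id><id>4</id>'  -- ...and so on

SELECT c.*
FROM computers c                                  -- hypothetical table
JOIN @ids.nodes('/id') AS x(node)
    ON c.computer_id = x.node.value('.', 'int')   -- hypothetical key column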

TheTXI
A: 

For this specific situation (not necessarily as a general solution), the fastest and simplest thing is probably to read the entire table into memory and find your matches in your program's code, rather than have the database parse a gigantic WHERE clause.

Mike
+6  A: 

If you need 400 specific rows whose column matches certain values:

Yes, include all 400 numbers using an IN clause. It's been my experience (via code profiling) that using an IN clause is faster than using WHERE column = A OR column = B OR ...

400 is really not a lot.

SELECT * FROM table WHERE column in (12, 13, 93, 4, ... )

If you need 400 random rows:

SELECT TOP 400 * FROM table
ORDER BY NEWID()
Brian R. Bondy
I think he knows which ones he wants to select (not just a random sampling). He is just saying they are "random" because there is no real rhyme or reason as to why they are selected while others are not.
TheTXI
I included both ways because I'm not sure which of the 2 the poster meant.
Brian R. Bondy
Can you show us proof that IN is faster than ORs?
I do not have proof, but from profiling both alternatives in my app I found that it is faster. Profile your own code to be sure, but I'm just talking from my own experiences here.
Brian R. Bondy
Then maybe making a declarative statement is a bad thing if you have no proof. Maybe you should say, "It's been my experience that..."
@Mark Brady: Thanks for the suggestion, I modified my post to be more clear on where my answer came from.
Brian R. Bondy
+2  A: 

Maybe you have found a deficiency in the database design. That is, there is something common amongst the 400 records you want and what you need is another column in the database to indicate this commonality. You could then select against this new column.
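
If that does turn out to be the case, the fix might look something like this (the table, column, and flag names are all hypothetical):

-- hypothetical: record the commonality as a flag column and select on it
ALTER TABLE computers ADD needs_report bit NOT NULL DEFAULT 0

UPDATE computers SET needs_report = 1 WHERE computer_id IN (12, 13, 93, 4 /* ... */)

SELECT * FROM computers WHERE needs_report = 1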

iamdudley
While this may be true, it's not an answer to the question. Otherwise +1.
Maybe, but I think the precision of my answer was in line with the precision of the question ;-)
iamdudley
A: 

You can create a table with those 400+ random tokens and select on those, e.g.:

SELECT * FROM inventory WHERE inventory_id IN (SELECT id FROM inventory_ids WHERE tag = 'foo')

You still have to maintain the other table, but at least you don't end up with one ginormous query.

Chris Jester-Young
And yes, you can write that statement as a JOIN, but personally I find this way easier to read.
Chris Jester-Young
Easier to read and much slower if the optimizer can't rewrite it.
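
For reference, the JOIN form mentioned in these comments would look roughly like this (using the same tables as the answer above):

SELECT i.*
FROM inventory i
JOIN inventory_ids ids ON ids.id = i.inventory_id
WHERE ids.tag = 'foo'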
A: 

I would build a separate table with your selection criteria and then join the tables together, or something like that, assuming your criteria are static of course.

GordyII
+4  A: 

Rather than executing multiple queries or selecting the entire rowset and filtering it yourself, create either a temporary table or a permanent table where you can insert temporary rows for each ID. In your main query, just join on your temporary table.

For example, if your source table is...

person:
    person_id
    name

And you have 400 different person_id values you want, let's say we have a permanent table for our temporary rows, like this...

person_query:
   query_id
   person_id

You'd insert your rows into person_query, then execute your query like this:

select *
from person p
join person_query pq on pq.person_id = p.person_id
where pq.query_id = @query_id
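
Populating person_query for one run might look something like this (the query_id value of 1 is made up for the example, and the multi-row VALUES syntax needs SQL Server 2008 or later):

-- load the 400 IDs for this run under a single query_id
insert into person_query (query_id, person_id)
values (1, 12), (1, 13), (1, 93), (1, 4)  -- ...and so on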
Adam Robinson
This still means I have to manually insert those 400 rows one-by-one though, right? But I guess that's unavoidable.
The KZA
Yes, you do, but depending on the complexity of the query it's likely going to be faster than any other method. Any way to get 400 items into an 'in' clause would have to involve dynamically constructed SQL, which is another danger.
Adam Robinson
+1  A: 

As Brian Bondy said above, using an IN clause is probably the best way:

SELECT * FROM table WHERE column in (12, 13, 93, 4, ... )
One good trick is to paste the IDs in from a spreadsheet, if you have one ...

If the IDs of the rows you want are in a spreadsheet, you can add an extra column that uses CONCATENATE() to append a comma to the end of each ID, so that the column in your spreadsheet looks like this:
12,
13,
93,
4,

then copy and paste this column of data into your query, so it looks like this:

SELECT * FROM table WHERE column in (
12, 
13, 
93, 
4, 
... 
)

It doesn't look pretty, but it's a quick way of getting all the numbers in.

codeulike
A: 

Just select the TOP n rows, and order by something random.

Below is a hypothetical example to return 10 random employee names:

   SELECT TOP 10
     EMP.FIRST_NAME
    ,EMP.LAST_NAME
   FROM  
     Schema.dbo.Employees EMP
   ORDER BY
     NEWID()
JosephStyons