ansaurus

Question

Answer 1

A:

I think it's more of a business question with parameter of how often does the client run the import, how long would it take for you to implement either of the solution, and how expensive is your time per hour.

If it's something that runs once in a while, a bit of bad performance is acceptable in my opinion, especially if you can get the job done quick using clean code.

eed3si9n 2009-07-20 12:59:52

Answer 2

+1 A:

SELECT store_id FROM store WHERE store_active = 1

or even

SELECT store_id FROM store

will tell you all the active stores in a single query. You can now conduct the other tests on stores you know to exist, and you've saved yourself 1,999 hits to the database.

If you've got relatively uncontested database access, and no time constraint on how long the whole thing is going to take then you've no real need to worry about hitting the connection pool over and over again. That's what it's designed for, after all!

banjollity 2009-07-20 13:00:25

for doing what you are saying, it would make sense to do it if I were to cache that result set somewhere. I have to validate one store at a time, even with the above mentioned approach, and not caching the result set, I dont see how I will save myself 1999 hits to the database.

Jay 2009-07-20 14:14:32

Run the query and store the results in a List<Integer>. That's you first test; does the size of the list match the expected number of stores. Then you can iterate the list running the other tests on the stores using each Integer as the store id for any JDBC queries.

banjollity 2009-07-20 15:31:16

You can use the query above inside the IN (select ...) as a subquery. No caching used at all.

txwikinger 2009-07-20 20:02:13

Answer 3

+2 A:

That way I would obtain get my 'Store' object from the database. When I don't get anything from the database, then that store doesn't exist. So, for such a simple check, I would have to hit the database 2000 times for 2000 stores.

This is what you should not do.

Create a temporary table, fill the table with your values and JOIN this table, like this:

SELECT  STORE_ATTRIB1, STORE_ATTRIB2...
FROM    temptable tt
JOIN    STORE s
ON      s.STORE_ID = t.id

or this:

SELECT  STORE_ATTRIB1, STORE_ATTRIB2...
FROM    STORE s
WHERE   s.STORE_ID IN
        (
        SELECT  id
        FROM    temptable tt
        )

I know using IN is not a suggested methodology. So, what do you think I should be doing? Should I go ahead and use IN here, coz it gives better performance in this scenario? Or should I change my design?

IN filters duplicates out.

If you want each eligible row to be selected for each duplicate value in the list, use JOIN.

IN is in no way a "not suggested methology".

In fact, there was a time when some databases did not support IN queries effciently, that's why folk wisdom still advices against using it.

But if your store_id is indexed properly (and it most probably is, if it's a PRIMARY KEY which it looks like), then all modern versions of major databases (that is Oracle, SQL Server, MySQL and PostgreSQL) will use an efficient plan to perform this query.

See this article in my blog for performance details in SQL Server:

IN vs. JOIN vs. EXISTS

Note, that in a properly designed database, validation rules are also set-based.

I. e. you implement your validation rules as queries against the temptable.

However, to support legacy rules, you can select values from temptable row-by-agonizing-row, apply the rules, and delete values which did not pass validation.

Quassnoi 2009-07-20 13:08:47

ansaurus

tags:

views:

answers:

When to 'IN' and when not to?

related questions