views:

443

answers:

2

I have three tables to define users:

USER: user_id (int), username (varchar)
USER_METADATA_FIELD: user_metadata_field_id (int), field_name (varchar)
USER_METADATA: user_metadata_field_id (int), user_id (int), field_value (varchar)

I'd like to create a middle tier user that has certain access to other users within the application. To determine which users the logged in use can access, I am using a subquery like the following:

SELECT user_id FROM user WHERE user_id 
     IN (SELECT user_id 
         FROM user_metadata 
         WHERE user_metadata_field_id = 1 AND field_value = 'foo')

Currently I am storing the subquery string in a variable and then dynamically inserting it into the outer query each time I need to pull a list of users. After doing this I thought, "it has got to be better to just store a string of the actual user_ids".

So instead of storing this in a variable...

$subSql = "SELECT user_id FROM user_metadata WHERE user_metadata_field_id = 1 AND field_value = 'foo'";

... I actually perform the query and store the result like this...

$subSql = "12, 56, 89, 100, 1234, 890";

Then when I need to pull a lit of users that the logged in user has access to, I can do so with:

$sql = "SELECT user_id FROM user WHERE user_id IN ($subSql)";

And finally the questions:

How many items can you use in a MySQL "IN CLAUSE"? Storing the actual ids instead of the sub-sql statement has got to be faster for performing that outer query each time, right?

+1  A: 

Starting from a certain number, the IN tables are faster.

MySQL has something inside its code that makes building a range over a large number of constant values slower than doing the same in a nested loop.

See this article in my blog for performance details:

Quassnoi
+1  A: 

As hinted in Quassnoi's response, one stumbles upon other practical considerations, before hitting any possible limit imposed by a given MySql version's implementation (*). Therefore, as the number of admin users (or other criteria which may require an IN construct) grows, one should seek to use alternatives to a literal "IN", such as the use of temporary (or even permanent) tables.

Since you are considering special handling of the "admin user" criteria, for performance purposes, I'd like to offer an comment and a suggestion.

Comment: Could this be a case of premature optimization?
I'm unaware of the specifics of this database, its volume, complexity etc. And, yes, I am aware of some the performance tribute to be paid to the EAV (Entity-Attribute-Value) format, but I'm thinking that even for successful businesses, the accounts database rarely counts in excess of 10,000 users. So even with very many attributes per user we're still looking at a relatively small EAV table, which may not require this type of optimization. (On the other hand a few other optimization tricks may be welcome in other areas).
Furthermore, typical use cases, involve a relative few inquiries into the account database, relative to other queries, and this is therefore another reason to deffer any non trivial performance consideration for the accounts-related features of the application.

Suggestion: Maybe use "re-normalized attributes"
For attributes that are singled-valued, and in particular if they are short, they can be moved (or duplicated) in the Entity table ('USER' table in this case). This introduces a bit of logic at the time items are inserted or updated, but this sames many joins (or subqueries) and also provides opportunities to consider multi-field indexes to support the most common use cases.

(*) Is there a limt?
I haven't read about any such a limit; I know Oracle has (had) a 1,000 limit at some time, MSSQL doesn't; of course all servers do have a limit based on the overall length of the SQL statement, but this is a really big number! if one ever stumble upon that one, he/she has other problems... ;-)

mjv