I have a table structure like the following:

user
  id
  name

profile_stat
  id
  name

profile_stat_value
  id
  name

user_profile
  user_id
  profile_stat_id
  profile_stat_value_id

My question is:

How do I write a query that finds all users with a given profile_stat_id and profile_stat_value_id for many stats?

I've tried doing an inner self join, but that quickly gets crazy when searching for many stats. I've also tried doing a count on the actual user_profile table, and that's much better, but still slow.

Is there some magic I'm missing? I have about 10 million rows in the user_profile table and want the query to take no longer than a few seconds. Is that possible?

A: 

Databases can typically handle 10 million records without trouble. I have mostly used Oracle in our professional environment with large amounts of data (about 30-40 million rows), and even join queries on those tables have never taken more than a second or two to run.

One IMPORTANT lesson I learned whenever query performance was bad was to check whether indexes are defined properly on the join fields. E.g. here, profile_stat_id and profile_stat_value_id should have indexes defined (user_id I am assuming is the primary key). This should give you a good performance increase if you have not done that already. After defining the indexes, do run the query once or twice to give the DB a chance to build the index tree and query plan before measuring the gain.
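As a rough sketch of what that could look like, assuming the user_profile table from the question and that lookups filter on both stat columns together, a single composite index is one option. The index name is made up for illustration, and separate single-column indexes would also fit the advice above.

```sql
-- Hypothetical index name; the column names come from the question's schema.
-- Assumption: queries filter on (profile_stat_id, profile_stat_value_id)
-- and then read user_id, so all three columns are placed in one index.
CREATE INDEX idx_user_profile_stat_value
    ON user_profile (profile_stat_id, profile_stat_value_id, user_id);
```

Whether the optimizer actually uses the index depends on the DBMS and the data distribution, so it is worth checking the query plan before and after.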

Fazal
A: 

Superficially, you seem to be asking for this, which includes no self-joins:

SELECT u.name, u.id, s.name, s.id, v.name, v.id
  FROM User_Profile       AS p
  JOIN User               AS u ON u.id = p.user_id
  JOIN Profile_Stat       AS s ON s.id = p.profile_stat_id
  JOIN Profile_Stat_Value AS v ON v.id = p.profile_stat_value_id

Any of the joins listed can be changed to a LEFT OUTER JOIN if the corresponding table need not have a matching entry. All this does is join the central User_Profile table with each of the other three tables on the appropriate joining column.

Where do you think you need a self-join?

[I have not included anything to filter on 'the many stats'; it is not at all clear to me what that part of the question means.]
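If 'the many stats' means users who match every one of several (profile_stat_id, profile_stat_value_id) pairs, one possible reading can be sketched with GROUP BY and HAVING rather than self-joins. The id values below are placeholders, not taken from the question, and the sketch assumes each stat appears only once in the list of pairs.

```sql
-- Assumed reading: return users who have ALL of the listed (stat, value) pairs.
-- The numeric ids are placeholder values for illustration only.
SELECT p.user_id
  FROM User_Profile AS p
 WHERE (p.profile_stat_id = 1 AND p.profile_stat_value_id = 10)
    OR (p.profile_stat_id = 2 AND p.profile_stat_value_id = 20)
    OR (p.profile_stat_id = 3 AND p.profile_stat_value_id = 30)
 GROUP BY p.user_id
HAVING COUNT(DISTINCT p.profile_stat_id) = 3;  -- must equal the number of pairs listed
```

The WHERE clause keeps only rows matching one of the wanted pairs, and the HAVING clause keeps only users for whom every wanted stat matched. Adding another stat means adding one OR branch and raising the count, not adding another join.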

Jonathan Leffler