tags:

views:

105

answers:

4

Hey fellow programmers

Okay, first let me say that this is a hard one. I know the presentation may be a little long. But I how you'll bare with me and help me through anyway :D

So I'm developing on an advanced search for bicycles. I've got a lot of tables I need to join to find all, let's say, red and brown bikes. One bike may come in more then one color! I've made this query for now:

SELECT DISTINCT p.products_id,                    #simple product id
                products_name,                    #product name
                products_attributes_id,           #color id
                pov.products_options_values_name  #color name
FROM   products p
       LEFT JOIN products_description pd
         ON p.products_id = pd.products_id
       INNER JOIN products_attributes pa
         ON pa.products_id = p.products_id
       LEFT JOIN products_options_values pov
         ON pov.products_options_values_id = pa.options_values_id
       LEFT JOIN products_options_search pos
         ON pov.products_options_values_id = pos.products_options_values_id
WHERE  pos.products_options_search_id = 4         #code for red
       OR pos.products_options_search_id = 5      #code for brown

My first concern is the many joins. The Products table mainly holds product id and it's image and the Products Description table holds more descriptive info such as name (and product ID of course).

I then have the Products Options Values table which holds all the colors and their IDs. Products Options Search is containing the color IDs along with a color group ID (products_options_search_id). Red has the color group code 4 (brown is 5).

The products and colors have a many-to-many relationship managed inside Products Attributes.

So my question is first of all: Is it okay to make so many joins? Is i hurting the performance?

Second: If a bike comes in both red and brown, it'll show up twice even though I use SELECT DISTINCT. Think this is because of the INNER JOIN. Is this possible to avoid and do I have to remove the doubles in my PHP code?

Third: Bikes can be double colored (i.e. black and blue). This means that there are two rows for that bike. One where it says the color is black and one where is says its blue. (See second question). But if I replace the OR in the WHERE clause it removes both rows, because none of them fulfill the conditions - only the product. What is the workaround for that?

I really hope you will and can help me. I'm a little desperate right now :D

Regards Mikkel Lund

+2  A: 

This is not a lot of joins. Assuming decent indexes it will not affect anything negatively.

distinct does select the distinct combination of all fields in the select clause. So, yes, if you have more than one color more than one bike will show up since you have included the color field. If you only want one color you should tell it which color you want (like, the one with the maximum code, or something). Or don't select the color code (since you don't appear to care what it actually is anyways). Good rule of thumb: only select the fields that you actually need to.

Your last question is unclear. If you replace the or with what? If you try to make it and no it won't work, because no single row has both color codes (since that's impossible).

Donnie
+2  A: 

Ok, firstly SQL is made to do joins and your query is not very big. If indexed propertly this should actually increase your performance as it is easier to fetch the correct data.

You can remove the duplicates you are getting by using a sub-query as follows :

SELECT DISTINCT p.products_id,                    #simple product id
                products_name,                    #product name
                products_attributes_id,           #color id
                pov.products_options_values_name  #color name
FROM   products p
       LEFT JOIN products_description pd
         ON p.products_id = pd.products_id
WHERE p.products_id in (
       Select products_id from products_attributes pa #This will give you the ID forall bikes that have either red or brown in them
       INNER JOIN products_options_values pov
         ON pov.products_options_values_id = pa.options_values_id
       INNER JOIN products_options_search pos
         ON pov.products_options_values_id = pos.products_options_values_id
       WHERE  pos.products_options_search_id = 4         #code for red
            OR pos.products_options_search_id = 5      #code for brown)

Of course you will then not be able to return the color as part of the results as the color does not have a single value. You could combine the colors into a single field by writing a function if you need to do that.

Your option on the 3rd one is to either roll up the data in your PHP code (return items more than once in your result set, but loop through and only show every item once), and then show the list of colors returned in a different way (as an additional table or a comma separated list or whatever suits you.

If you make the OR an AND it of course means all the bikes that are both red AND brown. This would be correct if this is what you are looking for, but it sounds like you want either and not both.

Cobusve
Love the idea about a subquery! Think I'll keep working with this result!I figured it out with the last question. I would actually never need to find bikes with double colors... :)
lund.mikkel
A: 

As Donnie and Cobusve have pointed out, this isn't a lot of joins. However, in a relational (ie. normalised) schema, product attributes (such as name and colour) would normally be stored on the product table, rather than in separate tables.

Assuming you can't do anything about the table structures, an alternative query might be:

SELECT p.products_id,                          #simple product id
       products_name,                          #product name
       min(products_attributes_id),            #lowest color ID
       max(products_attributes_id),            #highest color ID
       min(pov.products_options_values_name),  #lowest color name
       max(pov.products_options_values_name)   #highest color name
FROM   products p
       LEFT JOIN products_description pd
         ON p.products_id = pd.products_id
       INNER JOIN products_attributes pa
         ON pa.products_id = p.products_id
       INNER JOIN products_options_values pov
         ON pov.products_options_values_id = pa.options_values_id
       INNER JOIN products_options_search pos
         ON (pov.products_options_values_id = pos.products_options_values_id AND
        pos.products_options_search_id IN (4, 5) )         #codes for red, brown
group by p.products_id, products_name

Depending on which dialect of SQL (SQLServer, Oracle, MySQL etc.) you're using it's possible that the syntax for the final condition might need to be slightly different.

Obviously, where only one colour is returned, the lowest and highest values will be the same.

Mark Bannister
It's an old website I'm renewing. So yes, it's hard to change the structure. I have added some columns and tables here and there, but the overall structure isn't something I wanna mess with.I don't quite get what the min/max is about. What is the purpose?Very nice idea with the IN (4,5). So simple yet I haven't thought about it. Definitely gonna use that.The query worked perfectly btw. Nothing wrong there.
lund.mikkel
The idea with the min/max is that when the colours are different, one (brown) will appear in the lowest and one (red) will appear in the highest, so both colours will appear on one line. Obviously, this dodge will only work where there are a maximum of two colours being selected. I would have suggested using a conditional structure - such as decode in Oracle, or case in a number of other SQLs - but these tend to vary between different dialects of SQL, and I don't know which SQL you are using.
Mark Bannister
+1  A: 

Definitely not too many joins. Unless the ON clause is not equality, joins normally restrict the result set nicely, especially when the right indexes are available.

What you want is similar to finding Questions on SO with at least one of a list of tags, so I wrote that query as a comparison: http://odata.stackexchange.com/stackoverflow/s/116/so3005416-comparison-select-questions-with-any-selected-tags

Ignore the code to get the tags into a temporary table, that's really just confirming code from another question's answer. Just compare the two final Select statements.

Mark Hurd
Thanks, good inspiration! Some nice things to think about in the SELECT statements.
lund.mikkel