tags:

views:

368

answers:

3

I'm joining 3 tables. the last one contains several photos per item. The idea is to get one record per item with only one photo selected at random. The ORDER varies, the WHERE varies. The best I could get is a subquery. There is a more effective way of doing this?

SELECT *
FROM (
    SELECT *
    FROM art
    LEFT JOIN prov USING ( provID )
    LEFT JOIN photos USING ( artID )
    WHERE artBrand = "Shimano" AND catID=5 AND (stock1>0 OR stock2>0) AND itemName LIKE "%20mm%" 
            //the WHERE changes wildly, this is only an example
   ORDER BY rand( ) // sometimes is rand, sometimes price, sometimes stock...
)rand
GROUP BY artID // this is the only reasonable way i've got so far to avoid duplicates when an article has more than 1 photo

LIMIT x //it varies, sometimes i need to show all articles that meet the search query, sometimes I only need a few (5 or 10)

Important: this query WORKS. But I need a more effective way of doing this. Subqueries are very expensive.

Update:

  • I need a different photo everytime
  • I'm lookin for a fastest query (this is the point of my question).
  • The sorting of the products varies, sometimes I need to use the price, sometimes the name, sometimes random...
  • Some articles doesn't have a photo, that's why I'm using a left join. I can't let them out.
  • Some articles have 10, 15 or even 20 photos.
  • But on some occasions i need to show only articled with a photo
  • The WHERE can have up to 20 conditions and get very complex. This is only a simplification.

Simplified Table structures:

Table art
artID: int
artName: varchar
artBrand: varchar
artPrice: decimal
provID: int
photo: int (when there is at least one available photo this is set to 1)
many more rows

Table prov
provID: int
provName: varchar
many more rows

Table photos
artID: int
thumb: varchar
normal: varchar
big: varchar
nothing fancy
A: 

OK, since you have added more explanation, let me rewrite my answer. First of all you need a primary key for photos table, following your naming rules I name it photoID

SELECT 
    *,
    (SELECT photoID FROM photos 
        WHERE artID=art.artID
        ORDER BY rand() LIMIT 1
    ) AS randPhoto, /*single random image of this item*/
    (SELECT thumb FROM photos WHERE photoID=randPhoto) AS thumb,
    (SELECT someotherfield FROM photos  WHERE photoID=randPhoto) AS otherfield,
FROM art
   LEFT JOIN prov USING ( provID )
WHERE artID >20
        AND artID <40

No group by is required

Cem Kalyoncu
But it's missing the part of get a random image
The Disintegrator
Ok, since you have added more explanation I have rewritten my answer, please check again.
Cem Kalyoncu
Does that use of rand() definately work? I know it doesn't in [MSSQL Server], Every record gets the same value, Rand() is executed once then substituted, not once for every record. And I'm sure I've seen questions about "choosing a record at random" that say it works the sme for MySQL too... (But I'm no MySQL expert, so I thought I'd ask)
Dems
it works. but I can't add condition to the where involving the photos table... And can't understand why
The Disintegrator
Can you post the error mysql gave?
Cem Kalyoncu
unknown column, like if the photos table wasn't part of the query
The Disintegrator
Its not the part of query if you try to use fields outside the sub select. You must use another sub query if you want to add it to general conditions.
Cem Kalyoncu
And additional subquery would make the query even less effective...
The Disintegrator
+1  A: 

Have you considered returning all photos in your query and picking the photo randomly in your application layer? If you only have a few photos for each object it might not be as expensive as you think. It might be worth a try.

Doug R
Some items have up to 20 photos. And I need do to things like "show the 5 best seller articles", so I have to use a LIMIT clause or I will get ALL the articles in the database multiplied by the photos
The Disintegrator
+2  A: 

Given the conditions you've laid out, I think you've got a pretty good solution. That said, the only other option that I can see it to split your query into two pieces. First, get the item and then run a second query to grab the pictures. And then finally at the application layer, choose a random photo.

By splitting up your queries they both become simpler, which leads to more maintainable code. And it might even be faster, since the first query will act to limit the number of photos your second query is even looking at--depending on your indexing scheme.

Cynthia
yeah, i'm testing doing this in two steps and seems to be a better solution (and memcached use it better). If nobody show up with a better idea, the bounty is yours.
The Disintegrator