views:

16

answers:

1
+1  Q: 

mysql optimisation

Hi guys, I have a photo gallery with 1.5 million photos so we are talking a big database. We want to make it so that people once they have uploaded a photo can upload into several different areas of the site. Photos also have attributes and tags and we have a tables for that as well. here is the layout

PHOTO TABLE
photo_id
image
status

PHOTO DATA TABLE
photo_data_id
object_type
object_id
data_id
int_val
float_val
string_val
added_date
status

Tags
tag_id
tag
status

Tag Object
object_id (relation to the photo_data_id which says that the photo is in the gallery)
object_type
status

Generally everything works ok. The photo has a record in photo_data to link it to the gallery. It also has links in there for votes,views and unique views. People can order by those fields and can also browse via a certain tag.

The problem comes when I want to look at a certain tag and order by votes,views etc

I've tried several queries

    SELECT (SELECT int_val FROM photo_data pd WHERE pd.object_type = 4 AND pd.object_id = 0 AND pd.data_id = 5 AND pd.photo_id = photo_data.photo_id) AS views
,photo_data.photo_data_id
FROM photo_data WHERE object_type = 4 AND object_id = 0 AND data_id = 0 AND status = 1 AND
(SELECT COUNT(1) FROM tagsV5_objects WHERE object_id = photo_data.photo_data_id AND object_type = 6 AND tag_id = 2863 AND status = 1) > 0
ORDER BY views DESC LIMIT 50


explain SELECT photo_data.photo_data_id,(SELECT int_val FROM photo_data pd WHERE pd.object_type = 4 AND pd.object_id = 0 AND pd.data_id = 5 AND pd.photo_id = photo_data.photo_id) AS views
FROM tagsV5_objects
INNER JOIN photo_data ON photo_data_id = tagsV5_objects.object_id AND photo_data.data_id = 0 AND photo_data.status = 1 AND photo_data.object_type = 4 AND photo_data.object_id = 0
 WHERE tagsV5_objects.object_type = 6 AND tagsV5_objects.tag_id = 2863 AND tagsV5_objects.status = 1 AND
(SELECT COUNT(1) FROM photo_data WHERE tagsV5_objects.object_id = photo_data.photo_data_id AND data_id = 0 AND object_type = 4 AND object_id = 0)  > 0
ORDER BY views DESC LIMIT 50

     SELECT photo_data.photo_data_id, DATE_FORMAT(photo_data.added_date,'%d/%m/%Y - %l:%i %p') AS added_date_format, photo_data.photo_id, (SELECT varchar_val FROM photo_data pd WHERE photo_data.photo_id = pd.photo_id AND photo_data.object_type = pd.object_type AND photo_data.object_id = pd.object_id AND pd.data_id = 1) AS title, (SELECT user_id
    FROM photos WHERE photos.photos_id = photo_data.photo_id) AS user_id

    FROM photo_data photo_data_order FORCE INDEX(Index_int)

    INNER JOIN photo_data
    ON (
        photo_data_order.object_type = photo_data.object_type 
        AND photo_data_order.object_id = photo_data.object_id 
        AND photo_data.photo_id = photo_data_order.photo_id
        AND photo_data.data_id = 0
        AND photo_data.status = 1
    )

    INNER JOIN tagsV5_objects
    ON (photo_data.photo_data_id = tagsV5_objects.object_id)

    WHERE photo_data_order.object_type = 4 
    AND photo_data_order.object_id = 0 

    AND photo_data_order.data_id = 5 

    AND tagsV5_objects.object_type = 6 
    AND ( tagsV5_objects.tag_id = 2863 ) 

    ORDER BY photo_data_order.int_val DESC,photo_data_order.photo_data_id DESC
    LIMIT 0, 60

But the same problem occurs it's very slow. Seems like I can do it one of two ways the query the views and get those first but the we have trouble getting the tags Get the tags first but then I have trouble ordering. Does anyone have any ideas how to get around this problem.

This is for the one that gets the tags first

"id","select_type","table","type","possible_keys","key","key_len","ref","rows","Extra"
1,"PRIMARY","tagsV5_objects","ref","PRIMARY,Index_object","Index_object","8","const,const",123700,"Using where; Using temporary; Using filesort"
1,"PRIMARY","photo_data","eq_ref","PRIMARY,Index_photos,Index_object_2,Index_int,Index_float,Index_modified,Index_object,Index_added_date","PRIMARY","4","ezinev5.tagsV5_objects.object_id",1,"Using where"
3,"DEPENDENT SUBQUERY","photo_data","eq_ref","PRIMARY,Index_photos,Index_object_2,Index_int,Index_float,Index_modified,Index_object,Index_added_date","PRIMARY","4","ezinev5.tagsV5_objects.object_id",1,"Using where"
2,"DEPENDENT SUBQUERY","pd","ref","Index_photos,Index_object_2,Index_int,Index_float,Index_modified,Index_object,Index_added_date","Index_photos","16","const,const,ezinev5.photo_data.photo_id,const",1,""

Views first

"id","select_type","table","type","possible_keys","key","key_len","ref","rows","Extra"
1,"PRIMARY","photo_data_order","ref","Index_int","Index_int","12","const,const,const",2708082,"Using where"
1,"PRIMARY","photo_data","ref","PRIMARY,Index_photos,Index_object_2,Index_int,Index_float,Index_modified,Index_object,Index_added_date","Index_photos","16","const,const,ezinev5.photo_data_order.photo_id,const",1,"Using where"
1,"PRIMARY","tagsV5_objects","eq_ref","PRIMARY,Index_object","PRIMARY","12","ezinev5.photo_data.photo_data_id,const,const",1,"Using index"
3,"DEPENDENT SUBQUERY","photos","eq_ref","PRIMARY","PRIMARY","3","ezinev5.photo_data.photo_id",1,"Using where"
2,"DEPENDENT SUBQUERY","pd","ref","Index_photos,Index_object_2,Index_int,Index_float,Index_modified,Index_object,Index_added_date","Index_photos","16","ezinev5.photo_data.object_type,ezinev5.photo_data.object_id,ezinev5.photo_data.photo_id,const",1,""

Gallery and then tags

"id","select_type","table","type","possible_keys","key","key_len","ref","rows","Extra"
1,"PRIMARY","photo_data","ref","Index_photos,Index_object_2,Index_int,Index_float,Index_modified,Index_object,Index_added_date","Index_float","12","const,const,const",643950,"Using where; Using filesort"
3,"DEPENDENT SUBQUERY","tagsV5_objects","eq_ref","PRIMARY,Index_object","PRIMARY","12","ezinev5.photo_data.photo_data_id,const,const",1,"Using where"
2,"DEPENDENT SUBQUERY","pd","ref","Index_photos,Index_object_2,Index_int,Index_float,Index_modified,Index_object,Index_added_date","Index_photos","16","const,const,ezinev5.photo_data.photo_id,const",1,""

Sorry it's a long question but any ideas that anyone has would be great. Thanks Richard

A: 

It looks like the data_id column in photo_data determines what the *_val columns refer to. In your first query, this means that you are selecting data from the photo_data table twice. And if you need several values, you may need to query the table many more times or alternatively use grouping and case statements.

You could restructure your table to have a specific field for all of the common values that you need to store (in this case we would have a Views field). This allows you to simplify your queries, eg the first query could be rewritten to the following:

SELECT 
    Photo_ID, Views
FROM photo_data pd
    JOIN Tags t ON pd.photo_id = t.photo_ID AND tag_id = 2863
WHERE ...
ORDER BY Views DESC 
LIMIT 50

The restructured schema would probably be something like:

Photos table:     ID, User_ID, Added_Date, Status, Views, etc.
Images table:     Photo_ID, Image  (1 to 1 relationship to Photos)
Tags table:       ID, Tag_Name
Photo_tags table: Tag_ID, Photo_ID
ar
unless every foto has 100+ comments i don't know what you mean by big database ;) but that shouldnt keep you from setting up a proper structure :D
Joe Hopfgartner
The trouble is lets say tag_id is landscape and there are 30000 landscape images out of 1.5 million then this is going to take mysql along time to execute isn't it?Even if you did tags join photo_dataRichard
Richard Housham