views:

384

answers:

6

I have an sql query that counts the number of results for a complex query. The actual select query is very fast when limiting to 20 results, but the count version takes about 4.5 seconds on my current tables after lots of optimizing.

If I remove the two joins and where clauses on site tags and gallery tags, the query performs at 1.5 seconds. If I create 3 separate queries - one to select the pay sites, one to select the names and one to pull everything together - I can get the query down to .6 seconds, which is still not good enough. This would also force me to use a stored procedure since I will have to make a total of 4 queries in Hibernate.

For the query "as is", here is some info:

The Handler_read_key is 1746669 The Handler_read_next is 1546324

The gallery table has 40,000 rows

The site table has 900 rows

The name table has 800 rows

The tag table has 3560 rows

I'm pretty new to mysql and tuning, and I have indexes on the: - 'term' column in the tag table - 'published' column in the gallery table - 'value' for the name table

Any help to get this query to 0.1 millseconds would be appreciated, because I am at a total loss.

SELECT count(distinct gallery.id)
from gallery gallery 
    inner join
        site site 
            on gallery.site_id = site.id 
    inner join
        site_to_tag p2t 
            on site.id = p2t.site_id 
    inner join
        tag site_tag 
            on p2t.tag_id = site_tag.id 
    inner join
        gallery_to_name g2mn 
            on gallery.id = g2mn.gallery_id 
    inner join
        name name 
            on g2mn.name_id = name.id 
    inner join
        gallery_to_tag g2t 
            on gallery.id = g2t.gallery_id 
    inner join
        tag tag 
            on g2t.tag_id = tag.id
where
    gallery.published = true and (
        name.value LIKE 'sometext%' or
        tag.term = 'sometext' or 
        site.`name` like 'sometext%' or
        site_tag.term = 'sometext'
    )

Explain Data:

| id | select_type | table        | type   | possible_keys                                                     | key                | key_len | ref                                       | rows | Extra                              |
+----+-------------+--------------+--------+-------------------------------------------------------------------+--------------------+---------+-------------------------------------------+------+------------------------------------+
|  1 | SIMPLE      | site         | index  | PRIMARY,nameIndex                                                 | nameIndex          | 258     | NULL                                      |  950 | Using index; Using temporary       |
|  1 | SIMPLE      | gallery      | ref    | PRIMARY,publishedIndex,FKF44C775296EECE37,publishedSiteIdIndex    | FKF44C775296EECE37 | 9       | production.site.id                        |   20 | Using where                        |
|  1 | SIMPLE      | g2mn         | ref    | PRIMARY,FK3EFFD7F8AFAD7A5E,FK3EFFD7F832C04188                     | FK3EFFD7F8AFAD7A5E | 8       | production.gallery.id                     |    1 | Using index; Distinct              |
|  1 | SIMPLE      | name         | eq_ref | PRIMARY,valueIndex                                                | PRIMARY            | 8       | production.g2mn.name_id                   |    1 | Distinct                           |
|  1 | SIMPLE      | g2t          | ref    | PRIMARY,FK3DDB4D63AFAD7A5E,FK3DDB4D63E210FBA6                     | FK3DDB4D63AFAD7A5E | 8       | production.g2mn.gallery_id                |    2 | Using where; Using index; Distinct |
|  1 | SIMPLE      | tag          | eq_ref | PRIMARY,termIndex                                                 | PRIMARY            | 8       | production.g2t.tag_id                     |    1 | Distinct                           |
|  1 | SIMPLE      | p2t          | ref    | PRIMARY,FK29424AB796EECE37,FK29424AB7E210FBA6                     | PRIMARY            | 8       | production.gallery.site_id                |    3 | Using where; Using index; Distinct |
|  1 | SIMPLE      | site_tag     | eq_ref | PRIMARY,termIndex                                                 | PRIMARY            | 8       | production.p2t.tag_id                     |    1 | Using where; Distinct              |
+----+-------------+--------------+--------+-------------------------------------------------------------------+--------------------+---------+-------------------------------------------+------+------------------------------------+

Individual Count Speeds:

[SQL] select count(*) from gallery;
Affected rows: 0
Time: 0.014ms
Results: 40385

[SQL] 
select count(*) from gallery_to_name;
Affected rows: 0
Time: 0.012ms
Results: 35615

[SQL] 
select count(*) from gallery_to_tag;
Affected rows: 0
Time: 0.055ms
Results: 165104

[SQL] 
select count(*) from tag;
Affected rows: 0
Time: 0.002ms
Results: 3560    

[SQL] 
select count(*) from site;
Affected rows: 0
Time: 0.001ms
Results: 901

[SQL] 
select count(*) from site_to_tag;
Affected rows: 0
Time: 0.003ms
Results: 7026
+1  A: 

Count's are often slow as they require fetching all data returned by the cursor in order to figure out how many rows would actually be fetched.

How long does it take to do a count on each of the individual tables? Add up the total times - if it's more than 0.1 milliseconds I don't think you'll be able to get the query to execute as fast as you'd like. As far as ways to speed it up goes, you could try pushing some of the WHERE clause criteria into a sub-select, as in

select 
    count(distinct this_.id) as y0_  
from 
    (select * from gallery where published=?) this_  
inner join 
    site site3_  
        on this_.site_id=site3_.id  
inner join 
    site_to_tag list7_  
        on site3_.id=list7_.site_id  
inner join 
    tag sitetag4_  
        on list7_.tag_id=sitetag4_.id  
inner join 
    gallery_to_name names9_  
        on this_.id=names9_.gallery_id  
inner join 
    name name2_  
        on names9_.name_id=name2_.id  
inner join 
    gallery_to_tag list11_  
        on this_.id=list11_.gallery_id  
inner join 
    tag tag1_  
    on list11_.tag_id=tag1_.id  
where lower(name2_.value) like ? or 
      tag1_.term=? or
      lower(site3_.name) like ? or
      lower(this_.description) like ? or
      sitetag4_.term=?

How many fields are on each of these tables? Can you use sub-selects to cut down on the amount of data the database has to join together, or do you really need all the columns?

The presence of three LIKE predicates is going to slow things down, as will the use of the LOWER function in the WHERE clause. If you need to be able to do case-insensitive compares it might be better to have two fields, one in 'normal' (as typed in) case and one stored in lower (or UPPER) case to do insensitive searches on. You could use a trigger to keep the lower/UPPER one in sync with the 'normal' case version.

I hope this helps.

EDIT:

Looking at the EXPLAIN PLAN output it doesn't appear that the fields used in your WHERE clause are indexed - or at least it appears the indexes aren't being used. This could be a by-product of all the OR predicates in the WHERE. If these fields aren't indexed, you might try indexing them.

Bob Jarvis
Thanks for the time and effort you've put into the response. I tried the sub-select with the publish, and it actually makes the query take a bit longer :(. I posted the times of the counts above.
egervari
I noticed that Hibernate was not putting primary key's around the many-to-many joins. So in the cases of gallery_to_name, gallery_to_tag and site_to_tag, there was no primary key :(. I added the primary keys, and the query went from 9.5 seconds to 5 seconds. Do I need to index these further?
egervari
@egervari - if you don't already have indexes on them you might try indexing name.value, tag.term, site.name, and gallery.description. I don't know how useful they'll be but it's something to try.
Bob Jarvis
+1  A: 

It appears that your WHERE clause may be the offender, especially the following:

lower(name2_.value) like ?

According to MySQL documentation:

The default character set and collation are latin1 and latin1_swedish_ci, so nonbinary string comparisons are case insensitive by default.

You may not need the LOWER() function in your WHERE clause. Functions on the left side of the comparison prevent the use of indexes.

What do your LIKE values look like? If you are using a wildcard on the left side of the value, it prevents the use of indexes.

Try replacing your OR statements with UNION.

Try running the query without DISTINCT just to see how much it's affecting your query.

Marcus Adams
Changing all of the lower()/like where clauses to plain = (just for the heck of it) results in an extremely minor performance gain in comparison to the total query time. It's around .4-.5 second boost. That's it. This was one of the first things I had tried before I posted. It's clear that this is not the problem.
egervari
@Egervari, The reason it's probably not helping much is because it's not using the index on name2_ anyway. If your LIKE clause is good, most likely, it's the combined OR clauses that are affecting it. Switch to UNION. You must make all of the changes to improve the query and use the index.
Marcus Adams
@egervari, to put things in perspective, that .4-.5s savings is 40-50 times your goal for total execution time.
mikerobi
Voting down suggestions is going to encourage people to not make suggestions.
Donnie
+1  A: 

OR murders query performance, even with good indexes. It gets worse as tables get larger.

This is horrifcly ugly, but it's likely to be faster (at the expense of readability, obviously). If MySQL only supported CTEs then this would be much, much neater.

You could also look into writing a short batch and selecting the common part of the repeated query into a temp table and then doing everything against the temp table. You may or may not have to index the temp table for this to work out well, it depends on row counts really.

(Note that union already does a distinct, so there's no need to do it again the count and force another sort)

select
  count(id)   
from (
SELECT gallery.id
from gallery gallery 
    inner join
        site site 
          on gallery.site_id = site.id 
    inner join
        site_to_tag p2t 
            on site.id = p2t.site_id 
    inner join
        tag site_tag 
            on p2t.tag_id = site_tag.id 
    inner join
        gallery_to_name g2mn 
            on gallery.id = g2mn.gallery_id 
    inner join
        name name 
            on g2mn.name_id = name.id 
    inner join
        gallery_to_tag g2t 
            on gallery.id = g2t.gallery_id 
    inner join
        tag tag 
            on g2t.tag_id = tag.id
where
    gallery.published = true and name.value like 'sometext%'
UNION
SELECT gallery.id
from gallery gallery 
    inner join
        site site 
          on gallery.site_id = site.id 
    inner join
        site_to_tag p2t 
            on site.id = p2t.site_id 
    inner join
        tag site_tag 
            on p2t.tag_id = site_tag.id 
    inner join
        gallery_to_name g2mn 
            on gallery.id = g2mn.gallery_id 
    inner join
        name name 
            on g2mn.name_id = name.id 
    inner join
        gallery_to_tag g2t 
            on gallery.id = g2t.gallery_id 
    inner join
        tag tag 
            on g2t.tag_id = tag.id
where
    gallery.published = true and tag.term = 'sometext'
UNION
SELECT gallery.id
from gallery gallery 
    inner join
        site site 
          on gallery.site_id = site.id 
    inner join
        site_to_tag p2t 
            on site.id = p2t.site_id 
    inner join
        tag site_tag 
            on p2t.tag_id = site_tag.id 
    inner join
        gallery_to_name g2mn 
            on gallery.id = g2mn.gallery_id 
    inner join
        name name 
            on g2mn.name_id = name.id 
    inner join
        gallery_to_tag g2t 
            on gallery.id = g2t.gallery_id 
    inner join
        tag tag 
            on g2t.tag_id = tag.id
where
    gallery.published = true and site.`name` like 'sometext%'
UNION
SELECT gallery.id
from gallery gallery 
    inner join
        site site 
          on gallery.site_id = site.id 
    inner join
        site_to_tag p2t 
            on site.id = p2t.site_id 
    inner join
        tag site_tag 
            on p2t.tag_id = site_tag.id 
    inner join
        gallery_to_name g2mn 
            on gallery.id = g2mn.gallery_id 
    inner join
        name name 
            on g2mn.name_id = name.id 
    inner join
        gallery_to_tag g2t 
            on gallery.id = g2t.gallery_id 
    inner join
        tag tag 
            on g2t.tag_id = tag.id
where
    gallery.published = true and site_tag.term = 'sometext'
) as x
Donnie
A: 

I admit I didn't take the time to fully understand your tables and queries. However, for the kind of response time you're asking for, and for the apparent complexity of the current suggestions, I would say this is one of those situations where (instead of asking SQL to tally all the records I want to count) I'd keep a separate table of always-up-to-date counts, and always update any appropriate counts with triggered code upon any record add/change/delete.

Eg, imagine a transaction file with a million rows, and I want the total of field 2. I can ask the db to SUM() the field, or I can keep a separate total for field 2 in a table somewhere that gets adjusted any time a record is added, deleted, or has field 2 edited. It's redundant, but super fast when I want to know the total. And I can always SUM() if I want to audit my separate computed total.

joe snyder
+7  A: 

I've included my test schema and a script to produce test data at the end of this post. I have used the SQL_NO_CACHE option to prevent MySQL from caching query results - this is just for testing and should ultimately be removed.

This is a similar idea to that proposed by Donnie, but I have tidied it up a little. If I have understood the joins correctly, there is no need to repeat all the joins in each select, as each is effectively independent from the others. The original WHERE clause stipulates that gallery.published must be true and then follows with a series of 4 conditions joined by OR. Each query can therefore be executed separately. Here are the four joins:

gallery <--> gallery_to_name <--> name
gallery <--> gallery_to_tag <--> tag
gallery <--> site
gallery <--> site <--> site_to_tag <--> tag

Because gallery contains site_id, in this case, there's no need for the intermediate join via the site table. The last join can therefore be reduced to this:

gallery <--> site_to_tag <--> tag

Running each SELECT separately, and using UNION to combine the results, is very fast. The results here assume the table structures and indexes shown at the end of this post:

SELECT SQL_NO_CACHE COUNT(id) AS matches FROM (
   (SELECT g.id
    FROM gallery AS g
    INNER JOIN site AS s ON s.id = g.site_id
    WHERE g.published = TRUE AND s.name LIKE '3GRD%')
UNION
   (SELECT g.id
    FROM gallery AS g
    INNER JOIN gallery_to_name AS g2n ON g2n.gallery_id = g.id
    INNER JOIN name AS n ON n.id = g2n.name_id
    WHERE g.published = TRUE AND n.value LIKE '3GRD%')
UNION
   (SELECT g.id
    FROM gallery AS g
    INNER JOIN gallery_to_tag  AS g2t ON g2t.gallery_id = g.id
    INNER JOIN tag AS gt  ON gt.id = g2t.tag_id
    WHERE g.published = TRUE AND gt.term = '3GRD')
UNION
   (SELECT g.id
    FROM gallery AS g
    INNER JOIN site_to_tag AS s2t ON s2t.site_id = g.site_id
    INNER JOIN tag AS st  ON st.id = s2t.tag_id
    WHERE g.published = TRUE AND st.term = '3GRD')
) AS totals;

+---------+
| matches |
+---------+
|      99 |
+---------+
1 row in set (0.00 sec)

The speed does vary depending on the search criteria. In the following example, a different search value is used for each table, and the LIKE operator has to do a little more work, as there are now more potential matches for each:

SELECT SQL_NO_CACHE COUNT(id) AS matches FROM (
   (SELECT g.id
    FROM gallery AS g
    INNER JOIN site AS s ON s.id = g.site_id
    WHERE g.published = TRUE AND s.name LIKE '3H%')
UNION
   (SELECT g.id
    FROM gallery AS g
    INNER JOIN gallery_to_name AS g2n ON g2n.gallery_id = g.id
    INNER JOIN name AS n ON n.id = g2n.name_id
    WHERE g.published = TRUE AND n.value LIKE '3G%')
UNION
   (SELECT g.id
    FROM gallery AS g
    INNER JOIN gallery_to_tag  AS g2t ON g2t.gallery_id = g.id
    INNER JOIN tag AS gt  ON gt.id = g2t.tag_id
    WHERE g.published = TRUE AND gt.term = '3IDP')
UNION
   (SELECT g.id
    FROM gallery AS g
    INNER JOIN site_to_tag AS s2t ON s2t.site_id = g.site_id
    INNER JOIN tag AS st  ON st.id = s2t.tag_id
    WHERE g.published = TRUE AND st.term = '3OJX')
) AS totals;

+---------+
| matches |
+---------+
|   12505 |
+---------+
1 row in set (0.24 sec)

These results compare favourably with the a query which uses multiple joins:

SELECT SQL_NO_CACHE COUNT(DISTINCT g.id) AS matches
FROM gallery AS g
INNER JOIN gallery_to_name AS g2n ON g2n.gallery_id = g.id
INNER JOIN name            AS n   ON n.id = g2n.name_id
INNER JOIN gallery_to_tag  AS g2t ON g2t.gallery_id = g.id
INNER JOIN tag             AS gt  ON gt.id = g2t.tag_id
INNER JOIN site            AS s   ON s.id = g.site_id
INNER JOIN site_to_tag     AS s2t ON s2t.site_id = s.id
INNER JOIN tag             AS st  ON st.id = s2t.tag_id
WHERE g.published = TRUE AND (
    gt.term = '3GRD' OR
    st.term = '3GRD' OR
    n.value LIKE '3GRD%' OR
    s.name LIKE '3GRD%');

+---------+
| matches |
+---------+
|      99 |
+---------+
1 row in set (2.62 sec)

SELECT SQL_NO_CACHE COUNT(DISTINCT g.id) AS matches
FROM gallery AS g
INNER JOIN gallery_to_name AS g2n ON g2n.gallery_id = g.id
INNER JOIN name            AS n   ON n.id = g2n.name_id
INNER JOIN gallery_to_tag  AS g2t ON g2t.gallery_id = g.id
INNER JOIN tag             AS gt  ON gt.id = g2t.tag_id
INNER JOIN site            AS s   ON s.id = g.site_id
INNER JOIN site_to_tag     AS s2t ON s2t.site_id = s.id
INNER JOIN tag             AS st  ON st.id = s2t.tag_id
WHERE g.published = TRUE AND (
    gt.term = '3IDP' OR
    st.term = '3OJX' OR
    n.value LIKE '3G%' OR
    s.name LIKE '3H%');

+---------+
| matches |
+---------+
|   12505 |
+---------+
1 row in set (3.17 sec)

SCHEMA
The indexes on id columns plus site.name, name.value and tag.term are important:

DROP SCHEMA IF EXISTS `egervari`;
CREATE SCHEMA IF NOT EXISTS `egervari`;
USE `egervari`;

-- -----------------------------------------------------
-- Table `site`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `site` ;
CREATE  TABLE IF NOT EXISTS `site` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
  `name` VARCHAR(255) NOT NULL ,
  INDEX `name` (`name` ASC) ,
  PRIMARY KEY (`id`) )
ENGINE = InnoDB;

-- -----------------------------------------------------
-- Table `gallery`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `gallery` ;
CREATE  TABLE IF NOT EXISTS `gallery` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
  `site_id` INT UNSIGNED NOT NULL ,
  `published` TINYINT(1) NOT NULL DEFAULT 0 ,
  PRIMARY KEY (`id`) ,
  INDEX `fk_gallery_site` (`site_id` ASC) ,
  CONSTRAINT `fk_gallery_site`
    FOREIGN KEY (`site_id` )
    REFERENCES `site` (`id` )
    ON DELETE CASCADE
    ON UPDATE CASCADE)
ENGINE = InnoDB;

-- -----------------------------------------------------
-- Table `name`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `name` ;
CREATE  TABLE IF NOT EXISTS `name` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
  `value` VARCHAR(255) NOT NULL ,
  INDEX `value` (`value` ASC) ,
  PRIMARY KEY (`id`) )
ENGINE = InnoDB;

-- -----------------------------------------------------
-- Table `tag`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `tag` ;
CREATE  TABLE IF NOT EXISTS `tag` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
  `term` VARCHAR(255) NOT NULL ,
  INDEX `term` (`term` ASC) ,
  PRIMARY KEY (`id`) )
ENGINE = InnoDB;

-- -----------------------------------------------------
-- Table `gallery_to_name`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `gallery_to_name` ;
CREATE  TABLE IF NOT EXISTS `gallery_to_name` (
  `gallery_id` INT UNSIGNED NOT NULL ,
  `name_id` INT UNSIGNED NOT NULL ,
  PRIMARY KEY (`gallery_id`, `name_id`) ,
  INDEX `fk_gallery_to_name_gallery` (`gallery_id` ASC) ,
  INDEX `fk_gallery_to_name_name` (`name_id` ASC) ,
  CONSTRAINT `fk_gallery_to_name_gallery`
    FOREIGN KEY (`gallery_id` )
    REFERENCES `gallery` (`id` )
    ON DELETE CASCADE
    ON UPDATE CASCADE,
  CONSTRAINT `fk_gallery_to_name_name`
    FOREIGN KEY (`name_id` )
    REFERENCES `name` (`id` )
    ON DELETE CASCADE
    ON UPDATE CASCADE)
ENGINE = InnoDB;

-- -----------------------------------------------------
-- Table `gallery_to_tag`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `gallery_to_tag` ;
CREATE  TABLE IF NOT EXISTS `gallery_to_tag` (
  `gallery_id` INT UNSIGNED NOT NULL ,
  `tag_id` INT UNSIGNED NOT NULL ,
  PRIMARY KEY (`gallery_id`, `tag_id`) ,
  INDEX `fk_gallery_to_tag_gallery` (`gallery_id` ASC) ,
  INDEX `fk_gallery_to_tag_tag` (`tag_id` ASC) ,
  CONSTRAINT `fk_gallery_to_tag_gallery`
    FOREIGN KEY (`gallery_id` )
    REFERENCES `gallery` (`id` )
    ON DELETE CASCADE
    ON UPDATE CASCADE,
  CONSTRAINT `fk_gallery_to_tag_tag`
    FOREIGN KEY (`tag_id` )
    REFERENCES `tag` (`id` )
    ON DELETE CASCADE
    ON UPDATE CASCADE)
ENGINE = InnoDB;

-- -----------------------------------------------------
-- Table `site_to_tag`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `site_to_tag` ;
CREATE  TABLE IF NOT EXISTS `site_to_tag` (
  `site_id` INT UNSIGNED NOT NULL ,
  `tag_id` INT UNSIGNED NOT NULL ,
  PRIMARY KEY (`site_id`, `tag_id`) ,
  INDEX `fk_site_to_tag_site` (`site_id` ASC) ,
  INDEX `fk_site_to_tag_tag` (`tag_id` ASC) ,
  CONSTRAINT `fk_site_to_tag_site`
    FOREIGN KEY (`site_id` )
    REFERENCES `site` (`id` )
    ON DELETE CASCADE
    ON UPDATE CASCADE,
  CONSTRAINT `fk_site_to_tag_tag`
    FOREIGN KEY (`tag_id` )
    REFERENCES `tag` (`id` )
    ON DELETE CASCADE
    ON UPDATE CASCADE)
ENGINE = InnoDB;

TEST DATA
This populates site with 900 rows, tag with 3560 rows, name with 800 rows and gallery with 40,000 rows, and inserts entries into the link tables:

DELIMITER //
DROP PROCEDURE IF EXISTS populate//
CREATE PROCEDURE populate()
BEGIN
    DECLARE i INT DEFAULT 0;

    WHILE i < 900 DO
        INSERT INTO site (name) VALUES (CONV(i + 1 * 10000, 20, 36));
        SET i = i + 1;
    END WHILE;

    SET i = 0;
    WHILE i < 3560 DO
        INSERT INTO tag (term) VALUES (CONV(i + 1 * 10000, 20, 36));
        INSERT INTO site_to_tag (site_id, tag_id) VALUES ( (i MOD 900) + 1, i + 1 );
        SET i = i + 1;
    END WHILE;

    SET i = 0;
    WHILE i < 800 DO
        INSERT INTO name (value) VALUES (CONV(i + 1 * 10000, 20, 36));
        SET i = i + 1;
    END WHILE;

    SET i = 0;
    WHILE i < 40000 DO    
        INSERT INTO gallery (site_id, published) VALUES ( (i MOD 900) + 1, i MOD 2 );
        INSERT INTO gallery_to_name (gallery_id, name_id) VALUES ( i + 1, (i MOD 800) + 1 );
        INSERT INTO gallery_to_tag (gallery_id, tag_id) VALUES ( i + 1, (i MOD 3560) + 1 );
        SET i = i + 1;
    END WHILE;
END;
//
DELIMITER ;
CALL populate();
Mike
A: 

Hm... just looking at your post for two minutes, so my answer might not be perfect... but have you thought of introducing an index table that links to the other entities?

like

CREATE  TABLE `references`
  `text` VARC>HAR(...) NOT NULL,
  `name` VARCHAR(255) NOT NULL,
  `reference_type` WHATEVER, // enum or what suits your needs
  `reference_id` INTEGER NOT NULL
);

Then just query this table:

SELECT COUNT(*) FROM references WHERE sometext LIKE ...;

Would have to handle the cases with 'sometext%' though...

Also, is the number of galleries really important, or is your query just intended to check whether a single one exists?

Axel