In a certain TABLE, I have a VARTEXT field which includes comma-separated values of country codes. The field is named cc_list. Typical entries look like the following:

'DE,US,IE,GB'

'IT,CA,US,FR,BE'

Now, given a country code, I want to efficiently find which records include that country. Obviously there's no point in indexing this field. I can do the following:

SELECT * from TABLE where cc_list LIKE '%US%';

But this is inefficient.

Since the "IN" function is supposed to be efficient (it bin-sorts the values), I was thinking along the lines of

SELECT * from TABLE where 'US' IN cc_list

But this doesn't work - I think the 2nd operand of IN needs to be a list of values, not a string. Is there a way to convert a CSV string to a list of values? Any other suggestions? Thanks!

+4  A: 
SELECT  *
FROM    MYTABLE
WHERE   FIND_IN_SET('US', cc_list)

In a certain TABLE, I have a VARTEXT field which includes comma-separated values of country codes.

If you want your queries to be efficient, you should create a many-to-many link table:

CREATE TABLE table_country (cc CHAR(2) NOT NULL, tableid INT NOT NULL, PRIMARY KEY (cc, tableid))

SELECT  *
FROM    table_country tc
JOIN    mytable t
ON      t.id = tc.tableid
WHERE   tc.cc = 'US'

Alternatively, you can set ft_min_word_len to 2, create a FULLTEXT index on your column and query like this:

CREATE FULLTEXT INDEX fx_mytable_cclist ON mytable (cc_list);

SELECT  *
FROM    MYTABLE
WHERE   MATCH(cc_list) AGAINST('+US' IN BOOLEAN MODE)

This only works for MyISAM tables and the argument should be a literal string (you won't be able to join on this condition).

Quassnoi
Thank you, this is the function I was missing!
bosh
Aren't many-to-many conditions so much fun. I've never found a better solution than an extra table. Of course, then you have to maintain that extra table (some key constraints would help, but of course you don't get that unless you're on 5.x and (I think) using InnoDB as your engine)
Tom
+1  A: 

find_in_set seems to be the MySQL function you want. If you could actually store those comma-separated strings as MySQL sets (a SET column allows no more than 64 possible members, so this means either no more than 64 countries or splitting the countries into two groups of no more than 64 each), you could keep using find_in_set and go a bit faster.

Alex Martelli
+1  A: 

There's no efficient way to find what you want. A table scan will be necessary. Putting multiple values into a single text field is a terrible misuse of relational database technology. If you refactor (if you have access to the database structure) so that the country codes are properly stored in a separate table you will be able to easily and quickly retrieve the data you want.

Larry Lustig
A: 

One approach that I've used successfully before (not on mysql, though) is to place a trigger on the table that splits the values (based on a specific delimiter) into discrete values, inserting them into a sub-table. Your select can then look like this:

SELECT * from TABLE where 'US' IN 
(
   select cc_list_name from cc_list_subtable 
   where cc_list_subtable.table_id = TABLE.id
)

where the trigger parses cc_list in TABLE into separate entries in column cc_list_name in table cc_list_subtable. It involves a bit of work in the trigger too, as every change to TABLE means that the associated rows in cc_list_subtable have to be deleted, updated or inserted as appropriate. Still, the approach works in situations where the original table TABLE has to retain its structure but you are free to adapt the query as you see fit.
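The bookkeeping described above can be sketched without a trigger. This is only an illustration, assuming the split is done in application code rather than in the database, and using Python's sqlite3 module as a stand-in for MySQL so that it runs anywhere. The names main_table, sync_subtable and ids_with_country are hypothetical (main_table stands in for TABLE, which is a reserved word); cc_list_subtable, table_id and cc_list_name follow the answer.

```python
import sqlite3


def sync_subtable(conn, table_id, cc_list):
    """Rebuild the sub-table rows for one parent row from its CSV string."""
    # Split on commas, drop empty pieces and duplicates.
    codes = {c.strip() for c in cc_list.split(",") if c.strip()}
    # Simplest correct strategy: delete the old rows, then insert the new ones.
    conn.execute("DELETE FROM cc_list_subtable WHERE table_id = ?", (table_id,))
    conn.executemany(
        "INSERT INTO cc_list_subtable (table_id, cc_list_name) VALUES (?, ?)",
        [(table_id, code) for code in sorted(codes)],
    )


def ids_with_country(conn, code):
    """Return the ids of parent rows whose list contains the given code."""
    cur = conn.execute(
        "SELECT t.id FROM main_table t "
        "JOIN cc_list_subtable s ON s.table_id = t.id "
        "WHERE s.cc_list_name = ? ORDER BY t.id",
        (code,),
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_table (id INTEGER PRIMARY KEY, cc_list TEXT NOT NULL);
    CREATE TABLE cc_list_subtable (
        table_id INTEGER NOT NULL,
        cc_list_name TEXT NOT NULL,
        -- Code first, so a lookup by country code can use the key.
        PRIMARY KEY (cc_list_name, table_id)
    );
""")

# The original table keeps its CSV column; the sub-table mirrors it.
for table_id, cc_list in [(1, "DE,US,IE,GB"), (2, "IT,CA,US,FR,BE"), (3, "FR,BE")]:
    conn.execute(
        "INSERT INTO main_table (id, cc_list) VALUES (?, ?)", (table_id, cc_list)
    )
    sync_subtable(conn, table_id, cc_list)

print(ids_with_country(conn, "US"))
```

In a real MySQL setup the body of sync_subtable would live in AFTER INSERT and AFTER UPDATE triggers (plus a delete on AFTER DELETE); the delete-then-reinsert strategy is an assumption chosen for brevity, not necessarily what the answer's author used.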

davek
+2  A: 

The first rule of normalization says you should change multi-value columns such as cc_list into a single value field for this very reason.

Preferably into its own table with IDs for each country code and a pivot table to support a many-to-many relationship.

CREATE TABLE my_table (
  my_id INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
  mystuff VARCHAR(255) NOT NULL,
  PRIMARY KEY(my_id)
);

# this is the pivot table
CREATE TABLE my_table_countries (
  my_id INT(11) UNSIGNED NOT NULL,
  country_id SMALLINT(5) UNSIGNED NOT NULL,
  PRIMARY KEY(my_id, country_id)
);

CREATE TABLE countries (
  country_id SMALLINT(5) UNSIGNED NOT NULL AUTO_INCREMENT,
  country_code CHAR(2) NOT NULL,
  country_name VARCHAR(100) NOT NULL,
  PRIMARY KEY (country_id)
);

Then you can query it making use of indexes:

SELECT * FROM my_table JOIN my_table_countries USING (my_id) JOIN countries USING (country_id) WHERE country_code = 'DE'

SELECT * FROM my_table JOIN my_table_countries USING (my_id) JOIN countries USING (country_id) WHERE country_code IN('DE','US')

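You may have to group the results by my_id: when a row matches more than one of the listed countries, the join returns it once per matching country.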
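Why grouping can be needed is easy to see on a tiny data set. The following is a rough sketch using Python's sqlite3 module as a stand-in for MySQL; the table and column names follow the answer, while the sample rows and the UNIQUE constraint on country_code are assumptions added for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (my_id INTEGER PRIMARY KEY, mystuff TEXT NOT NULL);
    CREATE TABLE countries (
        country_id INTEGER PRIMARY KEY,
        country_code TEXT NOT NULL UNIQUE,
        country_name TEXT NOT NULL
    );
    CREATE TABLE my_table_countries (
        my_id INTEGER NOT NULL,
        country_id INTEGER NOT NULL,
        PRIMARY KEY (my_id, country_id)
    );
    -- Sample data (hypothetical): row 1 is linked to DE and US, row 2 to FR.
    INSERT INTO my_table VALUES (1, 'first'), (2, 'second');
    INSERT INTO countries VALUES
        (1, 'DE', 'Germany'), (2, 'US', 'United States'), (3, 'FR', 'France');
    INSERT INTO my_table_countries VALUES (1, 1), (1, 2), (2, 3);
""")

# Shared FROM/WHERE part of the two queries below.
JOINED = (
    "FROM my_table "
    "JOIN my_table_countries USING (my_id) "
    "JOIN countries USING (country_id) "
    "WHERE country_code IN ('DE', 'US') "
)

# Without grouping, row 1 comes back once for DE and once for US.
plain = [r[0] for r in conn.execute("SELECT my_id " + JOINED + "ORDER BY my_id")]

# Grouping by my_id collapses those duplicates to one row per record.
grouped = [
    r[0]
    for r in conn.execute("SELECT my_id " + JOINED + "GROUP BY my_id ORDER BY my_id")
]

print(plain)    # [1, 1]
print(grouped)  # [1]
```

SELECT DISTINCT would collapse the duplicates just as well when only columns from my_table are selected.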

Greg K
Thanks, I understand that this is much more efficient!
bosh