views:

775

answers:

7

My experience with MySQL is very basic. The simple stuff is easy enough, but I ran into something that is going to require a little more knowledge. I have a need for a table that stores a small list of words. The number of words stored could be anywhere between 1 to 15. Later, I plan on searching through the table by these words. I have thought about a few different methods:

A.) I could create the database with 15 fields, and just fill the fields with null values whenever the data is smaller than 15. I don't really like this. It seems really inefficient.

B.) Another option is to use just a single field, and store the data as a comma separated list. Whenever I come back to search, I would just run a regular expression on the field. Again, this seems really inefficient.

I would hope there is a good alternative to those two options. Any advice would be very appreciated.

-Thanks

+12  A: 

C) use a normal form; use multiple rows with appropriate keys. an example:

mysql> SELECT * FROM blah;
+----+-----+-----------+
| K  | grp | name      |
+----+-----+-----------+
|  1 |   1 | foo       |
|  2 |   1 | bar       |
|  3 |   2 | hydrogen  |
|  4 |   4 | dasher    |
|  5 |   2 | helium    |
|  6 |   2 | lithium   |
|  7 |   4 | dancer    |
|  8 |   3 | winken    |
|  9 |   4 | prancer   |
| 10 |   2 | beryllium |
| 11 |   1 | baz       |
| 12 |   3 | blinken   |
| 13 |   4 | vixen     |
| 14 |   1 | quux      |
| 15 |   4 | comet     |
| 16 |   2 | boron     |
| 17 |   4 | cupid     |
| 18 |   4 | donner    |
| 19 |   4 | blitzen   |
| 20 |   3 | nod       |
| 21 |   4 | rudolph   |
+----+-----+-----------+
21 rows in set (0.00 sec)

This is the table I posted in this other question about group_concat. You'll note that there is a unique key K for every row. There is another key grp which represents each category. The remaining field represents a category member, and there can be variable numbers of these per category.

Jason S
Awesome man! That's really interesting actually.
Hurpe
A: 

You are correct that A is no good. B is also no good, as it fails to adhere to First Normal Form (each field must be atomic). There's nothing in your example that suggests you would gain by avoiding 1NF.

You want a table for your list of words with each word in its own row.

Abie
So a better method would be create a new table for each list of words, then organize all the tables in one main table with a field containing table names? I could do that, and that doesn't seem so bad.
Hurpe
A: 

I would create a table with and ID and one field, then store your results as multiple records. This offers many benefits. For example, you can then programatically enforce your 15 word limit instead of doing it in your design, so if you ever change your mind it should be rather easy. Your queries to search on the data will also be much faster to run, regular expressions take a lot of time to run (comparatively). Plus using a varchar for the field will allow you to compress your table much better. And indexing on the table should be much easier (more efficient) with this design.

Ryan Guill
A: 

Do the extra work and store the 15 words as 15 rows in the table, i.e. normalize the data. It may require you to re-think your strategy a bit, but trust me when the client comes along and says "Can you change that 15 limit to 20...", you'll be glad you did.

EJB
A: 

What other data is associated with these words?

One typical way to handle this kind of problem is best described by example. Let's assume your table captures certain words found in certain documents. One typical way is to assign each document an identifier. Let's pretend, for the moment, that each document is a web URL, so you'd have a table something like this:

CREATE TABLE WebPage (
    ID INTEGER NOT NULL,
    URL VARCHAR(...) NOT NULL
)

Your Words table might look something like this:

CREATE TABLE Words (
    Word VARCHAR(...) NOT NULL,
    DocumentID INTEGER NOT NULL 
)

Then, for each word, you create a new row in the table. To find all words in a particular document, select by the document's ID:

SELECT Words.Word FROM Words, WebPage 
WHERE Words.DocumentID = WebPage.DocumentID
AND WebPage.URL = 'http://whatever/web/page/'

To find all documents with a particular word, select by word:

SELECT WebPage.URL FROM WebPage, Words
WHERE Words.Word = 'hello' AND Words.DocumentID = WebPage.DocumentID

Or some such.

Brian Clapper
A: 

Depending on exactly what you want to accomplish:

  1. Use a full-text index on your string table

  2. Three tables: one for the original string, one for unique words (after word-rooting?), and a join table. This would also let you do more complicated searches, like "return all strings containing at least three of the following five words" or "return all strings where 'fox' occurs after 'dog'".

    CREATE TABLE string ( id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, string TEXT NOT NULL )

    CREATE TABLE word ( id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, word VARCHAR(14) NOT NULL UNIQUE, UNIQUE INDEX (word ASC) )

    CREATE TABLE word_string ( id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, string_id INT NOT NULL, word_id INT NOT NULL, word_order INT NOT NULL, FOREIGN KEY (string_id) REFERENCES (string.id), FOREIGN KEY (word_id) REFERENCES (word.id), INDEX (word_id ASC) )

    // Sample data INSERT INTO string (string) VALUES ('This is a test string'), ('The quick red fox jumped over the lazy brown dog')

    INSERT INTO word (word) VALUES ('this'), ('test'), ('string'), ('quick'), ('red'), ('fox'), ('jump'), ('over'), ('lazy'), ('brown'), ('dog')

    INSERT INTO word_string ( string_id, word_id, word_order ) VALUES ( 0, 0, 0 ), ( 0, 1, 3 ), ( 0, 2, 4 ), ( 1, 3, 1 ), ( 1, 4, 2 ), ( 1, 5, 3 ), ( 1, 6, 4 ), ( 1, 7, 5 ), ( 1, 8, 7 ), ( 1, 9, 8 ), ( 1, 10, 9 )

    // Sample query - find all strings containing 'fox' and 'quick' SELECT UNIQUE string.id, string.string FROM string INNER JOIN word_string ON string.id=word_string.string_id INNER JOIN word AS fox ON fox.word='fox' AND word_string.word_id=fox.id INNER JOIN word AS quick ON quick.word='quick' AND word_string.word_id=word.id

Hugh Bothwell
A: 

Hurpe, is the scenario you are describing that you will have a database table with a column that can contain a up to 15 keywords. Later you will use these keywords to search the table which will presumably have other columns as well?

Then isn't the answer to have a separate table for the keywords? You will also need to have a many-to-many relationship between the keywords and the main table.

So using cars as an example, the WORD table that will store the 15 or so keywords would have the following structure:

ID             int
Word           varchar(100)

The CAR table would have a structure something like:

ID              int
Name            varchar(100)

Then finally you need a CAR_WORD table to hold the many-to-many relationships:

ID              int
CAR_ID          int
WORD_ID         int

And sample data to go with this for the WORD table:

ID   Word

001  Family
002  Sportscar
003  Sedan
004  Hatchback
005  Station-wagon
006  Two-door
007  Four-door
008  Diesel
009  Petrol

together with sample data for the CAR table

ID   Name

001  Audi TT
002  Audi A3
003  Audi A4

then the intersection CAR_WORD table sample data could be:

ID    CAR_ID   WORD_ID
001   001      002
002   001      006
003   001      009

which give the Audi TT the correct characteristics.

and finally the SQL to search would be something like:

SELECT c.name
FROM CAR c
INNER JOIN CAR_WORD x
ON c.id = x.id
INNER JOIN WORD w
ON x.id = w.id
WHERE w.word IN('Petrol', 'Two-door')

Phew! Didn't intend to set out to write quite so much, it looks complicated but it is where I always seem to end up however hard I try to simplify things.

James Piggot