views:

158

answers:

5

I've got two tables

A:

plant_ID | name.
1        | tree
2        | shrubbery
20       | notashrubbery

B:

area_ID | name    | plants
1       | forrest | *needhelphere*

now I want the area to store any number of plants, in a specific order and some plants might show up a number of times: e.g 2,20,1,2,2,20,1

Whats the most efficient way to store this array of plants?
Keeping in mind I need to make it so that if I perform a search to find areas with plant 2, i don't get areas which are e.g 1,20,232,12,20 (pad with leading 0s?) What would be the query for that?

if it helps, let's assume I have a database of no more than 99999999 different plants. And yes, this question doesn't have anything to do with plants....

Bonus Question Is it time to step away from MySQL? Is there a better DB to manage this?

+18  A: 

If you're going to be searching both by forest and by plant, sounds like you would benefit from a full-on many-to-many relationship. Ditch your plants column, and create a whole new area_plants table (or whatever you want to call it) to relate the two tables.

If plant 1 is in areas 1 and 2, and plant 2 is in areas 2 and 3, your areas_plants table would look like this:

area_id | plant_id | sort_idx
-----------------------------
      1 |        1 |     0
      1 |        2 |     1
      2 |        2 |     0
      2 |        3 |     1

You can then look up relationships from either side, and use simple JOINs to get the relevant data from either table. No need to muck about in LIKE conditions to figure out if it's in the list, blah, bleh, yuck. I've been there for a legacy database. No fun. Use SQL to its greatest potential.

Matchu
This (introducing a join table) is the correct solution to the problem. The only thing i would add is a sort_idx column to retain the plant order as requested in the question.
Paul Sasik
@Paul Sasik - well spotted :) Won't bother editing, since you've already got the answer right there.
Matchu
i went ahead and added the column. It helps to err on the side of pedantic for db design noobs i think.
Paul Sasik
First: It's looking good, i just am worried that I need to recreate this table every time the order of the plants changes, I add/remove/change a plant in the middle of the sequence. Second: This application is not going to be used by more then a half dozen of people, however the number of times a "plant" pops up in a "field" may be in the several millions, if not billions... So is this still the optimal solution?
Moak
Only thing I have to add is that the `area_id` and `plant_id` should be the primary key for the table - a unique constraint/index is the next best thing, unless you want to allow duplicate combinations...
OMG Ponies
@OMG Ponies: `area_id` and `plant_id` are unique in their respective tables - what do you mean by `for the table`? area 1 might have this sequence of plants: 2,20,1,2,2,20,1 (so area 1 has 3 shrubberies at index 0,3 and 4)
Moak
@Moak: Re #1: That's correct, without each row having a unique artificial key, you may find that the best way to save data is to first delete, then re-insert. This is fine, just make sure you're using a transaction (and a transaction-safe table engine, like InnoDB). Re #2, yes, it's still the optimal solution.
Charles
@Moak: Given your description, the link table ("the table" in Ponies' comment) would have a unique (or even primary) key of (area_id, sort_idx) and an additional index on plant_id.
Charles
@Moak: The map/link/xref table suggested in this answer requires a composite key to make sure you can not add dupliicate pairs.
OMG Ponies
@OMG Ponies: or someone, can someone please post the three table creates based on these answers? I think that is the answer I can accept.
Moak
@Moak Check my answer, I posted table creates.
Richard Knop
+3  A: 

You know this violates normal form?

Typically, one would have an areaplants table: area_ID, plant_ID with a unique constraint on the two and foreign keys to the other two tables. This "link" table is what gives you many-many or many-to-one relationships.

Queries on this are generally very efficient, they utilize indexes and do not require parsing strings.

Cade Roux
+1: But composite primary key > composite unique constraint
OMG Ponies
@OMG Ponies, yes, but sometimes people want a surrogate key on that table.
Cade Roux
Same people like ORM, and Miley Cyrus...
OMG Ponies
+2  A: 

Your relation attributes should be atomic, not made up of multiple values like lists. It is too hard to search them. You need a new relation that maps the plants to the area_ID and the area_ID/plant combination is the primary key.

BobTurbo
+5  A: 

How about this:

table: plants

plant_ID | name
1        | tree
2        | shrubbery
20       | notashrubbery

table: areas

area_ID | name
1       | forest

table: area_plant_map

area_ID | plant_ID | sequence
1       | 1        | 0
1       | 2        | 1
1       | 20       | 2

That's the standard normalized way to do it (with a mapping table).

To find all areas with a shrubbery (plant 2), do this:

SELECT *
FROM areas
INNER JOIN area_plant_map ON areas.area_ID = area_plant_map.area_ID
WHERE plant_ID = 2
mgroves
I read "shubbery" and now have the Knights of Nee! skit running through my head...
OMG Ponies
See question - Area 1 might have this sequence of plants: 2,20,1,2,2,20,1 (so area 1 has 3 shrubberies at position 0,3 and 4)
Moak
@Moak in that case, just put another column with a sequence number.
mgroves
+2  A: 

Use many-to-many relationship:

CREATE TABLE plant (
    plant_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255)
) ENGINE=INNODB;

CREATE TABLE area (
    area_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255)
) ENGINE=INNODB;

CREATE TABLE plant_area_xref (
    plant_id INT NOT NULL,
    area_id INT NOT NULL,
    sort_idx INT NOT NULL,
    FOREIGN KEY (plant_id) REFERENCES plant(plant_id) ON DELETE CASCADE,
    FOREIGN KEY (area_id) REFERENCES area(area_id) ON DELETE CASCADE,
    PRIMARY KEY  (plant_id, area_id, sort_idx)    
) ENGINE=INNODB;

EDIT:

Just to answer your bonus question:

Bonus Question Is it time to step away from MySQL? Is there a better DB to manage this?

This has nothing to do with MySQL. This was just an issue with bad database design. You should use intersection tables and many-to-many relationship for cases like this in every RDBMS (MySQL, Oracle, MSSQL, PostgreSQL etc).

Richard Knop
doesn't this miss the sort_idx? `plant_id, area_id` is not unique, `area_id / sort_idx` would be. I need to be able to keep this sequence intact `area: 1` -> `plants: 2,20,1,2,2,20,1` (in that order)
Moak
Added sort_idx. plant_id, area_id do not need to be unique. Uniqueness is achieved by double primary key.
Richard Knop
yes, but my point is that Area 1 - Plant 2 might occure double (the same plant twice in the area, so the real uniqueness is area / sort_index. i got what i needed (double primary) if you change it then it then I can accept the answer.
Moak