views:

302

answers:

4

I have a MySQL database, and a particular table in that database will need to be self-referencing, in a one-to-many fashion. For scalability, I need to find the most efficient solution possible. The two ways most evident to me are:

1) Add a text field to the table, and store a serialized list of primary keys there

2) Keep a linker table, with each row being a one-to-one.

In case #1, I see the table growing very very wide (using a spatial analogy), but in case #2, I see the linker table growing to a very large number of rows, which would slow down lookups (by far the most common operation).

What's the most efficient manner in which to implement such a one-to-many relationship in MySQL? Or, perhaps, there is a much saner solution keeping the data all directly on the filesystem somehow, or else some other storage engine?

+1  A: 

Just keep a table for the "many", with a key column for the primary table.

I quarantee you'll have lots of other more important problems to solve before you run into efficiency or capacity constraints in a standard industrial-strength relational dbms.

IMHO the most likely second option (with numerous alternative products) is to use an isam.

le dorfier
Comment per your edits: Updates should be no problem since you'll be updating values rather than keys (so indexes won't be involved.) Inserts shouldn't be much either if the records are small and the keys well-dispersed, which usually takes care of itself.
le dorfier
Thought of a good way to explain the problem. Think of it like a "friends list" on a social network. Each account can have many, many accounts listed as "friends" (in the form of accountIDs). Migrating the storage to a third table just migrates the same problems to that other table - in one case, it becomes very wide, and in the other case it becomes very long.Unless of course MySQL is actually more efficient at dealing with large volumes using one of those two approaches.
Chris
A: 

My first comment would be that you'll get better responses if you can describe how the data will be used (frequency of adds/updates vs lookups, adds vs updates, etc) in addition to what you've already described. That being said, my first thought would be to just go with a generic representation of


CREATE  TABLE IF NOT EXISTS one_table (
  `one_id` INT UNSIGNED  NOT NULL AUTO_INCREMENT
           COMMENT 'The The ID of the items in the one table' ,
  ... other data
)

CREATE  TABLE IF NOT EXISTS many_table (
  `many_id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT
            COMMENT 'the id of the items in the many table',
  `one_id` INT UNSIGNED  NOT NULL
           COMMENT 'The ID of the item in the one table that this many item belongs to' ,
  ... other data
)

Making sure, of course, to create an index on the one_id in both tables.

RHSeeger
Lookups are going to be (by far) the most common operation. Updates will be more frequent than additions. I should edit the question - I did a very poor job of posing it.
Chris
Note: Unsigned INT type in MySQL only supports max of 4,294,967,295 values. Based on the posters comments that might not be enough for the 'many_table' ID record. If he needs more than 4 billion records he should go with unsigned BIGINT.Otherwise this is what I would recommend as well. Indexes are extremely important though.
Dennis Baker
@Dennis Baker - Edited to reflect your comment, as you have a good point. My bad for copy/pasting from my own code because it was faster ;)
RHSeeger
A: 

Not so much an answer but a few questions and a possible approach....

If you want to make the table self referencing and only use one field ... there are some options. A calculated maskable 'join' field describes a way to associate many rows with each other.

The best solution will probably consider the nature of the data and relationships? What is the nature of the data and lookups? What sort of relationship are you trying to contain? Association? Related? Parent/Children?

CMB
+1  A: 

If you need to do deep/recursive traversals into the data, a graph database like Neo4j (where I'm on the team) is a good choice. You'll find some information in the article Should you go Beyond Relational Databases? and in this post at High Scalability. For a use case that may be similar to yours, read this thread on MetaFilter. For information on language bindings and other things you may also find the Neo4j wiki and mailing list useful.

nawroth