ansaurus

Question

Example of a database table design with a questionable repeating group

Answer 1

+1 A:

What are Item1 and Item2? Are they distinct entities? Then the design seems fine to me.

For example, you might want to fill a database with solutions to the traveling salesman problem. You have a table City(cityId, latitude, longitude), and a table Path(pathId, salesmanId). Now a path where the salesman visits n+1 cities would be represented by n entries in PathSegment(pathId, segmentId, fromCityId, toCityId). Here, although fromCityId and toCityId are foreign keys that reference the same table City, they describe different attributes of the PathSegment entity, hence this does not violate NF1.

Edit:

So you want to store trees, actually, only your trees are mostly just linked lists, and most of those are linked lists with just two nodes, right? And apparently your coworker wants to do this as an adjacency list, so a tree like

1-2-3
\-4

becomes

(1,2)
(2,3)
(1,4)

There's nothing wrong with that, but it's not the only way to store a tree in a database. For a good summary of alternatives, see here.

In your case, the advantage of using an adjacency list is that most of your trees have only two nodes, so most of them end up being one row in the table, keeping that simple. Also, questions about the immediate neighbours are easy. "What's the invoice for this payment?" becomes

select item1 from link where item2 = :paymentID

which is neat, too. There are drawbacks, though. The order of child nodes often matters, and the list doesn't help you here, so you have to store that either as a separate column or as something like timestamps in the tables your foreign keys are referring to). Also, reconstructing an entire branch becomes a recursive task, and not all database systems can do that. So if your application often has to retrieve a message-board-like overview of the invoice history, it might require some application-side logic that turns the list of adjacent nodes into a tree on the client and works on that. If that becomes too cumbersome, you might want to consider a nested sets representation, see here.

What's best for your problem? Depends on several things: size and shape of your trees (if they are really mostly short linked lists, adjacency list is good), frequency of inserts and updates (if frequent, adjacency list is good, because its inserts are cheap), frequency and complexity of queries (if frequent and complex, nested sets is good, because its selects are simple and fast). So for a message board, i'd go with nested sets (or even Tropashkos nested intervals for speed and extra coolness), but for a simple request-response (and sometimes some more response) table, i'd probably use an adjacency list.

wallenborn 2009-12-01 15:22:59

Ok. In my example, Item1 and Item2 could have been switch. It would make no difference to the resulting group. Would that make a difference?

cjrh 2009-12-02 05:42:00

Probably yes. I would consider a design change under a few circumstances. A few are: 1. if there is an attribute that depends only on both items. In the traveling salesman example this could be the distance between fromCity and toCity, 2. if the number of Items might change in the future. A PathSegment has always exaclty two endpoints and no change request can ever change that, but ymmv, and 3. if the Items are interchangeable (thats what you suggest), because then SELECTs tend to be hard: `select * from LINK where item1 in (4,7) and item2 in (4,7) and item1<>item2` is awkward.

wallenborn 2009-12-02 12:30:49

Let's also discuss your (2): the (Item1, Item2) scheme above is being used to group, at least some of the time, >2 times. So there will never be more than 2 fields required, but at the same time, there are a variable number of items in a "group". A group, in this schema, is defined as a complete connected set. Perhaps something like a linked list. Would that also be problematic for you?

cjrh 2009-12-03 06:47:51

Sorry, >2 *items*

cjrh 2009-12-03 06:48:21

Response to your edit: thanks for the discussion, that was interesting. You haven't convinced me to change my position, but I appreciate the discussion.

cjrh 2009-12-07 05:56:26

Answer 2

+1 A:

Dewayne Christensen 2009-12-01 16:01:20

Using your example, what about Relation1, Relation2 (instead of FatherID, MotherID)? Does that still satisfy 1NF?

cjrh 2009-12-02 05:45:35

I take your point about same table referral, however. In the situation you have described, will it be implicit that FatherID and MotherID refer to a Person table? Several naming conventions prefer to indicate the table to which a foreign key refers?

cjrh 2009-12-02 05:50:16

Normalization and naming are completely separate issues. You can name all your fields Field1, Field2, Field3, and still be in Nth-normal form.In the Person example, father and mother are semantically different things, therefore not a repeating group (as long as we stick to blood relatives). If we add siblings into the mix, however, then the story changes. Having fields Sibling1ID, Sibling2ID, ... constitutes a repeating group.Naming conventions are up to you and your team. If you'd rather use FatherPersonID and MotherPersonID that's your choice, but that doesn't affect normalization.

Dewayne Christensen 2009-12-02 14:36:28

Yes, I asked the two questions separately for a reason. The important question was the first one. We could go further: would you enumerate aunts, uncles, cousins, nephews, and so on, as fields or as entries all in the same column, but with a "relation type" specifier?

cjrh 2009-12-03 06:36:51

As soon as you strike something that you have multiples of, then you have a repeating group. So for aunts and uncles (and bears, oh my! :) ) you'd want a separate join table. Mother and father would depend on how they're being used. If it's a medical application and you only care about the birth parents, then two fields within the Person table would suffice; if it's an insurance application and you need to know "who's my daddy today" then you'd need to move mom and pop out to a separate table, probably with a date range.

Dewayne Christensen 2009-12-03 14:41:10

It is often difficult to know in advance the kind of enhancements that will be required. It is a real pita to have to change schema for enhancement requests. IMO wallenborn's example of a "start" and "end" point are a good example of by-field enumeration, because they can never change, and every line segments requires both, and only those two at all times. I am a little more uncomfortable with roles played by people. This may, however, be paranoia.

cjrh 2009-12-07 05:50:59

ansaurus

tags:

views:

answers:

Example of a database table design with a questionable repeating group

related questions