ansaurus

Question

Answer 1

+1 A:

Space is cheap, so there is rarely a need to delete data these days. That doesn't mean we can't be efficient, though. If I were approaching this as a database design problem, I would create two tables.

One for products

and

One for compatibility, which you have done.

But in the example above you do not give a reason why you are tracking non-compatibility. If two products are in the compatibility table, that means they are compatible; if they are not in the table, they are not compatible.

How are you populating these rows? You never say why you add a row for A to C and then also add a row for C to A. Why add the second row at all?

What exactly is stored in the columns for product A and product B: a product ID, or a product name?

Ryan 2010-07-26 14:52:53

When creating a compatibility table you would still get doubles; even with a unique index on the primary key it would still be possible to create both (A,C) and (C,A) pairs. In that case the solution would be to create a before_insert trigger that checks that the pair is also unique when the two products are swapped.
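A minimal sketch of that trigger idea, using Python's built-in sqlite3 module as a stand-in so it can be run anywhere. The table name `compatibility` and the columns `product_a` and `product_b` are assumptions, not the asker's real schema, and the trigger syntax is SQLite's; MySQL's differs.

```python
import sqlite3

# In-memory SQLite database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE compatibility (
        product_a TEXT NOT NULL,
        product_b TEXT NOT NULL,
        PRIMARY KEY (product_a, product_b)
    );

    -- Reject a new row if the same pair is already stored the other way around.
    CREATE TRIGGER compatibility_no_reverse
    BEFORE INSERT ON compatibility
    WHEN EXISTS (
        SELECT 1 FROM compatibility
        WHERE product_a = NEW.product_b AND product_b = NEW.product_a
    )
    BEGIN
        SELECT RAISE(ABORT, 'pair already stored in reverse order');
    END;
""")


def try_insert(a, b):
    """Return True if the row was stored, False if the database rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO compatibility VALUES (?, ?)", (a, b))
        return True
    except sqlite3.DatabaseError:
        # Raised both by the primary key and by the trigger above.
        return False
```

With this in place, `try_insert("A", "C")` should succeed and a later `try_insert("C", "A")` should be refused by the trigger, while an exact repeat of (A,C) is still caught by the primary key.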

Mark 2010-07-26 14:55:55

I have only populated the first two columns, using a for loop; I have not populated the third column at all yet. There are a lot more products than in the example I made, which I simplified a bit.

thomas 2010-07-26 15:11:20

The products still go in the compatibility table when they aren't compatible because it is not simply a binary yes/no. There are some that are "still testing".

thomas 2010-07-26 15:13:25

That makes more sense then. Like Mark said, you will have to add some sort of validation before you save to the database to determine whether there is redundant data. What kind of application is this? Is it web-based or a desktop application? To follow that, what language are you using to populate the database or write your application?

Ryan 2010-07-26 18:13:03

Answer 2

A:

It is almost always worthwhile to properly normalize your data. Lower volume of data and no chance of inconsistency are just the two most obvious reasons why. So if your compatibility is in fact guaranteed to be symmetric, and remain symmetric forever (and not some kind of upward- vs. backward-compatibility...), then yes, you should delete the redundant rows.

The only caveat is that in the future you must either query the compatibility in the canonical order (with the lower product, however you define that, in the first slot of your query), or use a disjunctive query, otherwise you might miss a legitimate combination. (The first of those options is obviously the better solution, since the second reintroduces unnecessary processing effort.)
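Deleting the redundant rows as suggested above could be sketched like this, again with Python's sqlite3 module standing in for MySQL. Everything named here is an assumption: the `compatibility` table, its `product_a`, `product_b` and `status` columns, and "lower product" meaning plain string order. It also assumes mirrored rows such as (A,C) and (C,A) carry the same information, so dropping one loses nothing.

```python
import sqlite3

# Hypothetical schema for the sketch; the real table may look different.
SCHEMA = """
    CREATE TABLE compatibility (
        product_a TEXT NOT NULL,
        product_b TEXT NOT NULL,
        status    TEXT,
        PRIMARY KEY (product_a, product_b)
    )
"""


def canonicalize(conn):
    """Rewrite every pair so product_a <= product_b, dropping mirrored duplicates."""
    with conn:  # one transaction: both statements apply, or neither does
        # Remove (C, A) whenever (A, C) is already present.
        conn.execute("""
            DELETE FROM compatibility
            WHERE product_a > product_b
              AND EXISTS (
                  SELECT 1 FROM compatibility AS m
                  WHERE m.product_a = compatibility.product_b
                    AND m.product_b = compatibility.product_a
              )
        """)
        # Flip any remaining out-of-order rows that had no mirror.
        # SQLite evaluates both right-hand sides against the old row, so this
        # swaps the columns. MySQL assigns left to right, so the same statement
        # would not swap there and would need rewriting.
        conn.execute("""
            UPDATE compatibility
            SET product_a = product_b, product_b = product_a
            WHERE product_a > product_b
        """)
```

After a one-off run of `canonicalize(conn)`, every pair is stored exactly once with the lower product first, which is what the canonical-order query relies on.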

Kilian Foth 2010-07-26 14:53:30

What if the user inputs which two products they want to check? I have no control over whether they choose "A" and "B" rather than "B" and "A".

thomas 2010-07-26 15:18:48

Then your business code needs to sort the two products before firing off the query. I mean, it's not like you're generating database queries directly from your TextInput widgets, right? <sound of chirping crickets> Er... right?
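A rough sketch of what that sorting step might look like in application code. The helper names (`open_demo_db`, `canonical_pair`, `set_status`, `compatibility_status`), the table layout and the use of SQLite are all assumptions made for illustration; the point is only that both writes and reads go through the same ordering function, and that user input reaches the database as bound parameters rather than being pasted into the SQL string.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory database with a hypothetical compatibility table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE compatibility (
            product_a TEXT NOT NULL,
            product_b TEXT NOT NULL,
            status    TEXT,
            PRIMARY KEY (product_a, product_b)
        )
    """)
    return conn


def canonical_pair(first, second):
    """Return the two product ids in a fixed order, whichever way they were given."""
    return (first, second) if first <= second else (second, first)


def set_status(conn, first, second, status):
    """Store a status for the pair, always with the lower product first."""
    a, b = canonical_pair(first, second)
    with conn:
        # INSERT OR REPLACE is SQLite syntax; it overwrites an existing row.
        conn.execute(
            "INSERT OR REPLACE INTO compatibility (product_a, product_b, status) "
            "VALUES (?, ?, ?)",
            (a, b, status),
        )


def compatibility_status(conn, first, second):
    """Return the stored status for the pair, or None if there is no row."""
    a, b = canonical_pair(first, second)
    row = conn.execute(
        "SELECT status FROM compatibility WHERE product_a = ? AND product_b = ?",
        (a, b),
    ).fetchone()
    return row[0] if row else None
```

Because the user's choice is sorted before it reaches the query, asking about "B" and "A" hits the same single row as asking about "A" and "B", and no second row or OR-style query is needed.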

Kilian Foth 2010-07-26 15:33:31

I haven't made it yet (so no, it is not generating queries straight from the text input), but that's a good point. Thanks, I'm fairly new to MySQL and just trying to make sure everything I do is kosher.

thomas 2010-07-26 15:34:57

Is there any way to automate this process?

thomas 2010-07-26 15:40:39

No duplicate rows, but redundant data
