ansaurus

Question

How to store bidirectional relationships

Answer 1

+1 A:

As always it depends on what you want to do with the data once you've calculated it.

Assuming it's simply to identify or locate duplicates then your suggestion 1 is what I'd use, i.e. a second table that simply stores the pairs and the strengths. My only suggestion is to make the strengths a scaled integer rather than a decimal.

Richard Harrison 2010-09-17 08:35:45

I need to present the data back to the people who maintain it so they can go through and check it. So in that respect my first suggestion would suffice I suppose. But I wanted to know if there was a 'standard' way of storing such information so I could have the flexibilty to output it into various formats depending on what they wanted (as they will no doubt come back saying they want it done another way!). Also ... it's a good oportunity to improve my understanding of database schema.

alj 2010-09-17 08:42:27

... and thanks Richard.

alj 2010-09-17 10:56:38

It's the way I've always done it. Sometimes the simplest solution just works and we need not seek anything more complex. The first solution will work and will be sufficiently efficient and produce the results you need.

Richard Harrison 2010-09-17 21:17:52

Answer 2

+3 A:

What we have here is a graph in which each node has a relationship (edit distance) with every other node. This is not in the normal range of data models. It is also not a permanent feature of your database (assuming you resolve the business processes which led to the duplicate data) so it isn't worth sweating over the solution which best fits relational theory. What we need is a a practical solution.

Think of it as a matrix. If we go for the optimum processing we won't execute the duplicate scorings. So we score Address 1 against all the other Addresses, we score Address 2 against all the other Addresses except Address 1, we score Address 3 against all the other Addresses except Addresses 1 and 2, etc. And what we end up with is a bit like a football league table:

          addr  
          1    2     3    4     5
addr
  1       -   95    95   80    76 
  2       -    -   100   75    72
  3       -    -     -   75    72
  4       -    -     -    -    83
  5       -    -     -    -     -

This data can best be stored in suggestion 1, a table of ID1, ID2, SCORE. Although we do need to pivot the data to get the output looking like that :)

In a proper league table there are two sets of scores - Home and Away - so the table is symmetrical. But that doesn't apply here, as the edit distance for 1 > 2 is the same as 2 > 1. However, it would make querying the results more straightforward if the result set included the mirrored scores. That is, for records (1,5,76), (2,5,72), etc we generate records (5,1,76), (5,2,72). This could be done at the end of the scoring process.

          addr  
          1    2     3    4     5
addr
  1       -   95    95   80    76 
  2      95    -   100   75    72
  3      95  100     -   75    72
  4      80   75    75    -    83
  5      76   72    72   83     -

Of course, this is mainly a presentational thing, so it only needs to be done for display purposes, e.g. exporting the data to a spreadsheet. We can still get all the scores for, say, Address 5 in a readable fashion without miiroring the scores using a simple SQL statement:

select case when id1 = 5 then id1 else id2 end as id1
       , case when id1 = 5 then id2 else id1 end as id2 
       , score
from   your_table
where  id1 = 5 
or     id2 = 5
/

APC 2010-09-17 10:35:25

Thanks APC. That matrix makes sense and helps to visualise it. That SQL statement is really hand too. Thanks.

alj 2010-09-17 10:55:26

Answer 3

A:

The way to deal with symmetric relations in a relational system is as follows:

choose a canonical form in which the symmetric pairs are stored, e.g. customer_id1 < customer_id2.
Define a view SYMM_TBL as select id1,id2,... from ... UNION select id2 as id1,id1 as id2, ... FROM ...

Decent systems ought not punish you in the performance area when querying this view.

Erwin Smout 2010-09-17 14:51:41

ansaurus

tags:

views:

answers:

How to store bidirectional relationships

related questions