ansaurus

Question

Programmatically extracting relationships between tables in an RDBMS w/out foreign keys?

Answer 1

+1 A:

You pretty much wrote the answer in your question.

my %column_tables;
foreach my $table (@tables) {
    foreach my $column ($table->columns) {
        push @{$column_tables[$column]}, $table;
    }
}
print "Likely foreign key relationships:\n";
foreach my $column (keys %column_tables) {
    my @tables = @{$column_tables[$column]};
    next
        if @tables < 2;
    print $column, ': ';
    foreach my $table (@tables) {
        print $table->name, ' ';
    }
    print "\n";
}

chaos 2009-02-27 20:00:21

Of course, this would work as long as all columns which are logically linked have the same name. Otherwise, you'd have to do an extensive analysis of the data, to try to get the relationships... a tough task I wouldn't want, indeed.

Joe Pineda 2009-02-28 05:42:49

Answer 2

+1 A:

You can use a combination of three (or four) approaches, depending on how obfuscated the schema is:

dynamic methods
- observation:
  - enable tracing in the RDBMS (or ODBC layer), then
  - perform various activities in the application (ideally record creation), then
  - identify which tables were altered in tight sequence, and with what column-value pairs
  - values occurring in more than one column during the sequence interval may indicate a foreign key relationship
static methods (just analyzing existing data, no need to have a running application)
- nomenclature: try to infer relationships from column names
- statistical: look at minimum/maximum (and possibly the average) of unique values in all numerical columns, and attempt to perform a match
- code reverse engineering: your last resort (unless dealing with scripts) - not for the faint of heart :)

Cheers, V.

vladr 2009-02-27 23:19:53

Answer 3

A:

My strategy would be to use the Oracle system catalog to find columns that are the same in column name and data type but different in table name. Also which one of the columns is part of a table's primary or unique key.

Here's a query that may be close to doing this, but I don't have an Oracle instance handy to test it:

SELECT col1.table_name || '.' || col1.column_name || ' -> ' 
    || col2.table_name || '.' || col2.column_name
FROM all_tab_columns col1 
  JOIN all_tab_columns col2
    ON (col1.column_name = col2.column_name 
    AND col1.data_type = col2.data_type)
  JOIN all_cons_columns cc
    ON (col2.table_name = cc.table_name 
    AND col2.column_name = cc.column_name)
  JOIN all_constraints con
    ON (cc.constraint_name = con.constraint_name 
    AND cc.table_name = con.table_name 
    AND con.constraint_type IN ('P', 'U')
WHERE col1.table_name != col2.table_name;

Of course this won't get any case of columns that are related but have different names.

Bill Karwin 2009-02-28 01:55:15

Answer 4

A:

This is an interesting question. The approach I took was a brute force search for columns that matched types and values for a small sample set. You'll probably have to tweak the heuristics to provide good results for your schema. I ran this on a schema that didn't use auto-incremented keys and it worked well. The code is written for MySQL, but it's very easy to adapt to Oracle.

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:mysql:host=localhost;database=SCHEMA", "USER", "PASS");

my @list;
foreach my $table (show_tables()) {
    foreach my $column (show_columns($table)) {
        push @list, { table => $table, column => $column };
    }
}

foreach my $m (@list) {
    my @match;
    foreach my $f (@list) {
        if (($m->{table} ne $f->{table}) &&
            ($m->{column}{type} eq $f->{column}{type}) &&
            (samples_found($m->{table}, $m->{column}{name}, $f->{column}{samples})))
        {
            # For better confidence, add other heuristics such as
            # joining the tables and verifying that every value
            # appears in the master. Also it may be useful to exclude
            # columns in large tables without an index although that
            # heuristic may fail for composite keys.
            #
            # Heuristics such as columns having the same name are too
            # brittle for many of the schemas I've worked with. It may
            # be too much to even require identical types.

            push @match, "$f->{table}.$f->{column}{name}";
        }
    }
    if (@match) {
        print "$m->{table}.$m->{column}{name} $m->{column}{type} <-- @match\n";
    }
}

$dbh->disconnect();

exit;

sub show_tables {
    my $result = query("show tables");
    return ($result) ? @$result : ();
}

sub show_columns {
    my ($table) = @_;
    my $result = query("desc $table");
    my @columns;
    if ($result) {
        @columns = map {
            { name => $_->[0],
              type => $_->[1],
              samples => query("select distinct $_->[0] from $table limit 10") }
        } @$result;
    }
    return @columns;
}

sub samples_found {
    my ($table, $column, $samples) = @_;
    foreach my $v (@$samples) {
        my $result = query("select count(1) from $table where $column=?", $v);
        if (!$result || $result->[0] == 0) {
            return 0;
        }
    }
    return 1;
}

sub query {
    my ($sql, @binding) = @_;
    my $result = $dbh->selectall_arrayref($sql, undef, @binding);
    if ($result && $result->[0] && @{$result->[0]} == 1) {
        foreach my $row (@$result) {
            $row = $row->[0];
        }
    }
    return $result;
}

Ken Fox 2009-02-28 05:03:19

ansaurus

tags:

views:

answers:

Programmatically extracting relationships between tables in an RDBMS w/out foreign keys?

related questions