ansaurus

Question

Calculate Family Relationship from Genealogical Data

Answer 1

A:

This might help you, it's a lot of theory and implementation of SQL queries to generate and query tree structures

http://www.artfulsoftware.com/mysqlbook/sampler/mysqled1ch20.html

In particular, look at the adjacency list model which uses a family tree as example.

Charles Ma 2009-06-30 13:44:41

Thanks for the link, but I already have implemented most of what is demonstrated on that page. I need to calculate family relationships, which is considerably more complex than those examples.

Dustin Fineout 2009-06-30 15:23:44

Answer 2

+4 A:

You'll first need to calculate the Lowest Common Ancestor of both A and B. Call this Lowest Common Ancestor C.

Then calculate the distance in steps from C to A (CA) and C to B (CB). These values should be indexed into another table that determines the relationship based on these two values. For example:

CA      CB      Relation
1       2       uncle
2       1       nephew
2       2       cousin
0       1       father
0       2       grandfather

You might keep the basic relations in this table, and add "great-" for additional distances on certain relations like grandfather, ex.: (0, 3) = great-grandfather.

Hopefully this will point you in the right direction. Best of luck!

UPDATE: (I can't comment below your code, since I don't have the reputation yet.)

Your function aggrandize_relationships is a little off, I think. You can simplify it by prefixing "grand" if the offset is 1 or greater, then prefix "great-" (offset - 1) times. Your version might include the prefix "great grand great grand" for very distant relatives.(Not sure if I have the correct parameter in this explanation, but hopefully you get the gist of it. Also, no idea if your family tree is going that far back, but the point remains valid.)

UPDATE TOO: Sorry, the above is incorrect. I misread the default case, and thought it recursively called the function again. In my defense, I wasn't familiar with the "2nd great grandfather" notation, and always used "great great grandfather" myself. Code onward!!

taserian 2009-07-02 16:29:12

Awesome, thanks for the nudge, I'll see where it leads me!

Dustin Fineout 2009-07-02 16:36:50

This has lead me to what I believe to be the solution to the problem. It actually is *slightly* more complicated, with respect to canonical versus civil generations and the resulting 1st/2nd/etc cousins being 1/2/etc. times removed. Your link lead me to some further reading and I believe I have all the information needed now to build an algorithm and implement it.

Dustin Fineout 2009-07-02 16:51:41

You might need to not stop at the first lowest common ancestor found. For instance, do you want to distinguish between full and half siblings? Or between, say, normal first cousins and double first cousins (where two brothers marry two sisters and both pairs have children who would have all grandparents in common). Think about making your implementation bullet-proof against incest, too - which unfortunately occurs - like, if a father and grandfather are the same for a person, you do not want to overwrite that in a lookup table.

Anon 2009-07-02 19:59:43

@Anon Definitely a problem that had crossed my mind, but I think that will end up as a second question to revise/enhance my implementation once I've completed it. Thanks!

Dustin Fineout 2009-07-06 14:20:26

Thanks for the updates :) I enjoy the ordinal suffix myself, and while I do have a soft spot for redundancy, I hate counting/saying 'great great great...'. Currently the longest proven direct line in my family tree goes back 16 generations. I don't need 13 greats to count :-p

Dustin Fineout 2009-07-07 18:30:10

Answer 3

+2 A:

Below is my PHP implementation of my algorithm to calculate relationship. This is based upon the data schema I outlined in the original question. This only finds the "closest" i.e. shortest-path relationship between the two individuals, it does not resolve compound relationships such as half-siblings or double cousins.

Note that data access functions such as get_father and get_gender are written in the style of a database abstraction layer I always use. It should be fairly straightforward to understand what is going on, basically all dbms-specific functions such as mysql_query are replaced with generalized functions such as db_query; it is not very complicated at all, especially in the examples in this code, but feel free to post questions in comments if it isn't clear.

<?php
/* Calculate relationship "a is the ___ of b" */

define("GENDER_MALE", 1);
define("GENDER_FEMALE", 2);

function calculate_relationship($a_id, $b_id)
{
  if ($a_id == $b_id) {
    return 'self';
  }

  $lca = lowest_common_ancestor($a_id, $b_id);
  if (!$lca) {
    return false;
  }
  $a_dist = $lca[1];
  $b_dist = $lca[2];

  $a_gen = get_gender($a_id);

  // DIRECT DESCENDANT - PARENT
  if ($a_dist == 0) {
    $rel = $a_gen == GENDER_MALE ? 'father' : 'mother';
    return aggrandize_relationship($rel, $b_dist);
  }
  // DIRECT DESCENDANT - CHILD
  if ($b_dist == 0) {
    $rel = $a_gen == GENDER_MALE ? 'son' : 'daughter';
    return aggrandize_relationship($rel, $a_dist);
  }

  // EQUAL DISTANCE - SIBLINGS / PERFECT COUSINS
  if ($a_dist == $b_dist) {
    switch ($a_dist) {
      case 1:
        return $a_gen == GENDER_MALE ? 'brother' : 'sister';
        break;
      case 2:
        return 'cousin';
        break;
      default:
        return ordinal_suffix($a_dist - 2).' cousin';
    }
  }

  // AUNT / UNCLE
  if ($a_dist == 1) {
    $rel = $a_gen == GENDER_MALE ? 'uncle' : 'aunt';
    return aggrandize_relationship($rel, $b_dist, 1);
  }
  // NEPHEW / NIECE
  if ($b_dist == 1) {
    $rel = $a_gen == GENDER_MALE ? 'nephew' : 'niece';
    return aggrandize_relationship($rel, $a_dist, 1);
  }

  // COUSINS, GENERATIONALLY REMOVED
  $cous_ord = min($a_dist, $b_dist) - 1;
  $cous_gen = abs($a_dist - $b_dist);
  return ordinal_suffix($cous_ord).' cousin '.format_plural($cous_gen, 'time', 'times').' removed';
} //END function calculate_relationship

function aggrandize_relationship($rel, $dist, $offset = 0) {
  $dist -= $offset;
  switch ($dist) {
    case 1:
      return $rel;
      break;
    case 2:
      return 'grand'.$rel;
      break;
    case 3:
      return 'great grand'.$rel;
      break;
    default:
      return ordinal_suffix($dist - 2).' great grand'.$rel;
  }
} //END function aggrandize_relationship

function lowest_common_ancestor($a_id, $b_id)
{
  $common_ancestors = common_ancestors($a_id, $b_id);

  $least_distance = -1;
  $ld_index = -1;

  foreach ($common_ancestors as $i => $c_anc) {
    $distance = $c_anc[1] + $c_anc[2];
    if ($least_distance < 0 || $least_distance > $distance) {
      $least_distance = $distance;
      $ld_index = $i;
    }
  }

  return $ld_index >= 0 ? $common_ancestors[$ld_index] : false;
} //END function lowest_common_ancestor

function common_ancestors($a_id, $b_id) {
  $common_ancestors = array();

  $a_ancestors = get_ancestors($a_id);
  $b_ancestors = get_ancestors($b_id);

  foreach ($a_ancestors as $a_anc) {
    foreach ($b_ancestors as $b_anc) {
      if ($a_anc[0] == $b_anc[0]) {
        $common_ancestors[] = array($a_anc[0], $a_anc[1], $b_anc[1]);
        break 1;
      }
    }
  }

  return $common_ancestors;
} //END function common_ancestors

function get_ancestors($id, $dist = 0)
{
  $ancestors = array();

  // SELF
  $ancestors[] = array($id, $dist);

  // PARENTS
  $parents = get_parents($id);
  foreach ($parents as $par) {
    if ($par != 0) {
      $par_ancestors = get_ancestors($par, $dist + 1);
      foreach ($par_ancestors as $par_anc) {
        $ancestors[] = $par_anc;
      }
    }
  }

  return $ancestors;
} //END function get_ancestors

function get_parents($id)
{
  return array(get_father($id), get_mother($id));
} //END function get_parents

function get_father($id)
{
  $res = db_result(db_query("SELECT father_id FROM child WHERE child_id = %s", $id));
  return $res ? $res : 0;
} //END function get_father

function get_mother($id)
{
  $res = db_result(db_query("SELECT mother_id FROM child WHERE child_id = %s", $id));
  return $res ? $res : 0;
} //END function get_mother

function get_gender($id)
{
  return intval(db_result(db_query("SELECT gender FROM individual WHERE id = %s", $id)));
}

function ordinal_suffix($number, $super = false)
{
  if ($number % 100 > 10 && $number %100 < 14) {
    $os = 'th';
  } else if ($number == 0) {
    $os = '';
  } else {
    $last = substr($number, -1, 1);

    switch($last) {
      case "1":
        $os = 'st';
        break;
      case "2":
        $os = 'nd';
        break;
      case "3":
        $os = 'rd';
        break;
      default:
        $os = 'th';
    }
  }

  $os = $super ? '<sup>'.$os.'</sup>' : $os;

  return $number.$os;
} //END function ordinal_suffix

function format_plural($count, $singular, $plural)
{
  return $count.' '.($count == 1 || $count == -1 ? $singular : $plural);
} //END function plural_format

?>

As I had mentioned previously, the algorithm to determine LCA is far less than optimal. I plan to post a separate question to optimize that, and another to address the problem of calculating compound relationships such as double cousins.

Many thanks to everyone who helped prod me in the right direction! With your tips, this turned out to be much easier than I originally thought.

Dustin Fineout 2009-07-06 14:41:29

I will leave this open without accepting an answer for at least 2 days to allow for further discussion, pointing out any silly mistakes I made, suggestions for improvement, etc.

Dustin Fineout 2009-07-06 15:05:39

Answer 4

A:

wuub 2009-07-06 16:37:19

I actually had taken a look at this "solution" months ago, but in fact this is a very lacking algorithm, able to compute none but the very simplest relationships (sibling, parent, child, uncle). Its method for solving relationships is also quite kludge, rather than computing the relationship, it runs hard-checks for every possible relationship. I need a much more robust solution than that.

Dustin Fineout 2009-07-06 16:50:57

I don't think that accusing people of stealing is a good strategy to get help. I've coded a basic prolog example used in almost every book/tutorial ever created, it's like accusing someone of stealing bubble sort. Just to let you know, Prolog is perfectly capable of expressing insanely complicated relationships, and there are ways to compute relationship databases more efficiently than presented naive solution.

wuub 2009-07-06 17:31:23

@Wuub my apology if that is the case - I am not well versed in Prolog but did find that exact example in only one place (and I searched for other examples, no luck). My solution is naive, admittedly, but it is much more optimal than the example you presented, both in optimal running time and the robustness of the algorithm. Please, you don't need to take these things seriously. It's programming, we're all learning here and (hopefully) always will be.

Dustin Fineout 2009-07-06 17:34:24

-seriously +personally is what I meant

Dustin Fineout 2009-07-06 17:35:18

Also, just to clear up any possible confusion, I was addressing the presented algorithm, not PROLOG itself, which actually seems very well suited for the problem at hand as it is engineered specifically to process complex relationships.

Dustin Fineout 2009-07-06 17:38:07

Again, my heartfelt apology if you wrote that - I definitely do not mean to accuse you of plagiarism if it is your own work, and I greatly appreciate your time and input. I understand being such a short piece of code and being an obvious candidate for example code that it may be just as common as you say, and again admit that I have almost zero exposure to PROLOG which contributed to my jumping to such a conclusion immediately. Please understand, I take plagiarism very seriously; because of this, I feel it's almost worse to falsely accuse someone of it as it would be to commit it. My apology.

Dustin Fineout 2009-07-06 17:53:47

No problem. I wrote each and every byte of the above code myself, but never said it was my original idea.I agree that there are much better algorithms for computing requested output (even in Prolog). I agree that Prolog is not an acceptable solution for web development. But I think it's worth mentioning nonetheless.

wuub 2009-07-06 18:10:22

ansaurus

tags:

views:

answers:

Calculate Family Relationship from Genealogical Data

related questions