I have a log file stored in an SQLite database which I would like to distribute in a git repository.

Later on I would like changes to the log file to be merged automatically with the original.

Is this going to work? Will an automatic binary merge into an SQLite file blow up more often than not?

+2  A: 

I'm not convinced that git is really the tool for your job. git is a distributed source code management tool, not a database replication tool.

The only automatic merging that git will attempt is merging of text files. A log file is (usually) a text file, so why not put this directly into git and not into a database first?

Charles Bailey
Thanks for the suggestion.
git-noob
+1  A: 

I doubt any generic version control system (git, svn, cvs, etc.) can handle a database in the way you describe. If you insist on using git for merging databases, your best bet is to convert the database into a text file, merge the text file, and re-create the database. For example,

sqlite3 database.db .dump > dump_file.txt

creates all the SQL statements needed to rebuild the database. You then edit or merge the dumped file, and build a new SQLite database from it with

sqlite3 newdatabase.db < modified_dump_file.txt

You should be able to automate this using some sort of git hook (I'm not too familiar with git).
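The round trip described above could be wrapped in two small shell functions. This is only a sketch: dump_db and restore_db are hypothetical names, and it assumes the sqlite3 command-line tool is on your PATH.

```shell
#!/bin/sh
# Sketch only: dump_db and restore_db are hypothetical helper names.

# Write a text (SQL) dump of the database $1 to the file $2.
dump_db() {
    sqlite3 "$1" .dump > "$2"
}

# Build a fresh database $2 from the SQL dump $1.
# Refuses to overwrite an existing file, because replaying a dump into an
# existing database would collide with the tables already in it.
restore_db() {
    if [ -e "$2" ]; then
        echo "restore_db: $2 already exists" >&2
        return 1
    fi
    sqlite3 "$2" < "$1"
}
```

Typical use would be dump_db log.db log.sql, then merging or editing log.sql, then restore_db log.sql new_log.db.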

polyglot
A: 

There is no way to merge binary files correctly in the general case, so git cannot and will not do it.

With some effort, you could use git to version database dumps, but except for very simple cases you’ll have to do more than just use straight dumps. You’ll need to think about how the dumped rows are sorted based on your key columns, at the very least. Else you’ll get spurious conflicts, or merges that produce syntactically valid dumps representing a garbage database.

For example, if different versions of a row with the same key show up in different line regions of different versions of the dump, git might think it reasonable to keep them both. The resulting dump would have two representations of the same row, which is nonsense.
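A merged dump can be checked for this kind of duplication before it is loaded back into SQLite. The sketch below uses a hypothetical helper, dup_keys, and makes several assumptions: each INSERT statement sits on a single line, the key is the first value in the VALUES list, and the key contains no commas. It would not cope with composite keys or multi-line text values.

```shell
#!/bin/sh
# Sketch only: dup_keys is a hypothetical helper name.
# Usage: dup_keys DUMP_FILE TABLE
# Prints every first-column value that occurs in more than one
# INSERT statement for TABLE, one value per line.
dup_keys() {
    # Accept both INSERT INTO tbl VALUES(...) and INSERT INTO "tbl" VALUES(...),
    # and capture everything up to the first comma or closing parenthesis.
    sed -n "s/^INSERT INTO \"\{0,1\}$2\"\{0,1\} VALUES(\([^,)]*\).*/\1/p" "$1" \
        | sort \
        | uniq -d
}
```

If dup_keys prints anything, the merge has kept two versions of the same row and the dump needs fixing by hand.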

In short, you’ll probably be unhappy trying to keep a database versioned using a source control system.

Aristotle Pagaltzis
A: 

You need to define custom merge and diff drivers in your git config, and then use attributes to associate them with the files.

This just does a simple text merge on the dumps, so it could very well produce total nonsense. You will absolutely need to check its work to make sure it did the right thing. It should take the tedium out of the easy merges, though.

In your .git/config:

[merge "sqlite3"]
    name = sqlite3 merge driver
    driver = merge-sqlite3 %O %A %B

[diff "sqlite3"]   
    name = sqlite3 diff driver  
    command = diff-sqlite3 

In .gitattributes:

signons.sqlite diff=sqlite3 merge=sqlite3
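The same setup can also be applied from the command line instead of editing the files by hand. This is a sketch: register_sqlite3_drivers is a hypothetical helper name, it must be run from the top of the repository, and for the diff driver it sets only the command key.

```shell
#!/bin/sh
# Sketch only: register_sqlite3_drivers is a hypothetical helper name.
# Usage (from the top of the repository): register_sqlite3_drivers signons.sqlite
register_sqlite3_drivers() {
    # Custom merge driver: %O = ancestor, %A = current version, %B = other version
    git config merge.sqlite3.name "sqlite3 merge driver"
    git config merge.sqlite3.driver "merge-sqlite3 %O %A %B"
    # Custom external diff driver
    git config diff.sqlite3.command "diff-sqlite3"
    # Associate both drivers with the database file
    printf '%s\n' "$1 diff=sqlite3 merge=sqlite3" >> .gitattributes
}
```

Afterwards, git check-attr merge diff -- signons.sqlite should report sqlite3 for both attributes.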

And somewhere in your path, named diff-sqlite3:

#!/usr/bin/perl -w

use File::Temp  qw/ :POSIX /;
use IPC::Run qw/run/ ;

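# git calls an external diff driver with seven arguments:
# path old-file old-hex old-mode new-file new-hex new-mode
@ARGV == 7 or die sprintf 'expected 7 arguments, got: %s', join(' ', @ARGV);

# Only the path and the two temporary file names are needed here.
my ($name, $x, $y) = ($ARGV[0], $ARGV[1], $ARGV[4]);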

my ($a, $b);

eval { 
  $a = tmpnam();
  $b = tmpnam();

  run ['sqlite3', $x, '.dump'], '>', $a or die 'sqlite3 failed';
  run ['sqlite3', $y, '.dump'], '>', $b or die 'sqlite3 failed';

  print "diff-sqlite3 a/$name b/$name\n";
  run ['diff', '-u', $a, $b, '--label', "a/$name", '--label', "b/$name"], '>', \*STDOUT;   

  unlink $a;
  unlink $b; 
  1;
} or do {
  unlink $a if defined $a;
  unlink $b if defined $b;
  die $@; 
}

Also in your path, named merge-sqlite3:

#!/usr/bin/perl -w

use File::Temp  qw/ :POSIX /;
use IPC::Run qw/run/ ;

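# git passes the three paths named in the driver line:
# %O (common ancestor), %A (current version), %B (other version)
@ARGV == 3 or die sprintf 'expected 3 arguments, got: %s', join(' ', @ARGV);

my ($o, $a, $b) = @ARGV;

# Temporary dump files, declared here so the cleanup block below can see them.
my ($ad, $bd, $od);

print "MERGING SQLITE FILES $o $a $b\n";

eval {
  $ad = tmpnam();
  $bd = tmpnam();
  $od = tmpnam();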

  run ['sqlite3', $o, '.dump'], '>', $od or die 'sqlite3 failed';
  run ['sqlite3', $a, '.dump'], '>', $ad or die 'sqlite3 failed';
  run ['sqlite3', $b, '.dump'], '>', $bd or die 'sqlite3 failed';

  run ['merge', $ad, $od, $bd] or do {
    my $newname = "$a.dump";
    my $n = 0;
    while (-e $newname) {
      ++$n;
      $newname = "$a.dump.$n";
    }
    print "merge failed, saving dump in $newname\n";
    rename $ad, $newname;
    undef $ad; 
    die 'merge failed';
  };

  unlink $a or die $!;
  my $err; 
  run ['sqlite3', $a], '>', \*STDOUT, '2>', \$err, '<', $ad;
  if ('' ne $err) {
    print STDERR $err;
    die 'sqlite3 failed';
  }  

  unlink $ad if defined $ad;
  unlink $bd; 
  unlink $od;
  1;
} or do {
  unlink $ad if defined $ad;
  unlink $bd if defined $bd;
  unlink $od if defined $od;

  die $@; 
}

I hacked these up just now, so you may have to iron out the kinks.

see: http://www.kernel.org/pub/software/scm/git/docs/gitattributes.html and http://www.kernel.org/pub/software/scm/git/docs/git-config.html

smoofra