tags:

views:

37

answers:

2

Hello I have a very large SQL script and while trying to execute it I get an error from PostgreSQL about a duplicate primary key. It does not give a line number of anything for where the duplicate entry occurs (it is also wrapped in a transaction, so it doesn't know about the duplicate until commit; at the end of the file.

Basically, could someone help me to write a quick Ruby script to parse this file and find the duplicate entry? This file takes like a minute to load in most editors and it took like 5-10 minutes to do find-replace with it.. so it's quite massive and the only way I can think of to do it with Ruby is too complex. It would require two for loops and must parse the file once in each loop...

The lines are in this format..

INSERT INTO ZIPCODE (ZIPCODE, CITY, STATE, STATECODE, COUNTY, COUNTYCODE) VALUES (N'00782', N'COMERIO', N'PR', N'72', N'COMERIO', N'045');

The first (N'xxxxx',... is the primary key.. What would be the best way of parsing this file that won't take all day to run? Also, this only has to be done once.. so a throw away script is fine..

A: 

Assuming that Perl is acceptable:

#!/usr/bin/perl

use strict;

my %zips;
my $lineNum = 0;
while (<STDIN>)
{
    chomp;
    $lineNum++;
    if(/INSERT INTO ZIPCODE \(ZIPCODE, CITY, STATE, STATECODE, COUNTY, COUNTYCODE\) VALUES \(N'(\d{5})'/)
    {
        if($zips{$1})
        {
            print "Found duplicate zipcode at $lineNum (first instance $zips{$1}): $1";
        }
        else
        {
            $zips{$1} = $lineNum;
        }
    }
}

This is untested.

Dancrumb
Few minor changes, but that worked.. so Perl is close enough....
Earlz
I'm interested... a few minor changes to port to Ruby? Or you have to change the script to make it work?
Dancrumb
+1  A: 

Here's a Ruby version:

#!/usr/bin/env ruby

seen = {}
line_num = 0
$stdin.each_line do |line|
  line_num += 1
  next unless line.start_with?('INSERT INTO ZIPCODE')
  zip = line.scan(/\d{5}/).first
  if seen[zip]
    puts "Line #{line_num} is a duplicate of line #{seen[zip]}"
  else
    seen[zip] = line_num
  end
end
Lars Haugseth
It's why I love Ruby and will move from doing Web project on Java to RoR. +1. :)
dimitko