tags:

views:

70

answers:

3

I am reading a file which contains a record in each line. I am extracting the contents of the file and inserting it as column values into a table. The problem I face is, suppose if i insert a record into a table after reading from the file, I want to remove the duplicate fields. For example:

NAME age time
Tom   21  10:30
Tom   21  12:21

After insertion into the table I want it to be:

NAME AGE TIME
tom   21  10:30
          12:21

It should eliminate the duplicates. If I add a unique condition while creating the table I face a problem that the distinct time field is not being inserted and results in an error in MySQL.

So how can I do this? I want some suggestions.

for my $test11 (sort keys %seen) {
    my $test1 = $seen{$test11}{'name'};
    my $test2 = $seen{$test11}{'pid'};
    my $test3 = $seen{$test11}{'type'};
    my $test4 = $seen{$test11}{'time1'};
    print "$test11\t$test1$test2$test3$test4\n";
}

#sub query_execute()
{
    $db_handle = &getdb_handle;
    $sth       = $dbh->prepare("INSERT INTO tahle_new values('$sno','$id','$test1','$test4','$test2','$test3')");

$test1 and $test2 contain duplicates but not $test3.

+2  A: 

Each row has to have some value for each column. Do you want what would otherwise be duplicates to be NULL? It's hard to imagine a practical use for such a table.

One way is to have another table for each column you want to have non-duplicated with that column as a unique key; before adding the row to your main table, try to add the column values to the per column tables; if there is a duplicate error, clear that value before trying to add to the main table.

ysth
Note that this is not the most efficient way to detect dups, but is simple to do and should protect against race conditions if done correctly.
ysth
+3  A: 

Another way, is to define a unique key on 2 columns. Your key will be (Name, Age) and unique. So on insert you'll get an error, or add to your request : '... ON DUPLICATE KEY...' and do something (or do nothing ;) )

And as ysth said, i would advice not to insret your line with null values as the second one (null,null, 12:21)

benzebuth
+1  A: 

ysth is correct, you should not have null values. The duplication indicates that you need two tables. One for the log and one to store information about the user.

CREATE TABLE user (
    id     INTEGER   NOT NULL PRIMARY KEY AUTO_INCREMENT,
    name   TEXT      NOT NULL,
    age    INTEGER
);

CREATE TABLE log (
    user   INTEGER REFERENCES user,
    time   TIME
);

The log table stores the integer id of the user and the time (and whatever else). You'd insert an entry with:

$dbh->do("INSERT INTO log VALUES (?,?)", undef, $uid, $time);

You will have to remember or fetch the id of a user before noting an entry in the log. I would recommend against using the name as the key as it is prone to change. You can use $dbh->last_insert_id to get that after inserting a user. Note the use of bind parameters to avoid SQL escaping and security problems.

Schwern
I didn't say s/he shouldn't have null values; I said it was hard to imagine a case where that's what would be wanted. I was actually hoping to elicit something saying the real problem is later on when the data is read from the database and output... :)
ysth