Hi... When using LOAD DATA INFILE, is there a way to either flag a duplicate row, or dump any/all duplicates into a separate table?
From the LOAD DATE INFILE documentation:
The REPLACE and IGNORE keywords control handling of input rows that duplicate existing rows on unique key values:
- If you specify REPLACE, input rows replace existing rows. In other words, rows that have the same value for a primary key or unique index as an existing row. See Section 12.2.7, “REPLACE Syntax”.
- If you specify IGNORE, input rows that duplicate an existing row on a unique key value are skipped. If you do not specify either option, the behavior depends on whether the LOCAL keyword is specified. Without LOCAL, an error occurs when a duplicate key value is found, and the rest of the text file is ignored. With LOCAL, the default behavior is the same as if IGNORE is specified; this is because the server has no way to stop transmission of the file in the middle of the operation.
Effectively, there's no way to redirect the duplicate records to a different table. You'd have to load them all in, and then create another table to hold the non-duplicated records.
It looks as if there actually is something you can do when it comes to duplicate rows for LOAD DATA calls. However, the approach that I've found isn't perfect: it acts more as a log for all deletes on a table, instead of just for LOAD DATA calls. Here's my approach:
Table test:
CREATE TABLE test (
id INTEGER PRIMARY KEY,
text VARCHAR(255) DEFAULT NULL
);
Table test_log:
CREATE TABLE test_log (
id INTEGER, -- not primary key, we want to accept duplicate rows
text VARCHAR(255) DEFAULT NULL,
time TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Trigger del_chk:
delimiter //
drop trigger if exists del_chk;
CREATE TRIGGER del_chk AFTER DELETE ON test
FOR EACH ROW
BEGIN
INSERT INTO test_log(id,text) values(OLD.id,OLD.text);
END;//
delimiter ;
Test import (/home/user/test.csv
):
1,asdf
2,jkl
3,qwer
1,tyui
1,zxcv
2,bnm
Query:
LOAD DATA INFILE '/home/ken/test.csv'
REPLACE INTO TABLE test
FIELDS
TERMINATED BY ','
LINES
TERMINATED BY '\n' (id,text);
Running the above query will result in 1,asdf
, 1,tyui
, and 2,jkl
being added to the log table. Based on a timestamp, it could be possible to associate the rows with a particular LOAD DATA
statement.