Hi,
I have quite a large table with 19 000 000 records, and I have a problem with duplicate rows. There are a lot of similar questions here on SO, but none of them seems to give me a satisfactory answer. Some points to consider:
- Row uniqueness is determined by two columns, location_id and datetime.
- I'd like to keep the execution time as fast as possible (< 1 hour).
- Copying tables is not very feasible, as the table is several gigabytes in size.
- No need to worry about relations.
As said, each location_id should have at most one row per distinct datetime, and I would like to remove all the duplicate instances. It does not matter which one of them survives, as the data is identical.
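To make the goal concrete, here is a toy reproduction of what I'm after, sketched with SQLite in Python (the table and column names besides location_id and datetime are made up; keeping the row with the smallest rowid is just one arbitrary choice of survivor):

```python
import sqlite3

# Toy table where (location_id, datetime) should be unique,
# but duplicates have crept in.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements (location_id INTEGER, datetime TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [
        (1, "2023-01-01 00:00", 1.5),
        (1, "2023-01-01 00:00", 1.5),  # duplicate of the row above
        (1, "2023-01-01 01:00", 2.0),
        (2, "2023-01-01 00:00", 3.0),
        (2, "2023-01-01 00:00", 3.0),  # another duplicate
    ],
)

# Keep one arbitrary survivor per (location_id, datetime) pair:
# the row with the smallest internal rowid.
conn.execute(
    """
    DELETE FROM measurements
    WHERE rowid NOT IN (
        SELECT MIN(rowid)
        FROM measurements
        GROUP BY location_id, datetime
    )
    """
)

remaining = conn.execute("SELECT COUNT(*) FROM measurements").fetchone()[0]
print(remaining)  # 3 distinct (location_id, datetime) pairs survive
```

I suspect a correlated or NOT IN subquery like this would be far too slow on 19 million rows, which is exactly why I'm asking for better approaches.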
Any ideas?