I've got a large dataset, some of which is is duplicate records, which are identifiable by dupes in two fields.
To find these records, the following query works:
SELECT * FROM supplierstuffs
GROUP BY "Supplier Code", "Cost ex Tax"
HAVING count("Description") > 1
Basically what I want to do is cat together all the values of "Description" to form a single row, then replace all of the duplicated rows with the single row.
This is my half broken query so far, it's kludgy and horrid. My primary goal is to get this working - but if I learn some new tricks in sql along the way that's not at all a bad thing.
UPDATE supplierstuffs SET "Description" =
(SELECT array_to_string(array_accum("Description"), ', ') FROM supplierstuffs
GROUP BY "Supplier Code", "Cost ex Tax"
HAVING count("Description") > 1)
WHERE .....
This is far as I've gotten. What should I be reading to get a bit further? I've read a couple of books and a lot of webpages on the topic. However in this case I think my problem is not limited lack of SQL (ok, it's not my only problem) but more approaching the problem the wrong way.
EDIT 1:
'Name'; 'Supplier Code'; 'Desciption';
"7CPS PODIUM S/SLV CRICKET POLO";"7CPS";"04 -14, S - 3XL"
"7CP PODIUM CRICKET PANT ";"7CP";"08 -14, S - 2XL"
"7CPT PODIUM 3/4 SLV CRICKET POLO";"7CPT";"04 -14, S - 3XL"
"7CPL PODIUM L/SLV CRICKET POLO";"7CPL";"04 -14, S - 3XL"
"T444MS Cool dry breathable sporty T-shirts";"T444MS";"XS - 2XL, XS - 2XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL, 8-16"
^^ is what I want to create from vv
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"S - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"8-16"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T444MS Cool dry breathable sporty T-shirts";"T444MS";"XS - 2XL"
"T444MS Cool dry breathable sporty T-shirts";"T444MS";"XS - 2XL"
"T444MS Cool dry breathable sporty T-shirts";"T444MS";"XS - 2XL"
"T444MS Cool dry breathable sporty T-shirts";"T444MS";"XS - 2XL"
"7CP PODIUM CRICKET PANT ";"7CP";"08 -14"
"7CP PODIUM CRICKET PANT ";"7CP";"S - 2XL"
"7CPL PODIUM L/SLV CRICKET POLO";"7CPL";"04 -14"
"7CPL PODIUM L/SLV CRICKET POLO";"7CPL";"S - 3XL"
"7CPS PODIUM S/SLV CRICKET POLO";"7CPS";"04 -14"
"7CPS PODIUM S/SLV CRICKET POLO";"7CPS";"S - 3XL"
"7CPT PODIUM 3/4 SLV CRICKET POLO";"7CPT";"04 -14"
"7CPT PODIUM 3/4 SLV CRICKET POLO";"7CPT";"S - 3XL"
^^ noting that the lines not having more than one description line need to remain untouched.
I've so far created the new records in a new table with:
INSERT INTO tmptable
SELECT "Name" , "Supplier Code", array_to_string(array_accum("Description"), ', ')
FROM supplierstuffs
GROUP BY "Name", "Supplier Code", "Description"
HAVING count("Description") > 1
So now all that remains is to delete the records that were caught by the cat command. It seems I can't DELETE FROM
with a having clause? I'm thinking that DELETE FROM table WHERE oid IN (SELECT OID's using having clause)
Will work?
EDIT 2:
SELECT array_accum(oid)
FROM supplierstuffs
GROUP BY "Name", "Supplier Code", "Colour", "Cost ex Tax"
HAVING count("Description") > 1
returns a few arrays of 2 oids, all of which need to be delorted. I feel I'm very close, and yet so far. Thanks in advance