duplicate-removal

Tools for matching name/address data

Here's an interesting problem. I have an oracle database with name & address information which needs to be kept current. We get data feeds from a number of different gov't sources, and need to figure out matches, and whether or not to update the db with the data, or if a new record needs to be created. There isn't any sort of unique i...

Delete duplicate records from a SQL table without a primary key

I have the below table with the below records in it create table employee ( EmpId number, EmpName varchar2(10), EmpSSN varchar2(11) ); insert into employee values(1, 'Jack', '555-55-5555'); insert into employee values (2, 'Joe', '555-56-5555'); insert into employee values (3, 'Fred', '555-57-5555'); insert into employee values (4, '...

Comparing collections of subsets up to permutation

I have an array a[i][j]. The elements are char, interpreted as subsets of the set {1,...,8} (the element k is in the subset if the k-th bit is 1). I do not think it is relevant, but every element has exactly 4 bits set. Every row a[1][j]..a[n][j] is a collection of subsets of {1,...,8}. I need to remove duplicate rows, where two rows ar...

Python remove all lines which have common value in fields

I have lines of data comprising of 4 fields aaaa bbb1 cccc dddd aaaa bbb2 cccc dddd aaaa bbb3 cccc eeee aaaa bbb4 cccc ffff aaaa bbb5 cccc gggg aaaa bbb6 cccc dddd Please bear with me. The first and third field is always the same - but I don't need them, the 4th field can be the same or different. The thing is, I only wan...

Update column to be different aggregate values

I am creating a script that for "merging" and deleting duplicate rows from a table. The table contains address information, and uses an integer field for storing information about the email as bit flags (column name lngValue). For example, lngValue & 1 == 1 means its the primary address. There are instances of the same email being e...

AMQP Delay Delivery and Prevent Duplicate Messages

I have a system that will generate messages sporadically, and I would like to only submit either zero or one message every 5 minutes. If no message is generated, nothing would be processed by the queue consumer. If a hundred identical messages are generated within 5 minutes I only want one of those to be consumed from the queue. I am ...

Postgres csv import duplicate key error?

Hi, I am importing a CSV file to postgres. copy product from '/tmp/a.csv' DELIMITERS ',' CSV; ERROR: duplicate key value violates unique constraint "product_pkey" CONTEXT: COPY product, line 13: "1,abcd,100 pack" What is the best way to avoid this error.. Would I have to write a python script to handle this error.. ...

SQL Removing duplicates one row at a time

I have a table where I save all row-changes that have ever occurred. The problem is that in the beginning of the application there was a bug that made a bunch of copies of every row. The table looks something like this: copies |ID |CID |DATA | 1 | 1 | DA | 2 | 2 | DO | 2 | 3 | DO (copy of CID 2) | 1 | 4 | DA (copy of CID 1) | 2...

How to optimise a sql doublon checker

I would like to optimise my Doublon checker if anyone knows how it could be faster. $doublonchecker="delete bad_rows.* from eMail as good_rows inner join eMail as bad_rows on bad_rows.EMAIL = good_rows.EMAIL and bad_rows.EMAIL_ID > good_rows.EMAIL_ID"; $resultdoublon = mysql_query($doublonchecker); if (!$resultdoublon) { ...

Splitting strings and joining the duplicates

I'm trying to create a select box drop down with a twist, Basically this is an Ajax form that when an item is selected from the list it'll add it into a text field. However I also want to add a few extra selections in here. the string I'm getting is made up of COMPANY_SITE_DEPARTMENT and for example SDGCC_NEWTOWN_INBOUND. Using PHP I ...

Getting rid of duplicate results in MySQL query when using UNION

Hi everyone, I have a MySQL query to get items that have had recent activity. Basically users can post a review or add it to their wishlist, and I want to get all items that have either had a new review in the last x days, or was placed on someone's wishlist. The query goes a bit like this (slightly simplified): SELECT items.*, reacti...

SQL Query Duplicate Removal Help

I need to remove semi duplicate records from the following table ID PID SCORE 1 1 50 2 33 20 3 1 90 4 5 55 5 7 11 6 22 34 For any duplicate PID's that exist I want to remove the lowest scoring record. In the example above ID 1 would be remove. I'm tr...

How can I remove selected lines with an awk script?

I'm piping a program's output through some awk commands, and I'm almost where I need to be. The command thus far is: myprogram | awk '/chk/ { if ( $12 > $13) printf("%s %d\n", $1, $12 - $13); else printf("%s %d\n", $1, $13 - $12) } ' | awk '!x[$0]++' The last bit is a poor man's uniq, which isn't available on my target. Given the...

How to add particular value in starting of each row of an excel sheet

hello, i want to ask a question related to MS Excel. I have an excel sheet containing website addresses. The records counts nearly 3000. Now i want to filter it so that duplicates can be removed but the problem is that many web addresses (almost 2000 or so ) in my excel sheet start with httpfor example "http://www.google.com" and the r...

Remove Duplicates From File (vs COBOL)

This Cobol question really piqued my interest because of how much effort seemed to be involved in what seems like it would be a simple task. Some sloppy Python to remove file duplicates could be simply: print set(open('testfile.txt').read().split('\n')) How does removing duplicates in the same file structure as above in your langua...

Fast way to eyeball possible duplicate rows in a table?

Similar: http://stackoverflow.com/questions/91784 I have a feeling this is impossible and I'm going to have to do it the tedious way, but I'll see what you guys have to say. I have a pretty big table, about 4 million rows, and 50-odd columns. It has a column that is supposed to be unique, Episode. Unfortunately, Episode is not unique...

Mysql Duplicate Rows ( Duplicate detected using 2 columns )

How to remove duplicated in this setup? id A B 1 apple 2 2 orange 1 3 apple 2 4 apple 1 In here I want to remove (apple,2) which occurs twice. The id numbers are unique. I would use DISTINCT keyword if it were not. Can I some how make a key out of columns A and B and then use the DISTIN...

Help with dictionaries

I'm trying to remove duplicate items in a list through a dictionary: def RemoveDuplicates(list): d = dict() for i in xrange(0, len(list)): dict[list[i]] = 1 <------- error here return d.keys() But it is raising me the following error: TypeError: 'type' object does not support item assignment What is the ...

how to remove duplicate value in NSMutableArray

hi experts, i'm scanning wifi info using NSMutableArray, but there are few duplicate values appear, so i try to use following code but still getting the duplicate values, if([scan_networks count] > 0) { NSArray *uniqueNetwork = [[NSMutableArray alloc] initWithArray:[[NSSet setWithArray:scan_networks] allObjects]]; [scan_networks rem...

List Duplicates on specific Class

Hi, I have a list List<T> instances where T has a date variable and a string ID. Now I need the list to remove duplicates on the string ID and only keep the latest dates. Anyone know how? I was thinking of creating a new list List<T> final and looping through the instances list. In the loop checking if the list contains an item with th...