ansaurus

Question

Answer 1

A:

You can simply parse this file and check which rows are duplicated. I think regex is the worst solution for this problem.
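As a rough illustration of the parse-and-check idea, here is a minimal Python sketch. The function name `find_duplicate_rows` and the sample rows are made up for the example, and it assumes a plain comma-separated file where a duplicate means the whole row is repeated.

```python
import csv
import io
from collections import defaultdict


def find_duplicate_rows(lines):
    """Return {row: [row numbers]} for every row that occurs more than once."""
    positions = defaultdict(list)
    for row_no, row in enumerate(csv.reader(lines), start=1):
        if row:  # csv.reader yields [] for blank lines; skip those
            positions[tuple(row)].append(row_no)
    return {row: nums for row, nums in positions.items() if len(nums) > 1}


# Hypothetical sample data, not taken from the asker's file.
sample = io.StringIO(
    "Ann,111,555-0100,1,A\n"
    "Bob,222,555-0101,2,B\n"
    "Ann,111,555-0100,1,A\n"
)
duplicates = find_duplicate_rows(sample)
print(duplicates)
```

For a real file, pass `open("phonelist.txt", newline="")` instead of the in-memory sample.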

Svisstack 2010-09-27 13:56:55

Answer 2

A:

What language are you using? In .NET, with little effort you could load the CSV file into a DataTable and find or remove the duplicate rows. Afterwards, write the DataTable back out to another CSV file.

Heck, you could load this file into Excel, sort by a field, and find the duplicates manually. 500 isn't THAT many.
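The load, de-duplicate, write-back flow is not specific to .NET. A minimal sketch of the same idea in Python (not the DataTable API; `write_without_duplicates` and the sample rows are hypothetical) might look like this:

```python
import csv
import io


def write_without_duplicates(src, dst):
    """Copy CSV rows from src to dst, keeping only the first copy of each row.

    Returns the number of rows that were dropped as duplicates.
    """
    seen = set()
    removed = 0
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        key = tuple(row)
        if key in seen:
            removed += 1
            continue
        seen.add(key)
        writer.writerow(row)
    return removed


# Hypothetical sample data held in memory so the sketch runs on its own.
src = io.StringIO(
    "Ann,111,555-0100,1,A\n"
    "Bob,222,555-0101,2,B\n"
    "Ann,111,555-0100,1,A\n"
)
dst = io.StringIO()
removed = write_without_duplicates(src, dst)
print(removed)
print(dst.getvalue())
```

With real files, open both with `newline=""` as the `csv` module documentation recommends, and write to a different path than the one being read.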

rodey 2010-09-27 13:57:28

Answer 3

+3 A:

What duplicates are you searching for? The whole lines or just the same phone number?

If it is the whole line, then try this:

sort phonelist.txt | uniq -c | sort -n

The lines that occur more than once will appear at the bottom of the output, each prefixed with its count.

If it is just the phone number in some column, then use this:

awk -F ';' '{print $4}' phonelist.txt | sort | uniq -c | sort -n

Replace the '4' with the number of the column that holds the phone number and the ';' with the separator your file actually uses. The sort before uniq is needed because uniq only counts repeated lines that are adjacent.

Or give us a few example lines from this file.

EDIT:

If the data format is name,mobile,phone,uniqueid,group, then run the following on the command line:

awk -F ',' '{print $3}' phonelist.txt | sort | uniq -c | sort -n
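For anyone without a Unix shell at hand, roughly the same column count can be sketched in Python. This is only an illustration: `count_column` and the sample rows are invented, and it assumes the phone number is the third comma-separated field, as in the format above.

```python
import csv
import io
from collections import Counter


def count_column(lines, column):
    """Count how often each value occurs in the given 1-based column."""
    return Counter(
        row[column - 1]
        for row in csv.reader(lines)
        if len(row) >= column  # ignore blank or short lines
    )


# Hypothetical sample data in the name,mobile,phone,uniqueid,group layout.
sample = io.StringIO(
    "Ann,111,555-0100,1,A\n"
    "Bob,222,555-0101,2,B\n"
    "Cid,333,555-0100,3,A\n"
)
counts = count_column(sample, 3)
repeated = {phone: n for phone, n in counts.items() if n > 1}
print(repeated)
```

Unlike the shell pipeline, the counter does not need sorted input, because it keeps a tally per value instead of comparing adjacent lines.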

eumiro 2010-09-27 13:59:40

Erm..in which language is this?

Nimbuz 2010-09-27 14:15:34

Lines are in this format: `name,mobile,phone,uniqueid,group`

Nimbuz 2010-09-27 14:16:39

Perfect, many thanks! :)

Nimbuz 2010-09-27 14:54:45

Answer 4

+2 A:

Yes. For one way to do it, look here. But you would probably not want to do it this way.

Robusto 2010-09-27 14:00:12

Already looked there, this `(?<=,|^)([^,]*)(,\1)+(?=,|$)` only matches commas in a comma delimited CSV.

Nimbuz 2010-09-27 14:03:42

Answer 5

A:

Use Perl.

Load the CSV file into an array, extract the column you want to check (the phone numbers) from each line into a second array, then filter the repeated values out of that second array using:

my %seen;
my @unique = grep !$seen{$_}++, @array2;

After that, loop over the unique array (the phone numbers) and, inside that loop, loop over array #1 (the lines). Compare each line's phone number with the current entry of the unique array and, if they match, write that line to another CSV file.
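One way to read the steps above is that each phone number should end up in the output exactly once. Under that assumption, the nested loops can be folded into a single pass that uses the same "seen" trick. The sketch below is in Python rather than Perl; `first_row_per_phone`, the default column, and the sample rows are hypothetical.

```python
import csv
import io


def first_row_per_phone(lines, phone_column=3):
    """Keep the first row for each phone number (1-based column index)."""
    seen = set()
    kept = []
    for row in csv.reader(lines):
        if len(row) < phone_column:
            continue  # skip blank or malformed lines
        phone = row[phone_column - 1]
        if phone not in seen:  # same idea as Perl's !$seen{$_}++
            seen.add(phone)
            kept.append(row)
    return kept


# Hypothetical sample data in the name,mobile,phone,uniqueid,group layout.
sample = io.StringIO(
    "Ann,111,555-0100,1,A\n"
    "Bob,222,555-0101,2,B\n"
    "Cid,333,555-0100,3,A\n"
)
kept = first_row_per_phone(sample)
print(kept)
```

The single pass avoids comparing every unique number against every line, which matters little for a few hundred rows but keeps the code shorter.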

Ruel 2010-09-27 14:12:59

Finding Duplicates (Regex)
