I have a list of 50,000 IDs in a flat file and need to remove any duplicate IDs. Is there an efficient/recommended algorithm for this problem?
Thanks.
Read the file into a dictionary line by line, discarding duplicates as you go. When everything has been read, write it out to a new file.
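A minimal PHP sketch of that approach, assuming one ID per line (the file names ids.txt and unique_ids.txt are just placeholders):

$seen = array();
$in = fopen('ids.txt', 'r');
$out = fopen('unique_ids.txt', 'w');
while (($line = fgets($in)) !== false) {
    $id = trim($line);
    if ($id !== '' && !isset($seen[$id])) {
        $seen[$id] = true;           // remember the ID so later duplicates are skipped
        fwrite($out, $id . "\n");
    }
}
fclose($in);
fclose($out);

Because it reads one line at a time, this only needs memory for the set of unique IDs, not the whole file.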
I guess if you have a large enough memory allowance, you can put all these IDs in an array:
$array[$id] = $id;
This would automatically weed out the dupes.
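For example, a sketch of that idea (again assuming one ID per line, with assumed file names):

$ids = array();
foreach (file('ids.txt', FILE_IGNORE_NEW_LINES) as $id) {
    $ids[$id] = $id;   // assigning by key silently overwrites duplicates
}
file_put_contents('unique_ids.txt', implode("\n", $ids));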
You can do:
file_put_contents($file, implode("\n", array_unique(file($file, FILE_IGNORE_NEW_LINES))));
How does it work? file() reads the file into an array of lines (FILE_IGNORE_NEW_LINES strips the trailing newlines), array_unique() removes the duplicate entries, and file_put_contents() writes the result back.
This solution assumes that you've got one ID per line in the flat file.
I did some experiments once, and the fastest solution I could get in PHP was to sort the items and then remove the duplicates manually.
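Roughly, that approach looks like this (a sketch, with file names assumed):

$ids = file('ids.txt', FILE_IGNORE_NEW_LINES);
sort($ids);                        // duplicates are now adjacent
$unique = array();
$prev = null;
foreach ($ids as $id) {
    if ($id !== $prev) {           // keep only the first of each run of equal IDs
        $unique[] = $id;
        $prev = $id;
    }
}
file_put_contents('unique_ids.txt', implode("\n", $unique));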
If performance isn't that much of an issue for you (which I suspect, since 50,000 is not that much), then you can use array_unique(): http://php.net/array_unique
If you can use a terminal (or native unix execution), the easiest way (assuming that there is nothing else in the file) is:
sort < ids.txt | uniq > filteredIds.txt
You can use the command-line sort program to order and filter the list of IDs. This is a very efficient program and scales well too.
sort -u ids.txt > filteredIds.txt
You can do it via an array and array_unique. In this example I assume your IDs are separated by line breaks; if that's not the case, just change the delimiter:
$file = file_get_contents('/path/to/file.txt');   // read the whole file into a string
$array = explode("\n", $file);                    // split into one ID per element
$array = array_unique($array);                    // drop the duplicates
$file = implode("\n", $array);                    // rebuild the file contents
file_put_contents('/path/to/file.txt', $file);    // overwrite the original file
If you can just explode the contents of the file on a comma (or any delimiter), then array_unique will produce the least (and cleanest) code; otherwise, if you are parsing the file, going with $array[$id] = $id is the fastest and cleanest solution.
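For the comma-delimited case, a sketch might look like this (delimiter and file name are assumptions):

$ids = explode(',', file_get_contents('ids.txt'));
$ids = array_unique(array_map('trim', $ids));     // trim whitespace, then drop duplicates
file_put_contents('ids.txt', implode(',', $ids));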