duplicates

How do I remove duplicate items from an array in Perl?

I have an array in Perl: @my_array = ("one","two","three","two","three") How do I remove the duplicates from the array? ...
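A common approach is to remember which items have already been seen and keep only the first occurrence of each. In Perl this is usually written with a `%seen` hash, as in `grep { !$seen{$_}++ } @my_array`. The sketch below shows the same idea in Python. The function name `dedupe` is only for illustration, and the sketch preserves the original order.

```python
def dedupe(items):
    """Return items with duplicates removed, keeping first occurrences in order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:  # first time we encounter this value
            seen.add(item)
            result.append(item)
    return result


my_array = ["one", "two", "three", "two", "three"]
print(dedupe(my_array))  # ['one', 'two', 'three']
```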

Fastest "Get Duplicates" SQL script

What is an example of a fast SQL query to get duplicates in datasets with hundreds of thousands of records? I typically use something like: select afield1, afield2 from afile a where 1 < (select count(afield1) from afile b where a.afield1 = b.afield1); But this is quite slow. ...
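A correlated subquery like the one above may be re-evaluated for every row of `afile`, which is a likely reason it gets slow. A usual alternative is to aggregate once with `GROUP BY ... HAVING COUNT(*) > 1` and join the result back to get the full rows. The sketch below runs this against an in-memory SQLite table using the table and column names from the question. The sample rows are made up. Whether it is faster on a given database depends on its optimizer and indexes, and an index on `afield1` typically helps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE afile (afield1 TEXT, afield2 TEXT)")
conn.executemany(
    "INSERT INTO afile VALUES (?, ?)",
    [("a", "x1"), ("b", "x2"), ("a", "x3"), ("c", "x4"), ("b", "x5")],
)

# Aggregate once to find the duplicated keys, then join back for the full rows.
query = """
    SELECT a.afield1, a.afield2
    FROM afile a
    JOIN (SELECT afield1
          FROM afile
          GROUP BY afield1
          HAVING COUNT(*) > 1) d ON a.afield1 = d.afield1
    ORDER BY a.afield1, a.afield2
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('a', 'x1'), ('a', 'x3'), ('b', 'x2'), ('b', 'x5')]
```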

How do I remove repeated elements from ArrayList?

I have an ArrayList of Strings, and I want to remove repeated strings from it. How can I do this? ...

How to detect duplicate text with some fuzziness

Some time ago, I wrote a small script using Text::DeDupe to remove duplicate blog posts before I have to lay my eyes on them. After reading the Syntactic Clustering of the Web paper on which the implementation is based, I would love to have the ability to find overlapping documents (e.g. snippets of blogs as opposed to full text, maybe also quot...
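The Syntactic Clustering of the Web paper measures the resemblance of two documents as the Jaccard similarity of their sets of shingles, which are contiguous word sequences. Below is a rough sketch of that measure without the min-hash sampling that makes it scale. The shingle width `w=4` and any threshold you choose are assumptions to tune. For the snippet-versus-full-post case, a variation that may fit better is containment: the fraction of the smaller document's shingles that also appear in the larger one.

```python
import re


def shingles(text, w=4):
    """Return the set of w-word shingles (contiguous word tuples) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < w:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a, b, w=4):
    """Jaccard similarity of the two shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def containment(part, whole, w=4):
    """Fraction of part's shingles that also occur in whole."""
    sp, sw = shingles(part, w), shingles(whole, w)
    if not sp:
        return 0.0
    return len(sp & sw) / len(sp)


post = "the quick brown fox jumps over the lazy dog near the river bank"
snippet = "quick brown fox jumps over the lazy dog"
print(round(resemblance(snippet, post), 2))  # 0.5
print(containment(snippet, post))            # 1.0
```

Here the snippet is fully contained in the post (containment 1.0) even though the two only half resemble each other. This is the kind of overlap that a resemblance threshold alone might miss.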

Tool to find duplicate sections in a text (XML) file?

Hiya, I have an XML file, and I want to find nodes that have duplicate CDATA. Are there any tools that exist that can help me do this? I'd be fine with a tool that does this generally for text documents. ...
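If a small script is acceptable in place of a dedicated tool, one approach is to parse the file, group elements by their text content, and report any text that occurs more than once. XML parsers generally deliver CDATA sections as ordinary text, so `xml.etree.ElementTree` exposes them through `.text`. The sketch below makes two assumptions: the duplicated content is the direct text of an element, not mixed with child elements, and values are compared after stripping surrounding whitespace. The function name `duplicate_texts` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def duplicate_texts(xml_string):
    """Map each text value found in more than one element to those elements' tags."""
    root = ET.fromstring(xml_string)
    by_text = defaultdict(list)
    for elem in root.iter():  # document order, including the root itself
        text = (elem.text or "").strip()
        if text:  # skip empty / whitespace-only nodes
            by_text[text].append(elem.tag)
    return {text: tags for text, tags in by_text.items() if len(tags) > 1}


doc = """<root>
  <a><![CDATA[hello world]]></a>
  <b><![CDATA[something else]]></b>
  <c><![CDATA[hello world]]></c>
</root>"""
print(duplicate_texts(doc))  # {'hello world': ['a', 'c']}
```

For a file on disk, `ET.parse(path).getroot()` gives the same root element to iterate over.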

How do I store a duplicate value from an array or hash in Perl?

Hi all, Let's make this very easy. What I want: @array = qw/one two one/; my @duplicates = duplicate(@array); print "@duplicates"; # This should now print 'one'. Thanks =) ...
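One way to get the duplicated values is to count occurrences and keep the values seen more than once. In Perl this is typically a hash of counts. The Python sketch below uses `collections.Counter`, which keeps keys in first-seen order in Python 3.7+. Its `duplicate` function only mirrors the name in the question and is not an existing library call.

```python
from collections import Counter


def duplicate(items):
    """Return each value that occurs more than once, in first-seen order."""
    counts = Counter(items)
    return [value for value, n in counts.items() if n > 1]


array = "one two one".split()  # like Perl's qw/one two one/
duplicates = duplicate(array)
print(" ".join(duplicates))  # one
```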

Count Duplicate URLs, fastest method possible

Hi Guys, I'm still working with this huge list of URLs; all the help I have received has been great. At the moment I have the list looking like this (17000 URLs though): http://www.domain.com/page?CONTENT_ITEM_ID=1 http://www.domain.com/page?CONTENT_ITEM_ID=3 http://www.domain.com/page?CONTENT_ITEM_ID=2 http://www.domain.com/page?CONT...

How to avoid duplicates in a has_many :through relationship?

Hey guys, how can I achieve the following? I have two models (blogs and readers) and a JOIN table that will allow me to have an N:M relationship between them: class Blog < ActiveRecord::Base has_many :blogs_readers, :dependent => :destroy has_many :readers, :through => :blogs_readers end class Reader < ActiveRecord::Base has_many :...

SQL: How to append IDs to the rows with duplicate values

I have a table with some duplicate rows. I want to modify only the duplicate rows as follows.

Before:

id  col1
------------
1   vvvv
2   vvvv
3   vvvv

After:

id  col1
------------
1   vvvv
2   vvvv-2
3   vvvv-3

Col1 is appended with a hyphen and the value of the id column. ...
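One way to get the "After" result is a single `UPDATE` that leaves the lowest `id` for each `col1` value untouched and appends `-<id>` to every other row. The sketch below runs it in SQLite. The table name `t` is hypothetical because the question does not give one. It uses the standard `||` concatenation operator, and some databases spell this differently (SQL Server, for example, uses `+`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name "t"; the columns come from the question.
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col1 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "vvvv"), (2, "vvvv"), (3, "vvvv")])

# Keep the lowest id of each col1 value as-is; append "-<id>" to the others.
conn.execute("""
    UPDATE t
    SET col1 = col1 || '-' || id
    WHERE id > (SELECT MIN(t2.id) FROM t AS t2 WHERE t2.col1 = t.col1)
""")
rows = conn.execute("SELECT id, col1 FROM t ORDER BY id").fetchall()
print(rows)  # [(1, 'vvvv'), (2, 'vvvv-2'), (3, 'vvvv-3')]
```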

What is the best way to remove duplicates from a datatable?

I have checked the whole site and googled on the net but was unable to find a simple solution to this problem. I have a datatable which has about 20 columns and 10K rows. I need to remove the duplicate rows in this datatable based on 4 key columns. Doesn't .NET have a function which does this? The function closest to what I am looking f...
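Whatever the API, the underlying operation is to build a key from the chosen columns and keep the first row for each key. The sketch below shows this in Python and is not .NET code. It represents rows as dicts and uses hypothetical column names `k1` to `k4` in place of the real key columns. In .NET, one way to apply the same idea is to track the key values in a `HashSet` while iterating over the rows.

```python
def dedupe_by_key(rows, key_columns):
    """Keep the first row for each combination of values in key_columns."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[col] for col in key_columns)  # composite key
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


# Hypothetical rows; the real table has about 20 columns.
rows = [
    {"k1": 1, "k2": "a", "k3": 0, "k4": "x", "note": "first"},
    {"k1": 1, "k2": "a", "k3": 0, "k4": "x", "note": "dup"},
    {"k1": 2, "k2": "a", "k3": 0, "k4": "x", "note": "other"},
]
unique = dedupe_by_key(rows, ["k1", "k2", "k3", "k4"])
print([r["note"] for r in unique])  # ['first', 'other']
```

Non-key columns such as `note` are carried along from whichever row comes first, which is the usual choice when the remaining columns may differ between duplicates.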

Removing duplicate rows in vi?

I have a text file that contains a long list of entries (one on each line). Some of these are duplicates, and I would like to know if it is possible (and if so, how) to remove any duplicates. I am interested in doing this from within vi/vim, if possible. ...

What is the best way to remove duplicates in an Array in Java?

I have an Array of Objects that need the duplicates removed/filtered. I was going to just override equals & hashCode on the Object elements, and then stick them in a Set... but I figured I should at least poll stackoverflow to see if there was another way, perhaps some clever method of some other API? ...

Detect duplicate MP3 files with different bitrates and/or different ID3 tags?

How could I detect (preferably with Python) duplicate MP3 files that can be encoded with different bitrates (but they are the same song) and ID3 tags that can be incorrect? I know I can do an MD5 checksum of the file's content, but that won't work for different bitrates. And I don't know if ID3 tags have influence in generating the MD5 ch...

Removing duplicate rows from table in Oracle

Hi, I'm testing something in Oracle and populated a table with some sample data, but in the process I accidentally loaded duplicate records, so now I can't create a primary key using some of the columns. How can I delete all duplicate rows and leave only one of them? This is in Oracle. Thanks ...
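A common pattern is to keep one physical row per group of key values, identified by its row address, and delete the rest. Oracle exposes that address as the `ROWID` pseudocolumn. The sketch below runs the equivalent statement against SQLite, which has a comparable `rowid`, so it can be tried without an Oracle instance. The table `sample` and columns `a`, `b` are hypothetical placeholders for the columns that should form the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample table; (a, b) are the intended primary-key columns.
conn.execute("CREATE TABLE sample (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?)",
    [(1, "x"), (1, "x"), (2, "y"), (1, "x"), (2, "z")],
)

# Keep the row with the smallest rowid in each (a, b) group; delete the rest.
conn.execute("""
    DELETE FROM sample
    WHERE rowid NOT IN (SELECT MIN(rowid) FROM sample GROUP BY a, b)
""")
remaining = conn.execute("SELECT a, b FROM sample ORDER BY a, b").fetchall()
print(remaining)  # [(1, 'x'), (2, 'y'), (2, 'z')]
```

After this, each `(a, b)` pair occurs once, so a primary key on those columns can be created.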

SQL Performance - Better to Insert and Raise Exception or Check exists?

I'm considering an optimisation in a particularly heavy part of my code. Its task is to insert statistical data into a table. This data is being hit a fair amount by other programs. Otherwise I would consider using SQL Bulk inserts etc. So my question is... Is it ok to try and insert some data knowing that it might (not too often) ...
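The insert-and-handle-the-error approach relies on a unique constraint to reject duplicates. It also avoids a race in which another writer inserts the same key between a separate existence check and the insert. Below is a minimal sketch in SQLite that assumes a hypothetical `stats` table whose `stat_name` column is the primary key. Which option is faster depends on how often the error actually fires and on the database, so measuring is worthwhile. Some databases also offer a single-statement form, such as SQLite's `INSERT OR IGNORE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical statistics table; the primary key rejects duplicate names.
conn.execute("CREATE TABLE stats (stat_name TEXT PRIMARY KEY, value INTEGER)")


def insert_stat(conn, name, value):
    """Try the insert; treat a constraint violation as 'already present'."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO stats VALUES (?, ?)", (name, value))
        return True
    except sqlite3.IntegrityError:
        return False


print(insert_stat(conn, "page_views", 10))  # True
print(insert_stat(conn, "page_views", 11))  # False (duplicate key)
```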

Java: Detect duplicates in ArrayList?

How could I go about detecting (returning true/false) whether an ArrayList contains more than one of the same element in Java? Many thanks, Terry. Edit: Forgot to mention that I am not looking to compare "Blocks" with each other but their integer values. Each "block" has an int and this is what makes them different. I find the int of a p...
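The comparison should use each block's integer rather than the block objects themselves. One option is therefore to put those integers into a set and stop at the first repeat. In Java the same check is often written with a `HashSet`, using the fact that `add` returns `false` when the value is already present. The Python sketch below uses a hypothetical `Block` class with a `value` field as a stand-in for the question's blocks.

```python
from dataclasses import dataclass


@dataclass
class Block:
    value: int  # hypothetical: the int that distinguishes blocks


def has_duplicates(blocks):
    """Return True as soon as two blocks share the same integer value."""
    seen = set()
    for block in blocks:
        if block.value in seen:
            return True
        seen.add(block.value)
    return False


print(has_duplicates([Block(1), Block(2), Block(3)]))  # False
print(has_duplicates([Block(1), Block(2), Block(1)]))  # True
```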

How can I get the artwork of an mp3 file in vb.net 2005?

Hi, I want to display the artwork associated with an mp3 file in a picturebox when I open the file in my audio player in vb.net 2005. An example would be appreciated. I've heard you can do it with ultraid3tags, but it's in C# and I don't understand it :(. Any help will be appreciated. Thanks, Sam ...

Best way to remove duplicate characters (words) in a string?

What would be the best way of removing any duplicate characters and sets of characters separated by spaces in a string? I think this example explains it better: foo = 'h k k h2 h' should become: foo = 'h k h2' # order not important Other example: foo = 's s k' becomes: foo = 's k' etc. ...
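Since the tokens are separated by spaces, one compact approach is to split the string, drop repeated tokens, and join the result again. `dict.fromkeys` keeps only the first occurrence of each key, and in Python 3.7+ it preserves insertion order, so the result keeps the original order even though the question does not require it. The function name `dedupe_words` is only for illustration.

```python
def dedupe_words(text):
    """Remove repeated space-separated tokens, keeping the first occurrence of each."""
    # dict keys are unique and (in Python 3.7+) keep insertion order
    return " ".join(dict.fromkeys(text.split()))


print(dedupe_words("h k k h2 h"))  # h k h2
print(dedupe_words("s s k"))       # s k
```

If order truly does not matter, `" ".join(set(foo.split()))` also works, but the output order is then arbitrary.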

How to find duplicates in 2 columns not 1

In my database, a particular table has several columns, but the only 2 of interest right now are stone_id and upcharge_title. Individually they can each have duplicates, but no two rows should have the same value in BOTH of them. For example, stone_id can have duplicates as long as for each duplicate upcharge ti...
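Grouping on both columns at once reports only the pairs that repeat as a whole, so a repeated `stone_id` with different titles is not flagged. The sketch below runs the query in SQLite. The table name `stone_upcharge` and the sample rows are hypothetical, and the two column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; the two columns come from the question.
conn.execute("CREATE TABLE stone_upcharge (stone_id INTEGER, upcharge_title TEXT)")
conn.executemany(
    "INSERT INTO stone_upcharge VALUES (?, ?)",
    [(1, "A"), (1, "B"), (2, "A"), (1, "A")],
)

# Group on both columns: only (stone_id, upcharge_title) pairs that repeat show up.
pairs = conn.execute("""
    SELECT stone_id, upcharge_title, COUNT(*) AS n
    FROM stone_upcharge
    GROUP BY stone_id, upcharge_title
    HAVING COUNT(*) > 1
""").fetchall()
print(pairs)  # [(1, 'A', 2)]
```

Once the existing duplicates are cleaned up, a unique constraint on `(stone_id, upcharge_title)` would stop new ones from being inserted.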

How to keep a file's format if you use the uniq command (in shell)?

In order to use the uniq command, you have to sort your file first. But in the file I have, the order of the information is important, thus how can I keep the original format of the file but still get rid of duplicate content? ...