duplicates

Best way to detect duplicate uploaded files in a Java Environment?

As part of a Java-based web app, I'm going to be accepting uploaded .xls & .csv (and possibly other types of) files. Each file will be uniquely renamed with a combination of parameters and a timestamp. I'd like to be able to identify any duplicate files. By duplicate I mean the exact same file regardless of the name. Ideally, I'd lik...
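One common way to treat "the exact same file regardless of the name" is to compare a hash of the file contents rather than the file names. The following is only a minimal sketch of that idea, written in Python for brevity and not taken from the question; the function names `content_digest` and `group_duplicates` and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def content_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large uploads are never held fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(paths):
    """Group paths by content digest; return only groups with 2+ files."""
    groups = defaultdict(list)
    for path in paths:
        groups[content_digest(path)].append(Path(path))
    return [files for files in groups.values() if len(files) > 1]
```

Two files land in the same group only when their bytes hash to the same digest, so renaming an upload does not hide it. A byte-for-byte comparison of the files in each group could be added if hash collisions are a concern.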

How do search engines detect duplicate content on a website?

Hi, I would like to know how search engines determine that content on a website is duplicate content. How do they decide that it might be duplicated? Do they use any specific technique or tag for it? Please give me suggestions. ...

MySQL duplicates -- how to specify when two records actually AREN'T duplicates?

I have an interesting problem, and my logic isn't up to the task. We have a table that sometimes develops duplicate records (for process reasons, and this is unavoidable). Take the following example: id FirstName LastName PhoneNumber email -- --------- -------- ------------ -------------- 1 John Doe 123-555-...

Fastest way to remove duplicate lines in very large .txt files

What is the best way to remove duplicate lines from large .txt files of 1 GB and more? Because removing adjacent duplicates is simple, we can reduce this problem to sorting the file. Assume that we can't load the whole data into RAM because of its size. I'm just waiting to retrieve all records from SQL table with one unique in...
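The observation in the question, that removing one-after-another duplicates is simple once the file is sorted, can be sketched as a single streaming pass. This is only an illustration in Python, assuming the lines have already been sorted by some external step (for example an on-disk merge sort); the name `drop_adjacent_duplicates` is hypothetical.

```python
def drop_adjacent_duplicates(sorted_lines):
    """Yield each line once, assuming equal lines are adjacent (sorted input)."""
    previous = object()  # sentinel that never equals a real line
    for line in sorted_lines:
        # Only the previous line is remembered, so memory use stays constant.
        if line != previous:
            yield line
        previous = line
```

Because the function accepts any iterable, it can be fed an open file object and its output written line by line, without loading the whole file into RAM. Note that sorting first means the original line order is not preserved.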

Inline functions in a namespace generate duplicate symbols at link time with gcc

I have a namespace with inline functions that will be used in several source files. When trying to link my application, the inline functions are reported as duplicate symbols. It seems as if my code would simply not inline the functions and I was wondering if this is the expected behavior and how to best deal with it. I use the following ...

Filtering out duplicate XElements based on an attribute value from a Linq query

I'm using Linq to try to filter out any duplicate XElements that have the same value for the "name" attribute. Original xml: <foo> <property name="John" value="Doe" id="1" /> <property name="Paul" value="Lee" id="1" /> <property name="Ken" value="Flow" id="1" /> <property name="Jane" value="Horace" id="1" /> <property name="Paul" value...

List all duplicates in List object including quantity

I have a List object filled with instances of a custom struct. list.Add(new Mail("mail1", "test11", "path11")); list.Add(new Mail("mail2", "test12", "path12")); list.Add(new Mail("mail1", "test13", "path13")); list.Add(new Mail("mail1", "test14", "path14")); list.Add(new Mail("mail2", "test15", "pat...

Returning Dictionary<FileHash, string[]> from Linq Query

Thanks in advance for any assistance. I'm not even sure if this is possible, but I'm trying to get a list of duplicate files using their hashes to identify the list of files associated with the hashes. I have this below: Dictionary<FileHash, string[]> FindDuplicateFiles(string searchFolder) { Directory.GetFiles(searchFolder, "*.*")...

MySQL / file hash question

Hi, I'd like to write a script that traverses a file tree, calculates a hash for each file, and inserts the hash into an SQL table together with the file path, such that I can then query and search for files that are identical. What would be the recommended hash function or command-line tool to create hashes that are extremely unlikely ...

Variations in spelling of first name

As part of a contact management system I have a large database of names. People frequently edit this and as a result we run into issues of the same person existing in different forms (John Smith and Jonathan Smith). I looked into word similarity but it's easy to think of name variations which are not similar at all (Richard vs Dick). I w...

How to remove duplicates based on level in hierarchy?

Hello, I have the following XML structure: <node name="A"> <node name="B"> <node name="C"/> <node name="D"/> <node name="E"/> </node> <node name="D"/> <node name="E"/> </node> I need to get all the leaf nodes. I use //node[not(node)] to get those. Now I need to remove duplicates by leaving elements that are deeper...

MySQL query to find duplicate rows

Example:

empid  date      bookid
-----  --------  ------
1      5/6/2004  8
2      5/6/2004  8
1      5/7/2004  8
1      5/8/2004  6
3      5/8/2004  8
2      5/8/2004  7

In this table, I need to get empid 1 as output, since it has bookid 8 more than once. Thanks in advance. ...

XQuery: finding duplicate IDs

I have an XML database which contains elements which have an id. These are all unique. They also have a secondary identifier which links them to a similar object in another database. These are not all unique. Is there an XQuery which would let me identify all the non-unique IDs? I can count how many there are using distinct-values(), bu...

Json.NET (Newtonsoft.Json) - Two 'properties' with same name?

Hi all I'm coding in C# for the .NET Framework 3.5. I am trying to parse some Json to a JObject. The Json is as follows: { "TBox": { "Name": "SmallBox", "Length": 1, "Width": 1, "Height": 2 }, "TBox": { "Name": "MedBox", "Length": 5, "Width": 10, "Height": 10 }, ...

BATCH FILE to remove duplicate strings containing Double Quotes; and keep blank lines

BATCH FILE to remove duplicate strings (containing Double Quotes); and keep blank lines Note: The Final Output must have original strings with Double Quotes and Blank lines. I have been working on this for a long time and I cannot find a solution. Thanks in advance for your assistance. When I get the remove duplicates working somethin...

Scala: Remove duplicates in list of objects

Hi Folks, I've got a list of objects List[Object] which are all instantiated from the same class. This class has a field which must be unique: Object.property. What is the cleanest way to iterate over the list of objects and remove all objects (but the first) with the same property? Cheers Parsa ...

Find duplicates in MySQL and delete all except the first one inserted

I have a table with columns like id, length, time, and some of the rows are duplicates, where length and time are the same. I want to delete all copies except the first row submitted. id | length | time 01 | 255232 | 1242 02 | 255232 | 1242 <- Delete that one I have this to show all duplicates in table. SELECT idgarmin_track, length ...

LINQ Insert Into Database resulted in duplicates

I have a LINQ query running in a WCF Web Service that looks for a match and, if one is not found, creates one. My code looks like //ReadCommitted transaction using (var ts = CreateTransactionScope(TransactionScopeOption.RequiresNew)) { Contract contract = db.Contracts.SingleOrDefault(x => x.txtBlah == str); if (c...

Avoiding duplicate identifiers in the database

NOTICE: Appreciate all the answers, thanks, but we already have a sequence, and we can't use UNIQUE constraints because some items need to have duplicates. I need to handle this using PL/SQL somehow, so based on some criteria (using an if statement) I need to ensure there are no duplicates for that. And just to confirm, these identifier...

Mass rewrite URLs on a large e-commerce site?

We have about 30,000 URLs, and the problem at present is that each URL is exactly the same except for one character. We were wondering how to program a mass 301 redirect. We have URLs like the following: myurl/Law-and-Order(PC)/?asin=B0DVRE87YMU The correct url of the thousands of urls does not contain the / in th...