views:

27

answers:

2

Hi All,

Having a bit of a "special" moment here. Basically I have a DB table for listing files and their attributes (size, creation date, etc). Being a self-proclaimed (borderline delusional) SQL-whiz, you can imagine my surprise when I failed to figure out a quality manner by which to individually list only files whose [INSERT ATTRIBUTE HERE] appeared multiple times.

For instance, say I wanted to see all files whose file size (or creation date) appeared more than once in the table, but I still wanted them listed separately. Or an even better application: a group of files that share MD5 hashes, but only ones that appear multiple times (i.e. duplicates), and I still want to list each file independently. Obviously I could loop through the DB multiple times, but I'm interested in a single query. Any thoughts?

Best.

[EDIT #1 (error in count)]

select * 
 from @foo where FileNameOnDisk in (
     select FileNameOnDisk, count([INSERT_ATTRIBUTE_HERE]) as num from @foo 
     group by FileNameOnDisk 
     having num > 1
 )order by FileNameOnDisk
+1  A: 

If I understand you correctly, using Sql Server you can try something like

DECLARE @Table TABLE(
        ID INT IDENTITY(1,1),
        FullFileName VARCHAR(500),
        DateCreated DATETIME,
        FileSize INT
)

INSERT INTO @Table (FullFileName, DateCreated, FileSize) 
SELECT 'a','01 Jan 2000',10
INSERT INTO @Table (FullFileName, DateCreated, FileSize) 
SELECT 'b','02 Jan 2000',1000
INSERT INTO @Table (FullFileName, DateCreated, FileSize) 
SELECT 'c','01 Jan 2000',100
INSERT INTO @Table (FullFileName, DateCreated, FileSize) 
SELECT 'd','03 Jan 2000',10

SELECT  t.*
FROM    @Table t INNER JOIN
        (
            SELECT  FileSize
            FROm    @Table
            GROUP BY FileSize
            HAVING COUNT(ID) > 1
        ) FileSizes ON  t.FileSize = FileSizes.FileSize
astander
+1  A: 

I'd approach it with the HAVING COUNT(someField) > 1. Here's a sample using FileName, but you could of course modify it for the MD5 or the datetime field.

declare @foo table (FileNameOnDisk varchar(100), MD5Hash varchar(100), CreatedOn smalldatetime)

INSERT Into @foo (FileNameOnDisk, MD5Hash, CreatedOn)
 VALUES ('foo', 'abc123', 'jan 1 2010'),
  ('bar', 'abc123', 'aug 1 2010'),
  ('ar', 'ajkfsd43', 'nov 1 2010'),
  ('baz', '44', 'sep 1 2010'),
  ('foo', 'abc123', 'aug 1 2010'),
  ('baz', '44', 'jan 1 2010')

 select * 
 from @foo where FileNameOnDisk in (
     select FileNameOnDisk from @foo 
     group by FileNameOnDisk 
     having COUNT(FileNameOnDisk) > 1
 )order by FileNameOnDisk
p.campbell
D'oh! I actually had it, but my problem was querying FOR the count in the second part:[SEE EDIT #1]...sometimes my brain hiccups like that and I think I actually have to include the count in the query in order to work with it later. Don't ask....Thanks to both of you!
humble_coder