views:

57

answers:

5

Hi, I have a table of products in my sql database. Each product has an image and 4 thumbnails. These just store the filename of an image on the hard drive. The problem I have is that over time I have deleted thousands of products but the images still remain on the hard drive.

Now I need to build a script (in c# .net) to remove any orphaned images. I assume the only way is to get a list of products into memory and loop though all the images in the directory to cross reference and remove if they don't exist.

Has anyone come across this before and can you give me any advice?

+9  A: 

A quicker way would be to write a procedure that moves all the images that are referenced to a new folder on the same hard drive. Then once you've verified you've got them all delete (or rename if you're feeling nervous) the old folder and finally rename the new one back to the original name.

You should backup the original folder first, just in case.

ChrisF
+1 for a nice idea!
Pekka
That is an excellent idea, because there are probably less images in the database than orphaned in the file system. Just get all the products and copy their images into another folder. Thanks!
CL4NCY
In that case wouldn't moving the images instead of copying them be faster?
Jeroen
@Jeroen - good point!
ChrisF
@Jeroen Thanks, moving will be better in my case as the hard drive probably isn't big enough to copy the files anyway.
CL4NCY
A: 

An idea comes to mind: Can you somehow flag the images that you DO need? In the most basic way you could rename them temporarily to lets say used_xxx.jpg

Then cursively loop your folders and delete all the images that dont have the used_ prefix. After that you rename back the still existing images to their original names.

Edit: This way you wouldn't have to copy anything. And you have a visual reference so you can see what you're doing.

Jeroen
I like that this way doesn't copy the images (as the hard drive isn't that big) but I still think looping through all the files might not be the best method for deleting them. Thanks.
CL4NCY
A: 

That really is the only way you would do it. I would suggest first iterating through the product images in the database and build a List<string> with their names and paths, then use the Directory object to pull a list of all images, recursively if necessary, in your images directory.

Then go through each file in the returned listing and compare it with your List<string>. If it doesn't exist in the list, delete it off the filesystem.

Remember to back up, just in case!

rakuo15
A: 

You seem to have pretty much answered your own question... Rather than iterate over all images in the directory you could iterate over all products in the database, and for each product copy the image to a new folder, but it's pretty much the same thing.

James Gaunt
A: 

If you're familiar with PowerShell at all, this seems like something which it would be good at doing. Here's a way to do it in T-SQL if your SQL server happens to be MS SQL Server and you have rights to run xp_cmdshell. I just wrote this out, so it doesn't have error checking, etc.

DECLARE
    @image_files TABLE (file_path VARCHAR(MAX))

DECLARE
    @file_path VARCHAR(MAX),
    @cmd VARCHAR(MAX)

INSERT INTO @image_files (file_path)
EXEC xp_cmdshell 'dir *.jpg /b /s /x'
-- Change the extension if you use GIF, etc.

DECLARE file_cursor CURSOR FOR
    SELECT
        file_path
    FROM
        @image_files
    WHERE
        file_path NOT IN
        (
            SELECT file_path
            FROM
                My_Files
        )

OPEN file_cursor

FETCH NEXT FROM file_cursor INTO @file_path

WHILE (@@FETCH_STATUS = 0)
BEGIN
    SET @cmd = 'EXEC xp_cmdshell ''del ' + @file_path + ''''
    EXEC(@cmd)

    FETCH NEXT FROM file_cursor INTO @file_path
END

CLOSE file_cursor

DEALLOCATE file_cursor
Tom H.
Thanks, this could be a good solution straight from the database rather than building a script.
CL4NCY