I have a project with literally thousands of image files that aren't being used. The main problem is that they are intermixed with images that are.

Is there a way to get a list of all project artifacts which aren't referenced?

EDIT: Assuming I don't have access to the web logs... Is there an option?

A: 
  • Access your web server logs, parse them for GET requests matching the desired file pattern, deduplicate the results, then compare them against your list of project files.

  • Or look at the file access dates (if you are the sysop, you may need to turn this feature on).
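The log-based approach could be sketched roughly as follows. This assumes the server writes Common or Combined Log Format lines and that the file list uses the same URL-style paths as the log; the names `requested_images` and `unrequested` are made up for illustration.

```python
import posixpath
import re
from urllib.parse import unquote, urlsplit

# Request part of a Common/Combined Log Format line,
# e.g. "GET /images/logo.png?v=2 HTTP/1.1" (assumed log format).
REQUEST_RE = re.compile(r'"GET\s+(\S+)\s+HTTP/[\d.]+"')
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif"}


def requested_images(log_lines):
    """Return the set of unique, lower-cased image paths requested via GET."""
    seen = set()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        # Drop the query string and decode %xx escapes.
        path = unquote(urlsplit(match.group(1)).path)
        if posixpath.splitext(path)[1].lower() in IMAGE_EXTS:
            seen.add(path.lower())
    return seen


def unrequested(site_files, log_lines):
    """Return the files from site_files that never show up in the log."""
    hit = requested_images(log_lines)
    return sorted(f for f in site_files if f.lower() not in hit)
```

A file missing from the log is only a candidate for removal: the log has to cover a long enough period to include rarely visited pages.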

dar7yl
A: 

This was from a previous post.

At a file level:

Use wget to aggressively spider the site, then process the HTTP server logs to get the list of files accessed, and diff that list against the files in the site:

    diff \
      <(sed some_rules httpd_log | sort -u) \
      <(ls /var/www/whatever | sort -u) \
      | grep something

Kyle B.
+1  A: 

Another approach -

Assuming all the image files are under one folder, try renaming the folder. The warnings in Visual Studio will tell you the files you need. :)

Matt
+2  A: 

Basically, no, there isn't a straightforward way that always works. Image references can be built at runtime from user input or other context, so spidering your website only finds everything if it exercises every code path; otherwise you might throw away images that you actually need.

For Chris's specific case, though, there are several approaches:

  • Search your code, image by image, for occurrences of each file name (maybe automate this with Visual Studio plug-ins or similar).
  • Remove everything, then browse your website and add back every image that is reported as not found. (Whether this is practical depends on the ratio of unused to used images.)
  • Search your code for all occurrences of .png, .jpg, .gif (and so on), keep those images, and throw everything else away.
  • ...
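The code-search approaches above need no web logs and could be automated with a small script. The following is a minimal sketch, not a definitive tool: the list of source extensions is an assumption about a typical Visual Studio web project, the function names are hypothetical, and matching is by bare file name only, so two images with the same name in different folders are both kept.

```python
import os
import re

IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".gif")
# Assumed set of files that may mention images; adjust for the real project.
SOURCE_EXTS = (".html", ".htm", ".css", ".js", ".aspx", ".ascx",
               ".master", ".cs", ".config")

# A bare file name ending in a known image extension, e.g. "logo.png".
IMAGE_NAME_RE = re.compile(r"[\w\-.]+\.(?:png|jpe?g|gif)\b", re.IGNORECASE)


def referenced_names(root):
    """Collect lower-cased image file names mentioned in source files."""
    names = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(SOURCE_EXTS):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    names.update(m.lower() for m in IMAGE_NAME_RE.findall(fh.read()))
    return names


def unreferenced_images(root):
    """Return image files under root whose name appears in no source file."""
    used = referenced_names(root)
    result = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(IMAGE_EXTS) and name.lower() not in used:
                result.append(os.path.join(dirpath, name))
    return sorted(result)
```

Calling `unreferenced_images` on the project root returns candidates for removal only. As noted above, file names assembled at runtime will not be found by a text search, and names containing spaces or URL escapes would need extra handling, so review the list before deleting anything.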
Michiel Overeem
Basically, it sounds like it's just going to be a work-intensive task. Great.
Chris Lively