I'm familiar with tools like Deadweight for finding CSS not in use in your Rails app, but does anything exist for images? I'm sitting in a project with a massive directory of assets from working with a variety of designers and I'm trying to trim the fat in this project. It's especially a pain when moving assets to our CDN.

Any thoughts?

+3  A: 

Finding unused images should be easier than finding unused CSS.

Just find the *.jpg, *.png and *.gif files with a glob, put those filenames in an array, and search for each filename in your HTML, CSS and JS files. Any filename that is never found is unused. Move those images to another folder with the same directory structure, so they are easy to restore just in case.

Basically like this. Of course, it will not work for file names that are encrypted, encoded or obfuscated.

require "fileutils"

# Collect candidate images and the source files that may reference them.
img  = Dir.glob("**/*.{jpg,png,gif}")
data = Dir.glob("**/*.{htm*,css,js}")

puts "#{img.length} images found & #{data.length} files found to search against"

# Read every source file into one big string to search.
content = data.map { |f| File.read(f) }.join("\n")

img.each do |m|
  # Escape the name so "." matches a literal dot, not any character.
  next if content =~ Regexp.new("\\b" + Regexp.escape(File.basename(m)) + "\\b")

  # Not referenced anywhere: move it aside, keeping the directory structure.
  FileUtils.mkdir_p "../unused/" + File.dirname(m)
  FileUtils.mv m, "../unused/" + m
  puts "Image #{m} moved to ../unused/#{File.dirname(m)} folder"
end

PS: I used fileutils because the normal makedirs and mv do not work in my Windows version of Ruby.

And I am not good at Ruby, so please double-check it before you use it.

Here are the sample results from running it in the root folder of a sample Rails app on my Windows machine:

---\ruby>ruby img_coverage.rb
5 images found & 12 files found to search against
Image depot/public/images/test.jpg moved to ../unused/depot/public/images folder
S.Mark
+9  A: 

It depends greatly on the code using the images. It's always possible that a filename is computed (by concatenating two values, string substitution, etc.), so simply grepping by filename isn't necessarily enough.

You could try running wget (probably already installed if you've got a Linux machine; otherwise see http://users.ugent.be/~bpuype/wget/) to mirror your whole site. Do this on the same machine or network if you can. It will crawl your whole site and grab all the images.

# mirror mysite.com accepting only jpg, png and gif files
wget -A jpg,png,gif --mirror www.mysite.com

Once you've done that, you will have a second copy of your site's hierarchy containing every image that is actively linked to by a page reachable by crawling your site. You can then back up your source image directory and replace it with wget's copy. Next, monitor your log files for 404s on gif/jpg/png files. Hope that helps.

meagar
wget couldn't crawl the entire site; various pages are behind authentication. It needs to be a solution built with the assistance of Rails itself, since we're using its helpers to display the images in the first place.
mwilliams
As an aside, bta's solution below regarding last-access time is a good one when combined with `wget --mirror` to ensure your whole site is crawled and all used images have their atime updated. However, atime is often disabled to boost performance, so you may have to enable it for this solution to work.
meagar
+2  A: 

If your file manager supports it, try sorting your images directory by the files' "last accessed" date. Files that haven't been accessed in a long time most likely aren't used any longer.
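If you would rather script this than use a file manager, here is a minimal Ruby sketch of the same idea. The helper name `stale_images` and the 90-day threshold in the usage line are made up for illustration, and it only works if the filesystem actually records access times.

```ruby
# Return image paths under `dir` whose last-access time is more than
# `days` days before `now`, oldest first.
# Assumption: the filesystem records atime (it is often disabled).
def stale_images(dir, days, now = Time.now)
  cutoff = now - days * 24 * 60 * 60
  Dir.glob(File.join(dir, "**", "*.{jpg,png,gif}"))
     .select { |path| File.file?(path) && File.atime(path) < cutoff }
     .sort_by { |path| File.atime(path) }
end
```

For example, `stale_images("public/images", 90).each { |path| puts path }` would list images not read in roughly three months.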

Along the same lines, you can also filter or grep through your web server's logs and make a list of the image files that it has served up in the last several months. Any images not in this list are likely unused.
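A rough Ruby sketch of the log comparison, as an illustration only. It assumes the access log is in Common Log Format (request line in double quotes, followed by the status code); the names `served_images` and `unserved_images` are hypothetical, so adjust the pattern to whatever your server actually writes.

```ruby
# Matches a successful (200/304) GET or HEAD for an image and captures its
# path, ignoring any query string such as an asset timestamp.
# Assumption: Common Log Format, e.g.  "GET /images/a.png HTTP/1.1" 200 2326
IMAGE_REQUEST = %r{"(?:GET|HEAD) (\S+?\.(?:jpg|png|gif))(?:\?\S*)? HTTP/[\d.]+" (?:200|304)}i

# Unique image paths that the server reports having served.
def served_images(log_lines)
  log_lines.map { |line| line[IMAGE_REQUEST, 1] }.compact.uniq
end

# Image paths (relative to the web root, with a leading slash) that exist
# on disk but never show up in the log.
def unserved_images(web_root, log_lines)
  on_disk = Dir.glob("**/*.{jpg,png,gif}", base: web_root).map { |p| "/" + p }
  (on_disk - served_images(log_lines)).sort
end
```

Something like `unserved_images("public", File.foreach("log/access.log"))` would then give the candidates for removal; both paths there are placeholders.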

bta
I think this is the most straightforward and platform-independent solution. Putting together a script to compare the log files to the assets folder will be a bit of work, but entirely possible.
Pekka
+6  A: 

If your image URLs often come from many computed / concatenated strings and other stuff hard to track programmatically within your source code, and your application is in heavy use, you could try a soft "honeypot" approach like this:

  • Move all the assets to a different directory, e.g. /attic
  • Set up an empty /images directory (or whatever your asset directory is called)
  • Set up a .htaccess file (if you're on Apache of course) that, using the -f flag, redirects all requests to nonexistent image files to a script
  • The script copies the requested file from the /attic into the /images directory and displays it
  • The next request to that image will go directly to the image, because it exists now

After some time and sufficient usage, all needed images should have been copied to the assets directory.

It's a "soft" approach, of course, because some part of the application may not have been visited by any user during that time (a dialog that shows an error message icon, for example). But it will recognize all files that were actually requested, no matter where they were requested from, and might help sort out many of the unneeded files.
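The copy step from the list above does not depend on Apache. Here is a minimal sketch of it in plain Ruby; the function name `restore_from_attic` and its parameters are hypothetical, and how it gets called for a missing image (a Rack middleware, a catch-all route, etc.) is left out because that depends on your setup.

```ruby
require "fileutils"

# Copy the requested image from the attic into the live images directory the
# first time it is asked for. Returns the restored path, or nil if the attic
# has no such file.
def restore_from_attic(name, attic_dir, images_dir)
  attic_root = File.expand_path(attic_dir)
  # Treat the request path as relative to the attic.
  source = File.expand_path(name.sub(%r{\A/+}, ""), attic_root)

  # Refuse names like "../../etc/passwd" that would escape the attic.
  return nil unless source.start_with?(attic_root + File::SEPARATOR)
  return nil unless File.file?(source)

  relative = source[attic_root.length + 1..-1]
  target = File.join(File.expand_path(images_dir), relative)
  FileUtils.mkdir_p(File.dirname(target))
  FileUtils.cp(source, target)
  target
end
```

The caller would then send the file at the returned path, or respond with a 404 when it gets nil. After some time, whatever is still only in the attic was never requested.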

Pekka
We're not on Apache; the solution needs to be within the context of Rails.
mwilliams
I don't know Rails at all, but I imagine it's possible to set this up in Rails as well. The `-f` flag is a rule that matches files that physically exist.
Pekka