I use user input as filenames. Of course this is not a good idea as-is, so I want to replace everything except [a-z], [A-Z], [0-9], _ and - (keeping the dot before the file extension).

For instance:

my§document$is°°   very&interesting___thisIs%nice445.doc.pdf

should become

my_document_is_____very_interesting___thisIs_nice445_doc.pdf

and then, ideally, with repeated underscores collapsed:

my_document_is_very_interesting_thisIs_nice445_doc.pdf

Is there a nice and elegant way to do this?

+2  A: 

From http://devblog.muziboo.com/2008/06/17/attachment-fu-sanitize-filename-regex-and-unicode-gotcha/:

def sanitize_filename(filename)
  # `tap` stands in for the old ActiveSupport `returning` helper:
  # it yields the stripped copy and returns it after the in-place edits.
  filename.strip.tap do |name|
    # NOTE: File.basename doesn't work right with Windows paths on Unix,
    # so strip the directory part by hand and keep only the filename.
    name.gsub!(/^.*(\\|\/)/, '')

    # Earlier variant: replace everything except word characters,
    # periods and hyphens with an underscore.
    # name.gsub!(/[^\w\.\-]/, '_')

    # Stricter variant: also replace non-ASCII letters (and underscores),
    # using "x" so the result isn't all underscores.
    name.gsub!(/[^0-9A-Za-z.\-]/, 'x')
  end
end
The MYYN
Thanks for the link! BTW, in the article you linked, the poster says that this function has a problem.
marcgg
thx, corrected.
The MYYN
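
For the exact transformation asked for in the question (underscores as the replacement, runs of underscores collapsed, the last extension kept), here is a minimal sketch in plain Ruby. The method name `clean_filename` is hypothetical, and treating only the last dot as the extension separator is an assumption read off the example in the question, not something taken from the linked article.

```ruby
# Hypothetical helper: name and exact behaviour are assumptions for illustration.
def clean_filename(filename)
  # Drop any directory part, handling both / and \ separators.
  name = filename.strip.sub(/\A.*[\\\/]/m, '')

  # Split off the last extension so its dot survives the cleanup.
  ext  = File.extname(name)
  base = File.basename(name, ext)

  # Replace every disallowed character with "_", then collapse runs of "_".
  base = base.gsub(/[^A-Za-z0-9_\-]/, '_').squeeze('_')

  # Simply drop disallowed characters from the extension.
  ext = ext.gsub(/[^A-Za-z0-9_.\-]/, '')

  base + ext
end

puts clean_filename('my§document$is°°   very&interesting___thisIs%nice445.doc.pdf')
# prints my_document_is_very_interesting_thisIs_nice445_doc.pdf
```

Leaving out the `squeeze('_')` call gives the intermediate form from the question, with every bad character replaced one-for-one.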