I have a Ruby CGI (not Rails) that picks up photos and captions from a web form. My users are very keen on smart quotes and ligatures, which they paste in from other sources. My web app does not deal well with these non-ASCII characters. Is there a quick Ruby string manipulation routine that can get rid of non-ASCII chars?
A:
A quick Google search turned up this discussion, which suggests the following method:
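class String
  # Replaces, in place, every character outside printable ASCII with replacement.
  def remove_nonascii(replacement)
    cleaned = each_char.map do |c|
      # c.ord is the code point; b[0].to_i is always 0 on Ruby 1.9+.
      # 32..126 keeps spaces, which a "< 33" test would throw away.
      c.ord.between?(32, 126) ? c : replacement
    end
    replace(cleaned.join)
  end
end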
Joe
2009-08-12 19:51:56
Yes, I found that, but it does not deal with Unicode double-byte chars, right? Well, I will test this one. Thanks for the help!
2009-08-12 19:54:36
A:
No, there isn't, short of removing all characters besides the basic ones (which is what is recommended above). The best solution would be to handle these names properly, since most filesystems today have no problem with Unicode names. If your users paste in ligatures, they sure as hell will want to get them back too. If the filesystem is your problem, abstract it away and set the filename to some md5 (this also lets you easily shard uploads into buckets, which scan very quickly since they never have too many entries).
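A minimal sketch of the md5 filename idea, for illustration only. The helper name `hashed_upload_path`, the `uploads` root directory, and the choice of the first two hex digits as the bucket are all assumptions, not anything from the original app. The sketch hashes the original name; hashing the file contents instead would also work and avoids collisions between two uploads that share a name.

```ruby
require 'digest/md5'

# Hypothetical helper: maps an uploaded file's original name to a storage path
# made only of ASCII hex digits, sharded into buckets by digest prefix.
def hashed_upload_path(original_name, root = "uploads")
  digest = Digest::MD5.hexdigest(original_name)
  ext = File.extname(original_name).downcase
  ext = "" unless ext =~ /\A\.[a-z0-9]+\z/   # drop extensions with unusual characters
  bucket = digest[0, 2]                      # first two hex digits: up to 256 buckets
  File.join(root, bucket, digest + ext)
end

path = hashed_upload_path("Café “ﬁnal”.jpg")
puts path   # uploads/<2 hex digits>/<32 hex digits>.jpg
```

The original name and caption, smart quotes and ligatures included, would then be stored untouched somewhere else (a database column, say) so that users get back exactly what they pasted.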
Julik
2009-08-13 02:43:39
+2
A:
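class String
  # Returns a copy with each non-ASCII character replaced (removed by default).
  def remove_non_ascii(replacement = "")
    # [^\x00-\x7F] matches whole characters; /[\x80-\xff]/ is rejected as an
    # invalid multibyte escape on Ruby 1.9+ when the source is UTF-8.
    gsub(/[^\x00-\x7F]/, replacement)
  end
end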
klochner
2009-08-13 05:13:53
+1
A:
Here's my suggestion using Iconv.
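require 'iconv'  # standard library through Ruby 1.9; a separate gem from Ruby 2.0 on

class String
  # Returns a copy with every character that has no ASCII equivalent dropped.
  def remove_non_ascii
    Iconv.conv('ASCII//IGNORE', 'UTF-8', self)
  end
end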
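On Ruby versions where Iconv is not available, roughly the same effect can be had with the built-in `String#encode`. This is a sketch, not a drop-in replacement: the function name `strip_non_ascii` is made up for illustration, and it assumes the incoming text is UTF-8.

```ruby
# Hypothetical Iconv-free variant: treat the input bytes as UTF-8 and replace
# anything that is invalid or has no ASCII equivalent (dropped by default).
def strip_non_ascii(text, replacement = "")
  text.encode("US-ASCII", "UTF-8",
              invalid: :replace, undef: :replace, replace: replacement)
end

puts strip_non_ascii("“Hello” ﬁle")   # => Hello le
puts strip_non_ascii("naïve", "?")    # => na?ve
```

Note that the ligature in the first example takes its letters with it, which is the data loss the answer above warns about.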
Scott
2009-08-13 14:27:34