Hi,

I recently upgraded to Ruby 1.9.2, and one of my monkey patches is now failing with an encoding error. I have the following method:

  def strip_noise()
    return if (!self) || (self.size == 0)

    self.delete(160.chr+194.chr).gsub(/[,]/, "").strip
  end

That now gives me the following error:

incompatible character encodings: UTF-8 and ASCII-8BIT

Has anyone else come across this?

Cheers

Paul

A: 

This might not be exactly what you want:

  def strip_noise
    return if empty?
    sub = 160.chr.force_encoding(encoding) + 194.chr.force_encoding(encoding)
    delete(sub).gsub(/[,]/, "").strip
  end

Read more on the topic here: http://yehudakatz.com/2010/05/17/encodings-unabridged/

Konstantin Haase
I now get "invalid byte sequence in UTF-8". Are 160.chr and 194.chr not valid UTF-8 characters?
dagda1
Dunno, apparently not. Why not put the actual characters in there instead of the bytes? Or cast the string itself to ASCII-8BIT?
Konstantin Haase
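
For what it's worth, here is a minimal sketch of the "actual characters" suggestion. It assumes the noise being removed is the non-breaking space U+00A0, whose UTF-8 encoding is the byte pair 194, 160. That is a guess based on the two byte values in the question, not something the poster confirmed. Taken alone, neither byte is a valid UTF-8 character, which would explain the "invalid byte sequence" error.

```ruby
# Sketch only: assumes the "noise" is U+00A0 (non-breaking space).
nbsp = "\u00A0" # \u escapes always produce a UTF-8 string

# Integer#chr with no argument returns a binary (ASCII-8BIT) string for 128..255.
lone_byte = 160.chr
lone_byte_is_binary = (lone_byte.encoding == Encoding::ASCII_8BIT)

# Relabelling a single high byte as UTF-8 does not make it a valid character.
lone_byte_valid_utf8 = lone_byte.force_encoding("UTF-8").valid_encoding?

# Together, in the right order, the two bytes are the UTF-8 form of U+00A0.
pair_is_nbsp = ((194.chr + 160.chr).force_encoding("UTF-8") == nbsp)

class String
  def strip_noise
    return if empty?
    # Delete the character itself rather than its individual bytes,
    # then drop commas and trim surrounding whitespace.
    delete("\u00A0").delete(",").strip
  end
end
```

With this version, "1,234\u00A0".strip_noise returns "1234", and an empty string still returns nil as in the original method.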
A: 

This is working for me at the moment anyway:

class String
  def strip_noise()
    return if empty? 
    self.mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/n,'')
  end
end

I need to do more testing, but I can make progress.

dagda1
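
If ActiveSupport is not available, a roughly equivalent sketch using only core Ruby is possible, assuming Ruby 2.2 or later, where String#unicode_normalize exists. This is not what the poster ran, and the method name strip_non_ascii is made up here for illustration. Like the mb_chars version, it drops every non-ASCII character rather than only one specific character, and it leaves commas and surrounding whitespace in place.

```ruby
class String
  # Hypothetical name, chosen to avoid clashing with strip_noise.
  def strip_non_ascii
    return if empty?
    # NFKD splits accented letters into a base letter plus a combining mark;
    # the gsub then removes anything outside the ASCII range.
    unicode_normalize(:nfkd).gsub(/[^\x00-\x7F]/, "")
  end
end
```

For example, "caf\u00E9".strip_non_ascii returns "cafe", because the accent is decomposed into a separate combining character and then removed.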