Hi,

I am using Nokogiri to screen-scrape some HTML. In some cases I get weird characters back. I have tracked down the byte values of these characters with the following code:

  @parser.leads[0].phone_numbers[0].each_byte do |c|
    puts "char=#{c}"
  end

The bytes in question have the values 194 and 160.

I want to somehow strip these characters out while parsing.

I have tried the following code, but it does not work:

@parser.leads[0].phone_numbers[0].gsub(/160.chr/,'').gsub(/194.chr/,'')

Can anyone tell me how to achieve this?

Thanks

Paul

A: 

My first thought: should you be using gsub! instead of gsub?

gsub returns a new string, whereas gsub! performs the substitution in place.
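To illustrate the difference, here is a minimal sketch. The sample string and variable names are made up for illustration and are not from the question:

```ruby
# Hypothetical sample value; not taken from the original parser.
original = "555-1234"

# gsub leaves the receiver alone and returns a modified copy.
copy = original.gsub("-", "")

# gsub! changes the receiver itself. Note that it returns nil
# when nothing was replaced, so don't chain on its return value.
in_place = "555-1234".dup
in_place.gsub!("-", "")
nothing_replaced = in_place.gsub!("-", "")

puts original   # still contains the dash
puts copy
puts in_place
```

So if the result of gsub is never assigned or returned, the substitution is effectively thrown away.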

Adam T
I wouldn't say he *should* be using `gsub!`. Without knowing the context, it might be more appropriate or it might be wildly inappropriate.
Chuck
I would agree. I was thinking that in this context he wasn't assigning it to another variable. But you're right, "should" was the wrong wording.
Adam T
+1  A: 

Your problem is that you want to do a method call but instead you're creating a Regexp. You're searching and replacing strings consisting of the string "160" followed by any character and then the string "chr", and then doing the same except with "160" replaced with "194".

Instead, do gsub(160.chr, '').
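A small sketch of the difference, assuming Ruby 2.4 or later for `match?`; the variable names are only for illustration:

```ruby
# /160.chr/ is a regexp literal: the text "160", any one character, then "chr".
pattern = /160.chr/
literal_hit = pattern.match?("160xchr")

# 160.chr outside a regexp literal is an ordinary method call
# that returns a one-byte string.
byte = 160.chr
byte_values = byte.bytes

puts literal_hit
puts byte_values.inspect
```

The regexp never sees the byte 160 at all; it only matches text that literally spells out "160", one more character, and "chr".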

Chuck
I get the following error if I use that code: RegexpError: premature end of regular expression: /�/
dagda1
I think this is because gsub(194.chr, '') refers to a non-ASCII character.
dagda1
@dagda1: What Ruby version are you using? I don't get that error in 1.8.7 or 1.9.1.
Chuck
A: 

You can also try

s.gsub(/\xA0|\xC2/, '')

or

s.delete 160.chr+194.chr
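
For what it's worth, the bytes 194 and 160 (0xC2 0xA0) together are the UTF-8 encoding of U+00A0, the non-breaking space. On a Ruby with encoding-aware strings (1.9 or later) the character can be removed as a single unit instead of byte by byte. A minimal sketch, assuming the scraped string is UTF-8 encoded, which the question doesn't confirm; the sample `phone` value is made up:

```ruby
# Hypothetical sample standing in for the scraped phone number (assumed UTF-8).
phone = "555\u00A01234"

# The non-breaking space is stored as the two bytes seen in the question.
nbsp_bytes = "\u00A0".bytes

# Remove the character as a whole rather than one byte at a time.
stripped = phone.delete("\u00A0")

# Or swap it for an ordinary space if the separation matters.
spaced = phone.tr("\u00A0", " ")

puts nbsp_bytes.inspect
puts stripped
puts spaced
```

Treating the two bytes as one character also avoids stripping a stray 194 that belongs to some other two-byte character.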
Mladen Jablanović
The delete function does the trick. Thanks!!
dagda1