Hi,
I would like to grab the kanji table from a Wikipedia page, but I am having trouble using Nokogiri with special characters. Here is my script:
# -*- encoding: utf-8 -*-
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'yaml'
link = 'http://en.wikipedia.org/wiki/List_of_j%C5%8Dy%C5%8D_kanji'
doc = Nokogiri::HTML(open(link))
doc.encoding = 'UTF-8'
d = []
doc.css('.wikitable tr').each do |tr|
  row = []
  tr.css('td').each { |td| row << td.text }
  d << row
end
d.each { |row| row.each { |td| puts td } }
y = YAML.dump(d)
puts y
My problem is that it prints garbled characters (like ã¯ã) rather than kanji characters (like 人).
How can I edit it in order to fix this? Thanks a lot.
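In case it helps with diagnosis: my guess (an assumption, not something I have confirmed) is that the UTF-8 bytes of the page are being decoded as a single-byte encoding such as ISO-8859-1 somewhere along the way. Here is a minimal sketch that uses only Ruby's standard library and reproduces the same kind of garbling. The two helper names are made up for illustration and are not part of Nokogiri:

```ruby
# -*- encoding: utf-8 -*-

# Hypothetical helper: reinterpret the UTF-8 bytes of a string as ISO-8859-1
# and transcode the result back to UTF-8 for display.
# ASSUMPTION: this mimics what happens to the page text; it is only a guess.
def misread_as_latin1(text)
  text.dup.force_encoding('ISO-8859-1').encode('UTF-8')
end

# Hypothetical helper: undo the misreading by recovering the original bytes
# and labelling them as UTF-8 again.
def repair_from_latin1(garbled)
  garbled.encode('ISO-8859-1').force_encoding('UTF-8')
end

kanji = '人'
garbled = misread_as_latin1(kanji)

puts garbled                      # garbled form of the kanji
puts repair_from_latin1(garbled)  # the original kanji again
```

If the garbled output of my script has the same shape as what this sketch prints, then the bytes themselves are probably intact and only the encoding label is wrong.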