ansaurus

Question

Splitting unicode (I think) using .split in ruby

Answer 1

A:

The \u2013 syntax works only in Ruby 1.9 and later, which are fully Unicode aware. I'm guessing that you are running Ruby 1.8.

In Ruby 1.8, you can still use the Unicode dash as the argument to split. These both work:

feedentry.title.split("–")             # The actual UTF-8 char
feedentry.title.split("\342\200\223")  # The sequence of bytes
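The two forms are equivalent because the en dash (U+2013) is encoded in UTF-8 as the bytes 226, 128, 147, which are \342, \200, \223 in octal. Here is a minimal sketch of that; the title string is made up for illustration, and only the escaped form is used so the snippet does not depend on how the file is saved:

```ruby
# Hypothetical feed title; the en dash is spelled out as its three UTF-8 bytes.
title = "Artist \342\200\223 Track"
dash  = "\342\200\223"

# Inspect the raw bytes of the separator (decimal values).
dash_bytes = dash.unpack("C*")

# Split on the dash and trim the spaces left around each piece.
parts = title.split(dash).map { |part| part.strip }
```

With this input, parts should hold the artist and the track name as two separate strings.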

In regular expressions, remember to set the u modifier for unicode compatibility (outside of Rails):

@feedsplit = feedentry.title.gsub(/–/u,'-').split("-")

Alternatively, set $KCODE = "U", which implies the u modifier for all regular expressions. Rails does this for you already.
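As a rough sketch of how the two options fit together outside of Rails (the title is a hypothetical example, and the version guard is an assumption added so the snippet also runs on newer Rubies, where $KCODE no longer has any effect):

```ruby
# Only meaningful on Ruby 1.8; later versions ignore $KCODE.
$KCODE = "U" if RUBY_VERSION < "1.9"

# Hypothetical feed title with the en dash written as its UTF-8 bytes.
title = "Artist \342\200\223 Track"

# Replace the en dash with a plain hyphen, then split on the hyphen.
# The u modifier marks the pattern as UTF-8.
normalized = title.gsub(/\342\200\223/u, "-")
feedsplit  = normalized.split("-")
```

Note that the pieces keep their surrounding spaces; call strip on each one if that matters.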

molf 2010-02-22 00:42:01

Thanks for the quick response. I tried this, but had no luck. I am using rails 1.8.6. I am using Feedzirra to fetch and parse the feeds, and it works fine with most other ones. Last.fm seems to be causing all kinds of problems though.

2010-02-22 01:28:53

Using the actual byte sequence did work, however:

@feedsplit = feedentry.title.gsub(/\342\200\223/u,"-").split("-")

Thanks for the help!

2010-02-22 01:34:49

If the literal char does not work, your editor may be saving the source code as something other than UTF-8.

molf 2010-02-22 01:47:46
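One way to check the editor theory is to compare the bytes of a typed dash against the expected UTF-8 sequence. This is only a sketch, and it assumes the script is run from the same file, saved by the same editor, as the code that fails:

```ruby
# The dash below is typed directly, so its bytes depend on the file's encoding.
literal  = "–"
expected = [226, 128, 147]   # UTF-8 bytes of U+2013, i.e. \342\200\223

saved_as_utf8 = (literal.unpack("C*") == expected)
puts(saved_as_utf8 ? "source file is saved as UTF-8" : "source file uses another encoding")
```

If the second message is printed, switching the editor to UTF-8 or sticking with the escaped byte sequence should avoid the problem.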
