ansaurus

Question

Removing XML entities from string in Ruby

Answer 1

+3 A:

That's not garbage. It is just an HTML-escaped string. I am assuming that by "the url" you mean the link together with its HTML tags (<a></a>). The following code should work.

require 'cgi'
description = "&lt;/p&gt; &lt;a href=\"http://url.com/trac/xxx/wiki/foo?action=diff&amp;amp;amp;version=28\"&amp;gt;(diff)&amp;lt;/a&amp;gt;"
CGI.unescapeHTML(description) # => </p> <a href="http://url.com/trac/xxx/wiki/foo?action=diff&amp;amp;version=28"&gt;(diff)&lt;/a&gt;
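Note that the sample string above appears to be escaped more than once, which is why a single CGI.unescapeHTML pass still leaves entities such as &gt; and &amp;amp; in the result. A minimal sketch of one way to handle that is to unescape repeatedly until the string stops changing. The helper name fully_unescape and the max_passes limit are made up for illustration and are not part of the CGI library.

```ruby
require 'cgi'

# Hypothetical helper (not part of CGI): unescape repeatedly until the
# string no longer changes, giving up after max_passes attempts.
def fully_unescape(text, max_passes = 5)
  max_passes.times do
    decoded = CGI.unescapeHTML(text)
    return decoded if decoded == text # nothing left to decode
    text = decoded
  end
  text
end

description = '&lt;/p&gt; &lt;a href="http://url.com/trac/xxx/wiki/foo?action=diff&amp;amp;amp;version=28"&amp;gt;(diff)&amp;lt;/a&amp;gt;'
html = fully_unescape(description)
puts html
```

Be careful with this on arbitrary input: it also decodes text that was deliberately left escaped (for example a literal &lt; meant to be shown to the reader), so it only makes sense when you know the data was over-escaped.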

If you don't want the HTML tags, there are various ways to obtain just the URL. A simple regex for the URL should work; I leave the details for you to figure out. (Hint: Google)
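For what it's worth, here is a rough sketch of the regex approach. It assumes the string has already been unescaped and that href values are wrapped in double quotes. The helper name extract_hrefs is made up for this example, and a regex like this is not a general HTML parser.

```ruby
# Hypothetical helper: collect the value of every double-quoted href attribute.
def extract_hrefs(html)
  html.scan(/href="([^"]*)"/).flatten
end

html = '</p> <a href="http://url.com/trac/xxx/wiki/foo?action=diff&version=28">(diff)</a>'

urls = extract_hrefs(html)
puts urls.first

# String#[] with a capture index is enough when only the first link matters.
first_url = html[/href="([^"]*)"/, 1]
```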

Chirantan 2009-11-10 05:05:29

Maciek Sawicki 2009-11-10 05:47:24

Whatever suits your needs. It depends on the size of the XML. If it is very large, I would suggest using both together: use an XML parser to narrow down to the node you want to extract the URL from, and then use a regex on that node's text. But again, whatever suits your needs.
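A minimal sketch of that combination, using REXML, which ships with Ruby (as a bundled gem in recent versions). The feed layout below (rss/channel/item/description) is only an assumption for illustration, since the actual XML is not shown in the question.

```ruby
require 'cgi'
require 'rexml/document'

# Assumed sample document; the real feed structure may differ.
xml = <<~XML
  <rss>
    <channel>
      <item>
        <title>foo edited</title>
        <description>&lt;a href="http://url.com/trac/xxx/wiki/foo?action=diff&amp;amp;version=28"&gt;(diff)&lt;/a&gt;</description>
      </item>
    </channel>
  </rss>
XML

doc  = REXML::Document.new(xml)

# Step 1: let the XML parser narrow things down to the node of interest.
node = REXML::XPath.first(doc, '//item/description')
html = node.text # the parser decodes one level of entities here

# Step 2: use a regex on the much smaller HTML fragment.
raw_href = html[/href="([^"]*)"/, 1]

# Attribute values in HTML are themselves escaped, so decode once more.
url = CGI.unescapeHTML(raw_href)
puts url
```

The parser deals with the document structure and the first level of escaping, so the regex only ever sees one small fragment instead of the whole file.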

Chirantan 2009-11-10 08:59:08
