I am using Ruby 1.8.7. I am not using Rails.

How do I find all the links that are not already inside an anchor tag?

s = %Q{ <a href='www.a.com'><b>www.a.com</b></a> www.b.com <div>www.c.com</div> }

The output for the above string should be:

www.b.com
www.c.com

I know the "b" tag before www.a.com complicates the case, but that's what I have to work with.

A: 

You are going to want to use a real XML parser (Nokogiri will do). Regexes are unsuitable for a task like this, especially in Ruby 1.8.7, where negative lookbehind is not supported.
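A minimal sketch of the parser approach, for illustration only. It swaps in REXML (which ships with Ruby) for Nokogiri so that it runs without installing a gem; REXML needs well-formed markup, so the snippet is wrapped in a made-up `<root>` element. The helper name `text_outside_anchors` and the `www.` pattern are assumptions, not part of the question. With Nokogiri, the same idea could probably be expressed as an XPath such as `//text()[not(ancestor::a)]`.

```ruby
require 'rexml/document'

# Walk the tree and collect "www." tokens from text nodes,
# skipping every <a> element together with everything nested in it.
# (Hypothetical helper; the URL pattern is an assumed simplification.)
def text_outside_anchors(node, found = [])
  node.each do |child|
    case child
    when REXML::Element
      next if child.name.downcase == 'a'   # do not descend into anchors
      text_outside_anchors(child, found)
    when REXML::Text
      found.concat(child.value.scan(/\bwww\.\S+/))
    end
  end
  found
end

s = %Q{ <a href='www.a.com'><b>www.a.com</b></a> www.b.com <div>www.c.com</div> }

# REXML requires a single root element, so wrap the fragment.
doc = REXML::Document.new("<root>#{s}</root>")
links = text_outside_anchors(doc.root)
puts links
```

Because the walk never descends into an `a` element, the nested "b" tag is not a problem here. Real-world HTML is often not well-formed, in which case a forgiving HTML parser such as Nokogiri is the safer choice.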

Ben Hughes
A: 

A dirty way to get rid of the anchor tags. It doesn't work the way you want if they're nested. Also, use a real parser ;-)

s.gsub(%r[<a\b.*?</a>]i, "")
=> "  www.b.com <div>www.c.com</div> "
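To get from that stripped string to the list the question asks for, one could follow the `gsub` with two more steps. This is only a sketch under assumptions: the `www.` pattern is a guess at what counts as a link here, the variable names are made up, and the approach still breaks on nested anchors.

```ruby
s = %Q{ <a href='www.a.com'><b>www.a.com</b></a> www.b.com <div>www.c.com</div> }

# Step 1: drop every (non-nested) anchor element, as in the gsub above.
# The extra "m" flag lets "." match newlines inside an anchor.
without_anchors = s.gsub(%r[<a\b.*?</a>]im, "")

# Step 2: replace the remaining tags with spaces so adjacent text stays separate.
plain = without_anchors.gsub(/<[^>]*>/, " ")

# Step 3: pick out tokens that look like bare "www." hosts (assumed pattern).
bare_links = plain.scan(/\bwww\.[^\s<>"']+/)
puts bare_links
```

For the sample string this yields the two bare hosts; anything more irregular than the example is better handled with a parser.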
taw