ansaurus

Question

How to explode <br>, <br/>, <br /> tags in a string?

Answer 1

+3 A:

If you can break on regular expressions, use the following delimiter:

<\s*[Bb][Rr]\s*\/*>

Explanation:

One left angle bracket, zero or more whitespace characters, B or b, R or r, zero or more whitespace characters, zero or more forward slashes, one right angle bracket.
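
To check the pattern against the usual tag variants, a minimal Ruby sketch follows. The names BR_TAG and samples, and the sample strings themselves, are illustrative assumptions rather than part of the answer:

```ruby
# Delimiter from the answer, written as a Ruby regex literal.
BR_TAG = /<\s*[Bb][Rr]\s*\/*>/

# Illustrative samples: the first five are break tags, the last two are not.
samples = ['<br>', '<Br>', '<BR/>', '<br />', '< br>', '<b>', '<hr>']

# Keep only the samples the pattern recognises.
matches = samples.select { |tag| BR_TAG.match?(tag) }
p matches  # => ["<br>", "<Br>", "<BR/>", "<br />", "< br>"]
```

Note that the pattern does not allow whitespace between a trailing slash and the closing bracket, so a tag such as '<br/ >' would not match.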

To use the regex, look here:
http://www.regular-expressions.info/ruby.html

Stefan Kendall 2009-09-21 18:57:49

How do I break it? Do I use gsub? string.gsub(<\s*[Bb][Rr]\s*\/*>) ?

2009-09-21 18:58:32

Looks like "split" is what you need.

Stefan Kendall 2009-09-21 18:59:41

Answer 2

A:

If you parse the string with Nokogiri, you can then scan through it and ignore anything other than text elements:

require 'nokogiri'
doc = Nokogiri::HTML.parse('a<Br>b<BR>c<br/>d<BR/>e<br />f')
text = []
doc.search('p').first.children.each do |node|
  text << node.content if node.text?
end
p text  # => ["a", "b", "c", "d", "e", "f"]

Note that you have to search for the first p tag because Nokogiri will wrap the whole thing in <!DOCTYPE blah blah><html><body><p>YOUR TEXT</p></body></html>.

Pesto 2009-09-21 19:02:35

Answer 3

+2 A:

So to implement iftrue's response:

a = 'a<Br>b<BR>c<br/>d<BR/>e<br />f'
a.split(/<\s*[Bb][Rr]\s*\/*>/)
# => ["a", "b", "c", "d", "e", "f"]

...you're left with an array of the bits of the string between the HTML breaks.
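
One edge case worth knowing about: if the string starts with a break tag, or contains two in a row, split returns empty strings for the gaps. A small sketch, assuming you want to drop those (the variable names are illustrative):

```ruby
br_tag = /<\s*[Bb][Rr]\s*\/*>/

# Leading break tag: split keeps an empty string before the first tag.
leading = '<Br>this<BR>is<br/>a<BR/>text<br />string'
raw = leading.split(br_tag)
p raw  # => ["", "this", "is", "a", "text", "string"]

# Drop the empty pieces if they are not wanted.
cleaned = raw.reject(&:empty?)
p cleaned  # => ["this", "is", "a", "text", "string"]

# Consecutive break tags also leave an empty string between them.
doubled = 'a<br><br>b'.split(br_tag)
p doubled  # => ["a", "", "b"]
```

Trailing empty strings are already dropped by split when no limit argument is given, so only leading and interior gaps need this treatment.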

Max Masnick 2009-09-21 19:15:51

A little simpler with just the /i (case-insensitive) flag instead of [Bb][Rr].

glenn mcdonald 2009-09-21 19:32:44
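
One way to read this suggestion is to let the /i flag handle case instead of the [Bb][Rr] character classes. The exact pattern below is an assumption, not necessarily the one the commenter had in mind:

```ruby
# Assumed case-insensitive variant: /i replaces the [Bb][Rr] character classes.
br_tag_i = /<\s*br\s*\/*>/i

a = 'a<Br>b<BR>c<br/>d<BR/>e<br />f'
parts = a.split(br_tag_i)
p parts  # => ["a", "b", "c", "d", "e", "f"]
```

For this input it produces the same array as the split in the answer above.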

Thanks glenn, that is the best.

2009-09-21 20:27:45

Answer 4

A:

Pesto's 99% of the way there; however, Nokogiri supports creating a document fragment, which doesn't wrap the text in a document declaration:

require 'nokogiri'
# Keep only the text nodes and pull out their content as strings.
text = Nokogiri::HTML::DocumentFragment.parse('<Br>this<BR>is<br/>a<BR/>text<br />string').children.select(&:text?).map(&:content)
puts text
# >> this
# >> is
# >> a
# >> text
# >> string

Greg 2009-09-22 13:42:54
