
I have an XPath expression that grabs each text node that is not enclosed in any HTML tag; instead, these text nodes are separated only by <br> elements. I would like to wrap each of them in <span> tags.

Nokogiri::HTML(URI.open("http://vancouver.en.craigslist.ca/van/swp/1426164969.html"))
.xpath("//br/following-sibling::text()|//br/preceding-sibling::text()").to_a

will return those text nodes.

Here is the complete revised code:

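require "nokogiri"
require "open-uri"  # provides URI.open; plain Kernel#open no longer fetches URLs in Ruby 3+

doc = Nokogiri::HTML(URI.open("http://vancouver.en.craigslist.ca/van/swp/1426164969.html"))
.xpath("//br/following-sibling::text()|//br/preceding-sibling::text()").wrap("<span></span>")
puts doc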

I expected to see the full HTML source with those text nodes wrapped in <span> tags, but instead I got only the text itself:

Date: 2009-10-17,  4:36PM PDT
Reply to:
This is a spectacular open plan 1000 sq. ft. loft is in a former Canada Post building. Upon entering the loft from the hallway you are amazed at where you have arrived.... a stunning, bright and fully renovated apartment that retains its industrial feel. The restoration of the interior was planned and designed by a famous Vancouver architect.
The loft is above a police station, so you're guaranteed peace and quite at any time of the day or night.
The neighborhood is safe and lively with plenty of restaurants and shopping. There's a starbucks across the street and plenty of other coffee shops in the area.  Antique alley with its hidden treasures is one block away, as well as the beautiful mile long boardwalk. Skytrain station is one minute away (literally couple of buildings away). 15 minutes to Commercial drive, 20 minutes to downtown Vancouver and Olympic venues.
Apartment Features:
-       Fully furnished
-       14 ft ceilings
-       Hardwood floors
-       Gas fireplace
-       Elevator
-       Large rooftop balcony
-       Full Kitchen: Fully equipped with crystal, china and utensils
-       Dishwasher
-       Appliances including high-end juice maker, blender, etc.
-       WiFi (Wireless Internet)
-       Bathtub
-       Linens &amp; towels provided
-       Hair dryer
-       LCD Flat-screen TV with DVD player
-       Extensive DVD library
-       Music Library: Ipod connection
-       Wii console with Guitar Hero, games
-       Book and magazine library
-       Non-smoking
We are looking to exchange for a place somewhere warm (California, Hawaii, Mexico, South America, Central America) or a place in Europe (UK, Italy, France).
Email for other dates and pictures of the loft.
A:

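Your doc variable is not assigned the whole document: it holds the node set returned by wrap, so puts prints only the text of those nodes. Keep the document in its own variable instead: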

require "nokogiri"
require "open-uri"

doc = Nokogiri::HTML(URI.open("http://vancouver.en.craigslist.ca/van/swp/1426164969.html"))
doc.xpath("//br/following-sibling::text()|//br/preceding-sibling::text()").wrap("<span></span>")
puts doc

Unfortunately, this doesn't fully solve the problem: Nokogiri puts all the <br> elements first and then all the <span>s containing the text, like this:

<br><br><br><br><span>
text</span><span>
text</span><span>
text</span><span>
text</span>

Instead, you can replace each text node individually with a <span> containing it:

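require "nokogiri"
require "open-uri"

doc = Nokogiri::HTML(URI.open("http://vancouver.en.craigslist.ca/van/swp/1426164969.html"))
doc.search("//br/following-sibling::text()|//br/preceding-sibling::text()").each do |node|
  # Swap the bare text node for a <span> holding its HTML-escaped text, in the same position
  node.replace(Nokogiri.make("<span>#{node.to_html}</span>"))
end
puts doc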