ansaurus

Question

Answer 1

+3 A:

If you don't need to handle a Main Idea part such as "Welcome home, Roxy Carmichael" (that is, one with a comma inside double quotes), this works:

>> t = "Main Idea, key term, key term, key term"
=> "Main Idea, key term, key term, key term"

>> t.gsub(/(.*?)(,.*)/, '\1 <span class="smaller_font">\2</span>')
=> "Main Idea <span class=\"smaller_font\">, key term, key term, key term</span>"
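If the quoted case does matter, one possible variation is to let the first capture group accept either a double-quoted phrase or the text up to the first comma. This is only a sketch: the method name `wrap_key_terms` is made up for illustration, and it assumes the quoted phrase contains no escaped double quotes.

```ruby
# Hypothetical helper (not from the original answer): wraps everything after
# the main idea in a span. The main idea is either a double-quoted phrase,
# which may contain commas, or the text up to the first comma.
def wrap_key_terms(text)
  text.sub(/\A("[^"]*"|[^,]*)(,.*)\z/m, '\1 <span class="smaller_font">\2</span>')
end

puts wrap_key_terms('Main Idea, key term, key term')
puts wrap_key_terms('"Welcome home, Roxy Carmichael", key term, key term')
```

A string with no comma at all does not match the pattern, so `sub` returns it unchanged.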

動靜能量 2010-10-16 22:56:59

works and is very simple, thanks!

Sam 2010-10-17 02:51:20

Answer 2

+2 A:

If the string is unadorned, (i.e., without tags) either of these works well:

data = 'Main Idea, key term, key term, key term'

# example #1
/^(.+?, )(.+)/.match(data).captures.each_slice(2).map { |a,b| a << %Q{<span class="smaller_font">#{ b }</span>}}.first 
# => "Main Idea, <span class=\"smaller_font\">key term, key term, key term</span>"

# example #2
data =~ /^(.+?, )(.+)/
$1 << %Q{<span class="smaller_font">#{ $2 }</span>} 
# => "Main Idea, <span class=\"smaller_font\">key term, key term, key term</span>"
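Both examples raise a NoMethodError when the string contains no ", " at all, because the match (or `$1`) is nil. A minimal sketch that avoids this uses `String#partition` instead of a regex; the method name `shrink_key_terms` is hypothetical, and it assumes the separator is always a comma followed by one space.

```ruby
# Hypothetical helper (not from the original answer): splits on the first
# ", " and wraps the remainder in a span. Returns the text unchanged when
# there is no ", " separator.
def shrink_key_terms(text)
  main_idea, separator, key_terms = text.partition(', ')
  return text if separator.empty?

  %Q{#{main_idea}#{separator}<span class="smaller_font">#{key_terms}</span>}
end

puts shrink_key_terms('Main Idea, key term, key term, key term')
puts shrink_key_terms('Main Idea')
```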

If the string has tags, then using regex to process HTML or XML is discouraged because it breaks so easily. Extremely trivial uses against HTML you control are pretty safe, but if the content or format changes the regex can fall apart, breaking your code.
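To illustrate how easily that happens, here is a small sketch that runs the same pattern from the examples above against a tagged string. The `title` attribute and the `href` value are invented for the demonstration.

```ruby
pattern = /^(.+?, )(.+)/

plain  = 'Main Idea, key term, key term'
# Made-up markup: the title attribute happens to contain a comma.
tagged = '<a href="page" title="Ideas, big and small">Main Idea, key term, key term</a>'

plain_head  = pattern.match(plain)[1]
tagged_head = pattern.match(tagged)[1]

puts plain_head   # the intended "main idea" part
puts tagged_head  # stops at the comma inside the title attribute instead
```

On the tagged string the first capture ends inside the opening tag, so wrapping the second capture in a span would produce broken markup.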

HTML parsers are the usual recommended solution because they will continue working if the content or its formatting changes. This is what I'd do using Nokogiri. I was deliberately verbose to explain what was going on:

require 'nokogiri'

# build a sample document
html = '<a href="stupidreqexquestion">Main Idea, key term, key term, key term</a>'
doc = Nokogiri::HTML(html) 

puts doc.to_s, ''

# find the link
a_tag = doc.at_css('a[href=stupidreqexquestion]')

# break down the tag content
a_text = a_tag.content
main_idea, key_terms = a_text.split(/,\s+/, 2) # => ["Main Idea", "key term, key term, key term"]
a_tag.content = main_idea

# create a new node
span = Nokogiri::XML::Node.new('span', doc)
span['class'] = 'smaller_font'
span.content = key_terms

puts span.to_s, ''

# add it to the old node
a_tag.add_child(span)

puts doc.to_s
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body><a href="stupidreqexquestion">Main Idea, key term, key term, key term</a></body></html>
# >> 
# >> <span class="smaller_font">key term, key term, key term</span>
# >> 
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body><a href="stupidreqexquestion">Main Idea<span class="smaller_font">key term, key term, key term</span></a></body></html>

In the output above you can see how Nokogiri built the sample document, the span being added, and the resulting document.

It can be simplified to:

require 'nokogiri'

doc = Nokogiri::HTML('<a href="stupidreqexquestion">Main Idea, key term, key term, key term</a>')

a_tag = doc.at_css('a[href=stupidreqexquestion]')
main_idea, key_terms = a_tag.content.split(/,\s+/, 2)
a_tag.content = main_idea

a_tag.add_child("<span class='smaller_font'>#{ key_terms }</span>")

puts doc.to_s
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body><a href="stupidreqexquestion">Main Idea<span class="smaller_font">key term, key term, key term</span></a></body></html>

Greg 2010-10-16 23:49:45

Your writing is fantastic! I wish I had been clearer. What I meant by the CSS tag was that this is how I wanted it to be after the regex or Nokogiri was applied, so you couldn't use it to get the key terms; you would have to use the first comma and the end of the string as markers. Seriously great post, thanks a lot!

Sam 2010-10-17 02:53:38

I'm not sure what you mean. It's possible to locate sections in a document without using XPath or CSS but the search will be a lot less accurate. Normally we look for some sort of constant "landmark" to navigate by, even if it means finding it then moving up, down, or sideways to get to the destination. If all you need is to adjust a simple string and add the `<span>` tag, then that is an incredibly simple problem, one I'd expect a Rails developer to have no problem figuring out.

Greg 2010-10-17 21:42:33

The landmark would be the first comma and the end of the string, so I don't know how Nokogiri would find that. I have used Nokogiri for screen scraping, such as creating news feeds, but it needs some sort of XML or HTML class to parse by, AFAIK.

Sam 2010-10-18 17:46:17


Simple regular express question
