I have a blog title that goes like this: Main Idea, key term, key term, key term

I want the main idea and the key terms to have different font sizes. The first thing that came to mind was to find the chunk between the first comma and the end of the string and replace it with the same text wrapped in span tags with a class that makes the font smaller.

Here is the plan:

HTML (before)

  <a href="stupidreqexquestion">Main Idea, key term, key term, key term</a>

HTML (after)

  <a href="stupidreqexquestion">Main Idea <span class="smaller_font">, key term, key term, key term</span></a>

I'm using Rails, so I plan to add this as a helper function. For example:

helper

  def make_key_words_in_title_smaller(title)
      #replace the keywords in the title with key words surrounded by span tags
  end 

view

  <% @posts.each do |post| %>
      <%= make_key_words_in_title_smaller(post.title)%>
  <% end -%>
+3  A: 

If you don't need to handle a Main Idea that itself contains a comma inside double quotes, such as "Welcome home, Roxy Carmichael", this works:

>> t = "Main Idea, key term, key term, key term"
=> "Main Idea, key term, key term, key term"

>> t.gsub(/(.*?)(,.*)/, '\1 <span class="smaller_font">\2</span>')
=> "Main Idea <span class=\"smaller_font\">, key term, key term, key term</span>"
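The substitution above could be wrapped in the helper from the question roughly as follows. This is only a sketch under a few assumptions: it swaps the regex for `String#partition` (plain Ruby, splits on the first comma), it returns titles without a comma unchanged, and it escapes the title text with `CGI.escapeHTML` on the assumption that titles are plain text rather than markup.

```ruby
require 'cgi'

# Sketch of the helper named in the question. Everything from the first
# comma to the end of the string is wrapped in a span.
# Assumption: the title is plain text, so it is HTML-escaped here.
def make_key_words_in_title_smaller(title)
  main_idea, comma, key_terms = title.partition(',')

  # No comma means no key terms, so there is nothing to wrap.
  return CGI.escapeHTML(title) if comma.empty?

  %(#{CGI.escapeHTML(main_idea)} <span class="smaller_font">#{CGI.escapeHTML(comma + key_terms)}</span>)
end

puts make_key_words_in_title_smaller('Main Idea, key term, key term, key term')
# >> Main Idea <span class="smaller_font">, key term, key term, key term</span>
```

In a Rails view the returned string would probably need `.html_safe` (or the helper could build the span with `content_tag`) so that the span is not escaped a second time on output.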
動靜能量
works and is very simple, thanks!
Sam
+2  A: 

If the string is unadorned (i.e., without tags), either of these works well:

data = 'Main Idea, key term, key term, key term'

# example #1
/^(.+?, )(.+)/.match(data).captures.each_slice(2).map { |a,b| a << %Q{<span class="smaller_font">#{ b }</span>}}.first 
# => "Main Idea, <span class=\"smaller_font\">key term, key term, key term</span>"

# example #2
data =~ /^(.+?, )(.+)/
$1 << %Q{<span class="smaller_font">#{ $2 }</span>} 
# => "Main Idea, <span class=\"smaller_font\">key term, key term, key term</span>"
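
Both examples assume the pattern matches. For a title with no comma, `match` returns `nil` and `$1` is `nil`, so each would raise `NoMethodError`. A guarded variant might look like this; the method name `shrink_key_terms` is made up for illustration:

```ruby
# Hypothetical wrapper around the same pattern that returns the input
# unchanged when there is no "comma + space" to split on.
def shrink_key_terms(data)
  match = /^(.+?, )(.+)/.match(data)
  return data unless match

  main_idea, key_terms = match.captures
  %Q{#{main_idea}<span class="smaller_font">#{key_terms}</span>}
end

puts shrink_key_terms('Main Idea, key term, key term, key term')
# >> Main Idea, <span class="smaller_font">key term, key term, key term</span>

puts shrink_key_terms('No commas here')
# >> No commas here
```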

If the string has tags, using regex to process HTML or XML is discouraged because it breaks so easily. Extremely trivial uses against HTML you control are pretty safe, but if the content or format changes the regex can fall apart, breaking your code.
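
As a small illustration of how that can happen, here is the first-comma substitution from the other answer applied to a bare title and to a full tag. The `title` attribute is invented for this example; the point is only that the first comma in the string is no longer the one in the link text:

```ruby
# Non-greedy match up to the first comma, then everything after it.
pattern     = /(.*?)(,.*)/
replacement = '\1 <span class="smaller_font">\2</span>'

plain  = 'Main Idea, key term'
# Hypothetical markup: the attribute value contains a comma too.
tagged = '<a href="post" title="Ideas, big and small">Main Idea, key term</a>'

plain_result  = plain.sub(pattern, replacement)
tagged_result = tagged.sub(pattern, replacement)

puts plain_result
# >> Main Idea <span class="smaller_font">, key term</span>

puts tagged_result
# >> <a href="post" title="Ideas <span class="smaller_font">, big and small">Main Idea, key term</a></span>
```

The second result splits inside the attribute value and leaves the closing `</span>` outside the link, which is the kind of breakage a parser avoids.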

HTML parsers are the usually recommended solution because they keep working if the content or its formatting changes. This is what I'd do using Nokogiri. I was deliberately verbose to explain what is going on:

require 'nokogiri'

# build a sample document
html = '<a href="stupidreqexquestion">Main Idea, key term, key term, key term</a>'
doc = Nokogiri::HTML(html) 

puts doc.to_s, ''

# find the link
a_tag = doc.at_css('a[href=stupidreqexquestion]')

# break down the tag content
a_text = a_tag.content
main_idea, key_terms = a_text.split(/,\s+/, 2) # => ["Main Idea", "key term, key term, key term"]
a_tag.content = main_idea

# create a new node
span = Nokogiri::XML::Node.new('span', doc)
span['class'] = 'smaller_font'
span.content = key_terms

puts span.to_s, ''

# add it to the old node
a_tag.add_child(span)

puts doc.to_s
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body><a href="stupidreqexquestion">Main Idea, key term, key term, key term</a></body></html>
# >> 
# >> <span class="smaller_font">key term, key term, key term</span>
# >> 
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body><a href="stupidreqexquestion">Main Idea<span class="smaller_font">key term, key term, key term</span></a></body></html>

In the output above you can see how Nokogiri built the sample document, the span being added, and the resulting document.

It can be simplified to:

require 'nokogiri'

doc = Nokogiri::HTML('<a href="stupidreqexquestion">Main Idea, key term, key term, key term</a>')

a_tag = doc.at_css('a[href=stupidreqexquestion]')
main_idea, key_terms = a_tag.content.split(/,\s+/, 2)
a_tag.content = main_idea

a_tag.add_child("<span class='smaller_font'>#{ key_terms }</span>")

puts doc.to_s
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body><a href="stupidreqexquestion">Main Idea<span class="smaller_font">key term, key term, key term</span></a></body></html>
Greg
Your writing is fantastic! I wish I had been clearer. What I meant by the CSS class was that it is how I wanted the markup to look after the regex or Nokogiri was applied, so you couldn't use it to find the key terms; you would have to use the first comma and the end of the string as markers. Seriously great post, thanks a lot!
Sam
I'm not sure what you mean. It's possible to locate sections in a document without using XPath or CSS but the search will be a lot less accurate. Normally we look for some sort of constant "landmark" to navigate by, even if it means finding it then moving up, down, or sideways to get to the destination. If all you need is to adjust a simple string and add the `<span>` tag, then that is an incredibly simple problem, one I'd expect a Rails developer to have no problem figuring out.
Greg
The landmark would be the first comma and the end of the string, so I don't know how Nokogiri would find that. I have used Nokogiri for screen scraping, such as creating news feeds, but AFAIK it needs some sort of XML or HTML class to parse by.
Sam