I've spent my requisite two hours Googling this and cannot find any good answers, so let's see if humans can beat Google's computers.

I want to parse a stylesheet in Ruby so that I can apply those styles to elements in my document (to make the styles inlined). So, I want to take something like

<style>
.mystyle {
  color:white;
}
</style>

And be able to extract it into a Nokogiri object of some sort.

The Nokogiri class "CSS::Parser" (http://nokogiri.rubyforge.org/nokogiri/Nokogiri/CSS/Parser.html) certainly has a promising name, but I can't find any documentation on what it is or how it works, so I have no idea if it can do what I'm after here.

My end goal is to be able to write code something like:

a_web_page = Nokogiri::HTML(html_page_as_string)
parsed_styles = Nokogiri::CSS.parse(html_page_as_string)
parsed_styles.each do |style| 
  existing_inlined_style = a_web_page.css(style.declaration) || ''
  a_web_page.css(style.declaration)['css'] = existing_inlined_style + style.definition
end

This would extract the styles from a stylesheet and add them all as inline styles to my document.

+3  A: 

Nokogiri can't parse CSS stylesheets.

The CSS::Parser that you came across parses CSS selector expressions, not stylesheets. It is used whenever you traverse an HTML tree with CSS selectors rather than XPath (this is a cool feature of Nokogiri).

There is a Ruby CSS parser, though: the css_parser gem. You can use it together with Nokogiri to achieve what you want.

require "nokogiri"
require "css_parser"

html = Nokogiri::HTML(html_string)

css = CssParser::Parser.new
css.add_block!(css_string)

css.each_selector do |selector, declarations, specificity|
  # html.css returns a NodeSet, so update every matching element.
  html.css(selector).each do |element|
    # Append the rule's declarations to any existing inline style.
    element["style"] = [element["style"], declarations].compact.join(" ")
  end
end
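
The `[element["style"], declarations].compact.join(" ")` expression is worth looking at on its own, since it is what copes with elements that have no `style` attribute yet. A small sketch in plain Ruby; `merge_inline_style` is a hypothetical helper name, not part of Nokogiri or css_parser:

```ruby
# Hypothetical helper: combine an element's existing inline style (nil when
# the attribute is absent) with the declarations from a stylesheet rule.
def merge_inline_style(existing, declarations)
  # compact drops the nil, so no stray leading space is produced.
  [existing, declarations].compact.join(" ")
end

puts merge_inline_style(nil, "color: white;")          # => color: white;
puts merge_inline_style("margin: 0;", "color: white;") # => margin: 0; color: white;
```

One thing to keep in mind: because the stylesheet declarations are appended after the existing inline ones, they win whenever both set the same property. If the original inline styles should take precedence, swapping the order of the two array elements is probably what you want.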
molf
+1  A: 

@molf definitely had a great start there, but it still required debugging a handful of problems to get it working in production. Here is the current, tested version of this:

    html = Nokogiri::HTML(html_string)
    css = CssParser::Parser.new
    # Warning: add_block! modifies the string passed into it, potentially in bad ways,
    # so pass a copy if the original string is still needed afterwards.
    css.add_block!(html_string.dup)

    css.each_selector do |selector, declarations, specificity|
        next unless selector =~ /^[\d\w\s\#\.\-]*$/ # Some of the selectors given by css_parser aren't actually selectors.
        begin
            elements = html.css(selector)
            elements.each do |match|
                match["style"] = [match["style"], declarations].compact.join(" ")
            end
        rescue
            logger.info("Couldn't parse selector '#{selector}'")
        end
    end

    html_with_inline_styles = html.to_s
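
The selector filter above can be tried in isolation to see what it lets through. This is only a sketch: the regex is copied from the code above, while `simple_selector?` and the sample selectors are made up for illustration.

```ruby
# Same pattern as in the answer: word characters, digits, whitespace, '#', '.', '-'.
SIMPLE_SELECTOR = /^[\d\w\s\#\.\-]*$/

# Hypothetical helper wrapping the filter.
def simple_selector?(selector)
  !!(selector =~ SIMPLE_SELECTOR)
end

puts simple_selector?(".mystyle")   # => true
puts simple_selector?("div#main p") # => true
puts simple_selector?("a:hover")    # => false (':' is not in the character class)
puts simple_selector?("ul > li")    # => false ('>' is not in the character class)
puts simple_selector?("h1, h2")     # => false (',' is not in the character class)
```

Skipping things like `a:hover` is reasonable here, since a pseudo-class cannot be expressed in a `style` attribute anyway. Note that `^` and `$` match at line boundaries in Ruby, so a multi-line string passes if any single line matches; `\A` and `\z` would anchor to the whole string instead.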
wbharding