ansaurus

Question

Ruby Regex Help

Answer 1

+2 A:

How about

>(\d+)<

Or, if you desperately want to avoid using capturing groups at all:

(?<=>)\d+(?=<)
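
As a minimal sketch of both patterns in Ruby, assuming the input is the table cell quoted in Answer 3 (the variable names `html`, `with_group`, and `with_lookaround` are illustrative only). Note that the lookbehind form needs Ruby 1.9 or later; the 1.8 regex engine does not support `(?<=...)`.

```ruby
# Sample input, assumed to be the table cell from the question.
html = '<td width=14 rowspan=2 align=right><font size=2 face="helvetica">32</font></td>'

# Capturing group: the whole match is ">32<", but group 1 holds only the digits.
# String#[] with a second argument returns that group directly.
with_group = html[/>(\d+)</, 1]

# Lookarounds: ">" and "<" are asserted but not consumed,
# so the whole match is just the digits.
with_lookaround = html[/(?<=>)\d+(?=<)/]

puts with_group
puts with_lookaround
```

Both calls return a String (or `nil` when nothing matches), so call `to_i` on the result if an Integer is needed.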

Joey 2010-03-14 01:54:09

This returns >32< but I guess I could just do string.match(/>(\d+)</).match(/\d+/)

bunnyBEARZ 2010-03-14 01:57:07

@bun: Well, you'll find the `32` in the first capturing group ... I edited the answer to include an example which doesn't need the group, though.

Joey 2010-03-14 01:59:23

Awesome, thanks a lot.

bunnyBEARZ 2010-03-14 02:05:23

Answer 2

A:

Maybe

<td[^>]*><font[^>]*>\d+</font></td>

Arkadiy 2010-03-14 01:58:51

This will certainly match the string above, but won't do anything to extract the `32`.

Joey 2010-03-14 02:00:48

Well, if Ruby's regexp syntax is borrowed from Perl, then you need to put \d+ in parentheses and then use match()[1].
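
A rough sketch of that suggestion, assuming the same table cell as input (`html`, `pattern`, and `number` are illustrative names, not from the original posts):

```ruby
# Sample input, assumed to be the table cell from the question.
html = '<td width=14 rowspan=2 align=right><font size=2 face="helvetica">32</font></td>'

# The pattern from this answer, with \d+ wrapped in a capturing group.
# %r{...} avoids having to escape the forward slashes in the closing tags.
pattern = %r{<td[^>]*><font[^>]*>(\d+)</font></td>}

m = html.match(pattern)

# String#match returns nil when nothing matches, so guard before indexing.
number = m ? m[1].to_i : nil

puts number
```

The guard matters because calling `[1]` on `nil` raises a `NoMethodError` when the markup does not have the expected shape.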

Arkadiy 2010-03-14 02:34:27

Answer 3

+2 A:

Please, do yourself a favor:

#!/usr/bin/env ruby
require 'nokogiri'

require 'test/unit'
class TestExtraction < Test::Unit::TestCase
  def test_that_it_extracts_the_number_correctly
    doc = Nokogiri::HTML('<td width=14 rowspan=2 align=right><font size=2 face="helvetica">32</font></td>')
    assert_equal [32], (doc / '//td/font').map {|el| el.text.to_i }
  end
end

Jörg W Mittag 2010-03-14 02:19:08

I agree. Going after HTML content with regex is a lot more error prone over the long term compared to using a parser.

Greg 2010-03-14 07:31:13
