Hi,

I need to get the "274.20p" out of:

<td nowrap="nowrap" class="dataRegularUlOn" style="text-align: right;">274.20p</td>

I would like a regular expression that matches the opening tag, allowing for other attributes and extra whitespace around the class:

<td    class="dataRegularUlOn"    >

so something like:

/<td(.*?)class="dataRegularUlOn"(.*?)>/

I'm using Ruby on Linux.

Thanks.

A: 

Why not use something like Hpricot (http://github.com/whymirror/hpricot) instead? Then you can just use the XPath to the element to retrieve the value.

Jamie
Again, same problem as above :p I can't use gems at the moment ;) they don't install.
Steven
A: 

Are you parsing an HTML file? I think you should use XPath; it is really easy to use. For Ruby there is Nokogiri.

Using a regexp, I would do it like this:

# The capture group wraps the price itself, e.g. "274.20p"
ruby_sub_string = /(\d+\.\d{1,2}p)/.match(my_string)
ruby_sub_string[1]

It should do the trick. I can't try it right now though.
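A minimal sketch of the price-matching idea above, assuming the cell arrives as a plain string. The variable names are illustrative, and the nil check is an addition: `match` returns `nil` when nothing matches, so indexing the result directly would raise.

```ruby
my_string = '<td nowrap="nowrap" class="dataRegularUlOn" style="text-align: right;">274.20p</td>'

# Digits, a dot, one or two more digits, then a literal "p".
match = /(\d+\.\d{1,2}p)/.match(my_string)

# Guard against a failed match before reading the capture group.
price = match ? match[1] : nil
puts price.inspect
```

Note that this pattern ignores the class attribute entirely, so it would pick up the first price-shaped string anywhere in the input.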

dierre
+1  A: 

Why do you want to write your own HTML parser, when there are plenty of perfectly capable HTML parsers already out there?

require 'nokogiri'

doc = Nokogiri::HTML('
    <td nowrap="nowrap" class="dataRegularUlOn" style="text-align: right;">
        274.20p
    </td>')

# Strip the whitespace surrounding the text inside the cell
p doc.search('.dataRegularUlOn').map { |node| node.text.strip }
# => ["274.20p"]
Jörg W Mittag
This is the perfect method, but at uni I can't install gems... :(
Steven
@Steven: Really? Not even inside your home directory? You can set the environment variables `GEM_HOME` and `GEM_PATH` to point somewhere inside your `$HOME` directory. In fact, if you call `gem install` and it detects that it can't write to the system directory, it should actually automatically fall back to the home directory. Anyway, there is a lenient XML library that can also parse many HTML documents in the stdlib that doesn't require a third-party library: `REXML` (`require 'rexml/document'`).
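A rough sketch of the REXML route, assuming the markup is well-formed enough to parse as XML (the single cell from the question is; a full real-world page may not be). The variable names are illustrative.

```ruby
require 'rexml/document'

html = '<td nowrap="nowrap" class="dataRegularUlOn" style="text-align: right;">274.20p</td>'

# Parse the fragment as XML; this raises REXML::ParseException on malformed input.
doc = REXML::Document.new(html)

# Select every <td> whose class attribute is exactly "dataRegularUlOn".
cells = REXML::XPath.match(doc, '//td[@class="dataRegularUlOn"]')
prices = cells.map { |td| td.text.strip }

p prices
```

The XPath predicate compares the whole attribute value, so a cell with several classes would need a `contains(@class, ...)` test instead.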
Jörg W Mittag
@Steven: Also, you don't actually *have* to install gems. You can also just install the files yourself somewhere in your `$HOME` directory and add that directory to Ruby's `$LOAD_PATH`.
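As an illustration of the `$LOAD_PATH` approach, here is a small sketch. The directory and the `price_helper` library are hypothetical stand-ins created on the fly so the example runs on its own; in practice the directory would be somewhere under `$HOME` and would hold the library files you copied there. Libraries with native extensions generally need to be compiled as well, which this does not cover.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical private library directory (stand-in for something like ~/ruby-libs/lib).
lib_dir = File.join(Dir.mktmpdir, 'lib')
FileUtils.mkdir_p(lib_dir)

# Write a tiny pure-Ruby library file into that directory.
File.write(File.join(lib_dir, 'price_helper.rb'), <<~'RUBY')
  module PriceHelper
    # Returns the first price-shaped substring, or nil if there is none.
    def self.pence(text)
      text[/\d+\.\d{1,2}p/]
    end
  end
RUBY

# Put the directory at the front of the search path, then require as usual.
$LOAD_PATH.unshift(lib_dir)
require 'price_helper'

puts PriceHelper.pence('<td>274.20p</td>')
```

The same effect is available without touching the script by running `ruby -I /path/to/lib script.rb` or setting the `RUBYLIB` environment variable.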
Jörg W Mittag
A: 

Try this regular expression:

/<td[^>]*class="dataRegularUlOn"[^>]*>([^<]*)<\/td>/
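A short sketch of how this expression could be applied with `String#scan` to collect every matching cell. The sample markup beyond the first cell is made up for illustration.

```ruby
html = <<~HTML
  <table><tr>
  <td nowrap="nowrap" class="dataRegularUlOn" style="text-align: right;">274.20p</td>
  <td class="other">ignored</td>
  <td    class="dataRegularUlOn"    > 12.5p </td>
  </tr></table>
HTML

cell_pattern = /<td[^>]*class="dataRegularUlOn"[^>]*>([^<]*)<\/td>/

# scan returns one array per match holding the capture groups;
# flatten to a list of strings and trim surrounding whitespace.
prices = html.scan(cell_pattern).flatten.map(&:strip)

p prices
```

Because the capture group is `[^<]*`, a cell whose content contains nested tags would not match.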
rlandster
This is better than splitting on the end bit. Thanks a lot.
Steven