I tried a simple script with

arr = data.scan /<td>([^<]+)/

and arr is filled with the text between <td> and </td> when the script is run using

ruby try.rb

but when it is run using

ruby script/runner app/try.rb

so that it runs in the same environment as script/console, an extra </td> is attached to the matched data... Why would that be? This is Ruby 1.8.7 with Rails 2.3.8. Could it be due to Unicode handling in the app environment, or something else?
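For reference, a minimal sketch of what the scan call is expected to return on plain ASCII input. The sample markup below is hypothetical and is not the page from the question. Because the pattern contains a capture group, scan returns one single-element array per match:

```ruby
# Hypothetical sample markup (not the page from the question).
data = "<table><tr><td>alpha</td><td>beta</td></tr></table>"

# With one capture group, scan returns an array of one-element arrays,
# each holding the text captured after "<td>" up to the next "<".
arr = data.scan(/<td>([^<]+)/)

p arr  # => [["alpha"], ["beta"]]
```

On input like this, the result should be the same whether the script is run directly or through script/runner.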

+1  A: 

I would leave this as a comment, since it doesn't really answer anything, but I'm new around here and I guess I don't have the rep to do so. Please excuse me.

I mocked up the setup using Ruby 1.8.7 with a fully functional app on Rails 2.3.8, and both times I got the proper output, without the trailing </td> you mention. Now I am curious: what is in data? I used a generic table in a pretty simple HTML document, and it works as it should.

One last thing worth mentioning: is using a regex to parse HTML a good idea? I have never needed to do it myself, but Hpricot looks pretty neat for just that sort of thing: http://github.com/hpricot/hpricot
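As a rough sketch of the parser-based approach, here is the same extraction done with REXML instead of a regex. REXML is swapped in for Hpricot only because it ships with Ruby; it accepts only well-formed markup, so for real-world HTML a tolerant parser such as Hpricot would be the better fit. The sample markup is hypothetical:

```ruby
require "rexml/document"

# Hypothetical sample markup; it must be well-formed for REXML to accept it.
html = "<table><tr><td>alpha</td><td>beta</td></tr></table>"

doc = REXML::Document.new(html)

# Collect the text of every <td> element, wherever it sits in the tree.
cells = REXML::XPath.match(doc, "//td").map { |td| td.text }

p cells  # => ["alpha", "beta"]
```

The parser decides where each element ends, so there is no pattern that can accidentally run past a closing tag.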

Hope this helps at least a little.

Hugo
Ah, the page that had the problem had Big5 + ASCII in it... although I wouldn't expect it to work in non-runner mode and then behave differently in runner mode, unless the encoding is handled differently between the two.
動靜能量
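One plausible explanation, not confirmed in this thread: Rails 2.3 sets $KCODE = 'u' during initialization on Ruby 1.8, so under script/runner the regex engine treats the data as UTF-8, while plain ruby try.rb matches byte by byte. A Big5 byte that looks like a UTF-8 lead byte could then swallow the "<" that follows it, which would let [^<]+ run on into </td>. The sketch below does not reproduce the 1.8.7 behavior. It only shows the usual remedy, which is to declare the real encoding and transcode before matching, and it assumes Ruby 1.9 or later (String#encode and String#force_encoding do not exist in 1.8.7, where Iconv would play that role). The sample cell content is hypothetical:

```ruby
# Hypothetical sample: a table cell holding two Chinese characters,
# converted to raw Big5 bytes to simulate what a Big5 page would deliver.
raw = "<td>\u4E2D\u6587</td>".encode("Big5").force_encoding("BINARY")

# Declare the bytes' real encoding, then transcode to UTF-8 so that the
# string and the regex agree on where each character begins and ends.
utf8 = raw.dup.force_encoding("Big5").encode("UTF-8")

cells = utf8.scan(/<td>([^<]+)/).flatten

p cells  # => ["中文"]
```

Once the string carries the correct encoding, the character class stops at the "<" of the closing tag regardless of how the script is launched.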