I've spent way too long trying to figure this out. I'm using XML::RSS and Perl to read and parse an eBay RSS feed. Within each item, I see these entries:

<rx:BuyItNowPrice xmlns:rx="urn:ebay:apis:eBLBaseComponents">1395</rx:BuyItNowPrice>
<rx:CurrentPrice xmlns:rx="urn:ebay:apis:eBLBaseComponents">1255</rx:CurrentPrice>

However, I can't figure out how to grab the details during the loop. I wrote a regex to grab them:

@current_price = $item =~ m/\<rx\:CurrentPrice.*\>(\d+)\<\/rx\:CurrentPrice\>/g;

which works if you place the above 'CurrentPrice' entry into a standalone string, but not while the script is reading through the RSS feed.

I can grab most of the information I want out of the item->description area (number of bids, auction end time, Buy It Now price, thumbnail image, etc.), but it would be nicer if I could grab the info from the feed itself without having to extract all that information manually.

If anybody knows how to grab custom fields from an RSS feed (short of writing regexes to parse the entire feed without a module), any help or insight would be appreciated.

Here's the code I'm working with:

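use strict;
use warnings;
use LWP::Simple;
use XML::RSS;

my $mylink   = shift @ARGV;   # URL of the eBay RSS feed (assumed here to be passed on the command line)
my $my_limit = 0;

my $rss  = XML::RSS->new();
my $data = get($mylink) or die "Could not fetch $mylink\n";
$rss->parse($data);

my $channel = $rss->{channel};

my $NumItems = 0;
foreach my $item (@{ $rss->{'items'} }) {
    last if $NumItems > $my_limit;
    $NumItems++;    # the original loop never incremented the counter

    my @current_price = $item =~ m/\<rx\:CurrentPrice.*\>(\d+)\<\/rx\:CurrentPrice\>/g;

    print "$current_price[0]\n";
}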
A:

If you have the RSS/XML document and want specific data, you could use XPath:

Perl CPAN XPATH

XPath Introduction

Salgar
Thanks -- I'm looking into this as a possible solution.
A: 

In what way does it "not work" on the RSS feed? Do you mean there are no matches when there should be matches? Or one match where there should be several?

One thing that jumps out at me about your regular expression is that you use .*, which can sometimes be greedier than you want. That is, if $item contained the expression

<rx:BuyItNowPrice xmlns:rx="urn:...nts">1395</rx:BuyItNowPrice>
<rx:CurrentPrice xmlns:rx="urn:...nts">1255</rx:CurrentPrice>
<rx:BuyItNowPrice xmlns:rx="urn:...nts">1395</rx:BuyItNowPrice>
<rx:SomeMoreStuff xmlns:rx="urn:...nts">zzz</rx:BuyItNowPrice>
<rx:CurrentPrice xmlns:rx="urn:...nts">1255</rx:CurrentPrice>

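then the first part of your regular expression (\<rx\:CurrentPrice.*\>) will wind up matching the rest of line 2, all of lines 3 and 4, plus the first part of line 5 (up to the >), so you get one match where you expected two. (This assumes `.` can cross the line breaks, i.e. the text is really on one line or the /s modifier is in effect; by default Perl's `.` does not match a newline.) Instead, you might want to use the regular expression[1]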

m/\<rx:CurrentPrice[^>]*>(\d+)\<\/rx:CurrentPrice\>/

which will only match up to the closing </rx:CurrentPrice> tag after a single instance of an opening <rx:CurrentPrice> tag.
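A quick way to see the difference is to run both patterns over a one-line sample. This is a small sketch, not the asker's feed: the values are illustrative, and the second CurrentPrice is changed to 1300 so the two results can be told apart.

```perl
use strict;
use warnings;

# One-line sample modeled on the example above (values are made up).
my $xml = join '',
    '<rx:BuyItNowPrice xmlns:rx="urn:ebay:apis:eBLBaseComponents">1395</rx:BuyItNowPrice>',
    '<rx:CurrentPrice xmlns:rx="urn:ebay:apis:eBLBaseComponents">1255</rx:CurrentPrice>',
    '<rx:BuyItNowPrice xmlns:rx="urn:ebay:apis:eBLBaseComponents">1395</rx:BuyItNowPrice>',
    '<rx:CurrentPrice xmlns:rx="urn:ebay:apis:eBLBaseComponents">1300</rx:CurrentPrice>';

# Greedy: .* runs on to the LAST ">digits</rx:CurrentPrice>" in the string,
# so both CurrentPrice elements collapse into a single match.
my @greedy = $xml =~ m/<rx:CurrentPrice.*>(\d+)<\/rx:CurrentPrice>/g;

# Negated class: [^>]* cannot run past the end of the opening tag,
# so each CurrentPrice element is matched on its own.
my @strict = $xml =~ m/<rx:CurrentPrice[^>]*>(\d+)<\/rx:CurrentPrice>/g;

print "greedy: @greedy\n";   # greedy: 1300
print "strict: @strict\n";   # strict: 1255 1300
```

The `[^>]*` version stays correct however many elements share the line, which is why it is the safer default when a regex is unavoidable.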

[1] The other obvious answer is that you really don't want to use a regular expression at all, that regular expressions are inferior tools for parsing XML compared to customized parsing modules, and that all the special cases you will have to deal with using regular expressions will eventually render you unconscious from having repeatedly beaten your head against your desk. See Salgar's answer, for example.

mobrule
Thanks. I know that `.*` is greedy. I was just being lazy, trying to get the thing to give me some output so I could tweak it as necessary. Also, I learn things as I go along -- I don't have a coding background, so there's a lot I don't know yet. By 'it doesn't work', I meant that there was no output whatsoever when I added that regex inside the foreach loop. I'm going to stick with grabbing the data from the HTML for now (the easiest, fastest answer), and if I figure this out, I'll post the 'fix' here; I've seen that others have had this problem, too. Thanks again.
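One likely cause of the empty output: each element of `$rss->{'items'}` is a hash reference (XML::RSS items are read as `$item->{title}`, `$item->{description}`, and so on), and binding a reference to `=~` stringifies it to something like `HASH(0x...)`, so the regex never sees the feed XML at all. The sketch below uses a hypothetical hand-built hash in place of a parsed item, so it runs without XML::RSS or a network connection; its keys and values are made up.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for one element of $rss->{'items'}: XML::RSS hands
# back each item as a hash reference (these keys and values are made up).
my $item = { title => 'Example auction', link => 'http://example.com/auction' };

# Binding a hash reference to =~ stringifies it first, so the pattern is
# tested against text like "HASH(0x55d0c8a1b2c8)" rather than the feed XML.
my $seen_by_regex = "$item";
my @current_price = $item =~ m/<rx:CurrentPrice[^>]*>(\d+)<\/rx:CurrentPrice>/g;

print "regex saw: $seen_by_regex\n";
print 'matches: ', scalar(@current_price), "\n";   # matches: 0

# Dumping a real item from the feed shows which keys the parser actually
# filled in, which is safer than guessing where the rx: fields ended up.
local $Data::Dumper::Sortkeys = 1;
print Dumper($item);
```

Running the same `Dumper($item)` call inside the original foreach loop should show whether, and under which key, XML::RSS kept the `rx:CurrentPrice` and `rx:BuyItNowPrice` values; once that key is known, the value can be read straight from the hash with no regex at all.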