I need to parse the HTML page for a pattern. I assume the matches will be loaded into an array, and then I need to output the contents of that array.

<script language="JavaScript" type="text/javascript">
var adBookmarkletData=[
'<html><head><title>MYSA Yahoo! APT Debugger</title></head><body><center><div style=\"background:#ccc;color:#000;width:350px;text-align:left;padding:15px;border:2px #000;\">','<b>MYSA Yahoo! APT Debugger:</b><br /><hr />',
'<b>URL:</b> '+document.location.href+'<br />',
'<b>Pub ID:</b> '+window.yld_mgr.pub_id+'<br />',
'<b>Site Name:</b> '+window.yld_mgr.site_name+'<br />',
'<b>Content Topic ID List:</b> '+window.yld_mgr.content_topic_id_list+'<br />',
'<b>Site Section Name List:</b> '+window.yld_mgr.site_section_name_list+'<br />'
];
for(i in window.yld_mgr.slots){
 adBookmarkletData.push('<b>Ad:</b> ('+i+')<b>Category:</b>('+window.yld_mgr.slots[i].cstm_content_cat_list+')<br />');
 };
//Here my problem starts
    var myRegExp = new RegExp("place_ad_here\('(.*?)'\)");
//Here my Problem ends
adBookmarkletData.push(myRegExp.exec(document.innerHTML));

adBookmarkletData.push('</div></center></body></html>');
function createAptDebugger(){
   for (i in adBookmarkletData){
 document.write(adBookmarkletData[i]);
 }
};
void(createAptDebugger());
</script>

The regex pattern works in an online tester against sample code, but here the result is null. I don't understand how to run the regex against the HTML page and then output the matches from the array.

For clarity, the HTML will have tags like these in the body:

<script type="text/javascript">yld_mgr.place_ad_here('A728');</script>
<script type="text/javascript">yld_mgr.place_ad_here('ASPON120');</script>
<script type="text/javascript">yld_mgr.place_ad_here('ROLLOVER');</script>
<script type="text/javascript">yld_mgr.place_ad_here('A300');</script>
<script type="text/javascript">yld_mgr.place_ad_here('Middle1');</script>
<script type="text/javascript">yld_mgr.place_ad_here('B300');</script>

The results would look like this:

place_ad_here('A728')
place_ad_here('ASPON120')
place_ad_here('ROLLOVER')
place_ad_here('A300')
place_ad_here('Middle1')
place_ad_here('B300')

Which is pretty much how I want to display them.

Thanks in advance...

A: 

I believe the way you have it will only return the first match. I think you need to loop, something like this:

while ( var match = myRegExp.exec(document.innerHTML)){ adBookmarkletData.push(match); }

Martin Murphy
Also keep in mind that in JavaScript you can use regex literal delimiters instead of the constructor: /place_ad_here\('(.*?)'\)/ instead of new RegExp().
Martin Murphy
So I tried soitgoes' suggestion, but no joy. It gives me a syntax error: var myRegExp = new RegExp("place_ad_here\('(.*?)'\)"); while ( var match = myRegExp.exec(document.innerHTML)){ adBookmarkletData.push(match); };
Sorry, I didn't run the example. Fix whatever the syntax error is and see what happens.
Martin Murphy
Yeah, not sure what the error is. The message was not specific.
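The exec-loop idea from this answer does work once two problems are fixed. A `var` declaration is a statement and cannot appear inside a `while` condition, which is what triggers the syntax error, so the variable is declared before the loop. The regex also needs the g flag: without it, exec() always searches from the start of the string and the loop never ends. The sketch below assumes a hypothetical helper name, `collectWithExec`, and assumes the markup is taken from document.body.innerHTML, since `document` itself has no innerHTML property.

```javascript
// Hypothetical helper: collect every place_ad_here('...') call from an
// HTML string using a global regex and a RegExp.prototype.exec loop.
function collectWithExec(html) {
  // The g flag makes exec() advance lastIndex after each match.
  // Without it, exec() keeps returning the first match forever.
  var re = /place_ad_here\('(.*?)'\)/g;
  var results = [];
  var match; // declared outside: `while (var match = ...)` is a syntax error
  while ((match = re.exec(html)) !== null) {
    results.push(match[0]); // match[0] is the whole call; match[1] is the id
  }
  return results;
}

// In the bookmarklet (assumption), the page markup would be passed in:
// collectWithExec(document.body.innerHTML)
```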
A: 

You're missing the g (global) flag in your regex; it allows multiple matches.

This is what you want:

// document has no innerHTML property of its own; read it from document.body
Array.prototype.push.apply( adBookmarkletData
        , document.body.innerHTML.match( /place_ad_here\('[^']+'\)/g ) ) ;

String.prototype.match returns an array of all matches when you use the global g flag (or null if there are none). Since push accepts a list of arguments rather than an array, apply is used to pass the matches as individual arguments.
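A minimal, testable sketch of the match-plus-apply approach, wrapped in a hypothetical helper `appendAdCalls` that takes the target array and an HTML string. The `|| []` guard is an addition of this sketch, so that an explicit empty array is used when nothing matches.

```javascript
// Hypothetical helper: push every place_ad_here('...') call found in
// `html` onto the existing array `target`.
function appendAdCalls(target, html) {
  // With the g flag, match() returns all matched substrings, or null.
  var matches = html.match(/place_ad_here\('[^']+'\)/g);
  // apply spreads the array into separate push() arguments.
  Array.prototype.push.apply(target, matches || []);
  return target;
}
```

In engines that support ES2015 spread syntax, `target.push(...matches)` does the same thing as the apply call.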

Laurent Villeneuve
A: 

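Hi Tony, notice that both soitgoes and Laurent recommend or use regex literal delimiters (//). Your RegExp isn't working because you're escaping the parentheses inside the string that is passed to the RegExp constructor. In a string literal, \( is just (, so the constructor receives unescaped parentheses and treats them as a group instead of literal characters. You need to double-escape them: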

new RegExp("place_ad_here\\('(.*?)'\\)","g")

That's why I prefer the literal regex and only use RegExp when I need to construct my regular expression at run time.
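A small sketch of the escaping difference, comparing the pattern string from the question, the double-escaped version, and a regex literal against one line in the question's format. The variable names are illustrative only.

```javascript
// The question's string: "\(" in a string literal is just "(", so the
// constructor sees unescaped parentheses and builds a group instead.
var singleEscaped = new RegExp("place_ad_here\('(.*?)'\)");
// Double-escaping passes a real backslash through to the regex.
var doubleEscaped = new RegExp("place_ad_here\\('(.*?)'\\)", "g");
// The equivalent regex literal needs only one backslash.
var literal = /place_ad_here\('(.*?)'\)/g;

var line = "yld_mgr.place_ad_here('A728');";

singleEscaped.source;     // "place_ad_here('(.*?)')"
singleEscaped.exec(line); // null
line.match(doubleEscaped); // ["place_ad_here('A728')"]
```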

Other than that, Laurent's answer should accomplish what you want. He just uses a slightly different regular expression: [^']+ vs. (.*?). Both should work for the text you're describing.
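A quick sketch comparing the two patterns on tags shaped like the ones in the question. They agree on well-formed ids; one difference shows up on an empty id, because `[^']+` requires at least one character. The sample strings are made up for illustration.

```javascript
// (.*?) lazily matches any characters up to the first "')" sequence;
// [^']+ matches one or more characters that are not a single quote.
var lazyPattern = /place_ad_here\('(.*?)'\)/g;
var notQuotePattern = /place_ad_here\('[^']+'\)/g;

var sample = [
  "<script type=\"text/javascript\">yld_mgr.place_ad_here('A728');</script>",
  "<script type=\"text/javascript\">yld_mgr.place_ad_here('Middle1');</script>"
].join("\n");

// Same result for ids like the ones in the question.
var lazyMatches = sample.match(lazyPattern);
var notQuoteMatches = sample.match(notQuotePattern);

// Different result for an empty id.
var emptyLazy = "place_ad_here('')".match(lazyPattern);         // ["place_ad_here('')"]
var emptyNotQuote = "place_ad_here('')".match(notQuotePattern); // null
```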

If you want the output to keep a newline after each match (one per line), you could use replace instead of match and adjust your regexp accordingly.
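One possible reading of the replace suggestion, as a sketch: reduce each line that contains a call to just the call text, keeping the line breaks. It assumes at most one call per line; lines without a call are left unchanged, so on a full page they would still need filtering. `callsOnePerLine` is a hypothetical name.

```javascript
// Hypothetical helper: replace each line containing a call with just
// the call, preserving the newlines between lines.
function callsOnePerLine(source) {
  // m: ^ and $ match at line boundaries; g: process every line.
  // "." does not match a newline, so each match stays within one line.
  return source.replace(/^.*?(place_ad_here\('[^']+'\)).*$/gm, "$1");
}
```

In the debugger's HTML output a plain newline will not render as a line break, so the lines would still need to be joined with `<br />`, as the other entries in the question are.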

One final note: your matching and/or replacing becomes more complicated if an input like

<script type="text/javascript">yld_mgr.place_ad_here('A728');</script>

spans more than one line, or if place_ad_here ever takes more than one parameter, so make sure you know all possible variations of your input. :)
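As a sketch of handling those variations, the pattern below tolerates whitespace and line breaks inside the parentheses and skips any extra parameters after the first one, returning only the first argument. It is a hypothetical helper (`firstAdArgs`), still assumes a single-quoted first argument, and breaks if a later parameter contains a `)`.

```javascript
// Hypothetical helper: extract the first argument of every
// place_ad_here(...) call in an HTML string.
function firstAdArgs(html) {
  // \s* allows spaces or line breaks before the first argument;
  // [^)]* skips any further parameters (including line breaks)
  // up to the first closing parenthesis.
  var re = /place_ad_here\(\s*'([^']*)'[^)]*\)/g;
  var ids = [];
  var m;
  while ((m = re.exec(html)) !== null) {
    ids.push(m[1]); // m[1] is the captured first argument
  }
  return ids;
}
```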

Keith Bentrup