Hi all, I am writing a Greasemonkey script that parses a page with the following general structure:

<table>
<tr><td><center>
<b><a href="show.php?who=IDNumber">(Account Name)</a></b>
 (#IDNumber)
<br> (Rank)
<br> (Title)
<p>
<b>Statistics:</b>
<br>
<table>
<tr><td>blah blah etc.
</td></tr></table></center></table>

I'm specifically trying to grab the (Title) part out of that. As you can see, however, it's set off only by a <BR> tag, has no ID of its own, is just part of the text of a <CENTER> tag, and that tag has a whole raft of other text associated with it.

Right now what I'm doing to get that is taking the innerHTML of the Center tag and using a regex on it to match for /<br>([A-Za-z ]*)<p><b>Statistics/. That is working okay for me, but it feels like there's gotta be a better way to pick that particular text out of there.
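For reference, the regex approach described above might look roughly like the sketch below. `extractTitle` is a hypothetical helper name, and the sample markup (account name, rank, and the title "Grand Vizier") is made up for illustration. The pattern is loosened a little from the one quoted above so that it tolerates line breaks between tags and upper-case tag names, on the assumption that innerHTML is not serialized identically in every browser.

```javascript
// Hypothetical helper: pull the title out of the <center> element's innerHTML.
function extractTitle(html) {
  // Match the last <br> before the "<p><b>Statistics" marker.
  // \s* absorbs line breaks; the i flag tolerates upper-case tag names.
  var match = /<br>\s*([A-Za-z ]*?)\s*<p>\s*<b>Statistics/i.exec(html);
  return match ? match[1] : null;
}

// Made-up sample that mirrors the structure shown in the question.
var sample = [
  '<b><a href="show.php?who=123">Some Account</a></b>',
  ' (#123)',
  '<br> Captain',
  '<br> Grand Vizier',
  '<p>',
  '<b>Statistics:</b>'
].join('\n');

console.log(extractTitle(sample));
```

This stays fragile in the same way the original does: any title containing digits or punctuation would fall outside the `[A-Za-z ]` character class.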

... So, is there a better way? Or should I complain to the site programmer that he needs to make that text more accessible? :-)

+1  A: 

EDIT: updated to remove whitespace

var title = $('table center').contents().filter(function() {
    // keep only text nodes (nodeType 3) and filter out whitespace-only ones
    return this.nodeType == 3 && $.trim(this.data) != "";
}).get(2); // get the 3rd text node

alert( title.data ); // alerts "(Title)"
title.data = "How to use jQuery"; // (Title) changes

How it works:

The filter function runs on every child node of the selected element, in this case the center tag. Text nodes have nodeType 3, so the filter leaves you with a collection of just those, and .get(2) picks the third one. Your example has the closing center tag misplaced, so that might give you errors, but I think you get the idea. (I think you're missing a </td></tr> at the end of that, before the final </table>.)
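The same filtering idea can be sketched without jQuery. This is only an illustration: `nonBlankTextNodes` is a hypothetical helper, and `fakeCenter` is a made-up stand-in for the center element so the snippet runs outside a browser. On a real page you would pass the actual element instead.

```javascript
// Return the non-blank text nodes among an element's direct children.
// Accepts anything with an array-like childNodes, including a real DOM element.
function nonBlankTextNodes(parent) {
  var TEXT_NODE = 3; // same value as Node.TEXT_NODE
  return Array.prototype.filter.call(parent.childNodes, function (node) {
    return node.nodeType === TEXT_NODE && node.data.trim() !== '';
  });
}

// Made-up stand-in for the <center> element in the question.
var fakeCenter = {
  childNodes: [
    { nodeType: 3, data: '\n' },               // whitespace-only text node
    { nodeType: 1, nodeName: 'B' },
    { nodeType: 3, data: '\n (#123)\n' },
    { nodeType: 1, nodeName: 'BR' },
    { nodeType: 3, data: ' Captain\n' },
    { nodeType: 1, nodeName: 'BR' },
    { nodeType: 3, data: ' Grand Vizier\n' },
    { nodeType: 1, nodeName: 'P' }
  ]
};

var texts = nonBlankTextNodes(fakeCenter);
console.log(texts[2].data.trim()); // third non-blank text node
```

Dropping whitespace-only nodes before indexing is what keeps the index stable when one browser reports a blank text node that another omits.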

You could always:

$('table center').contents().filter(function() {
    // keep only text nodes (nodeType 3) and filter out whitespace-only ones
    return this.nodeType == 3 && $.trim(this.data) != "";
}).wrap('<p></p>') // make those text nodes paragraphs
  .end()           // back to the full .contents() set
  .filter('br')
  .remove();       // remove the brs

See the jQuery docs on .contents().

Dan Heberden
IE produces different results. Firefox and Safari place the (Title) in the 4th textNode, IE places it in the 3rd.
patrick dw
With the center tag fixed? I know IE sees space as a text node, think that's it?
Dan Heberden
@Dan Heberden - Hard to say. I just calls 'em like I sees 'em. :o)
patrick dw
Edited the conditional in the answer so FF and IE match.
Dan Heberden
Thanks, Dan! Exactly what I wanted.
Hellion
+1  A: 

This seems to work:

var result = $('table td:first-child > center > br:eq(1)').get(0);

alert(result.nextSibling.nodeValue);
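
A rough plain-DOM sketch of the same idea, for a script that would rather not depend on jQuery's :eq() selector. `textAfterNthBr` is a hypothetical helper name, and it assumes the wanted text is the node immediately after the br; it returns null otherwise.

```javascript
// Hypothetical helper: trimmed text that directly follows the nth <br>
// (0-based) among an element's direct children, or null if there is none.
function textAfterNthBr(parent, n) {
  var seen = 0;
  for (var node = parent.firstChild; node; node = node.nextSibling) {
    if (node.nodeType === 1 && node.nodeName.toUpperCase() === 'BR') {
      if (seen === n) {
        var next = node.nextSibling;
        return next && next.nodeType === 3 ? next.nodeValue.trim() : null;
      }
      seen++;
    }
  }
  return null;
}

// Usage on the page itself (skipped when no DOM is available).
if (typeof document !== 'undefined') {
  var center = document.querySelector('table td:first-child > center');
  if (center) {
    console.log(textAfterNthBr(center, 1)); // text after the second <br>
  }
}
```

Counting br elements rather than text nodes sidesteps the whitespace-node differences between browsers, since the br count does not change.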
patrick dw
Nice - get the node after the br, brilliant!
Dan Heberden
@Dan Heberden - Thanks. This had me stumped. Your solution to trim the whitespace was a good fix!
patrick dw