
I have this text:

<body> 
<span class="Forum"><div align="center"></div></span><br /> 
<span class="Topic">Text</span><br /> 

   <hr /> 
  <b>Text</b> Text<br /> 
  <hr width=95% class="sep"/> 
  Text<a href="Text" target="_blank">Text</a> 
   <hr /> 
  <b>Text</b> -Text<br /> 
  <hr width=95% class="sep"/> 
 **Text what i need.**
   <hr /> 

My regex for getting "Text what I need" is /"sep"(.*)hr/m, but it doesn't work. Why?

A:

`.` doesn't match newlines in JavaScript regular expressions, and the `m` flag doesn't change that: it only makes `^` and `$` match at line boundaries. Try:

/"sep"([\s\S]*)hr/m
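A quick Node sketch makes the problem visible. It assumes the sample markup is held in a string named `html`, trimmed here to the part after the topic line. The original pattern finds nothing. The greedy `[\s\S]*` version does match, but because the sample contains two `class="sep"` rules, it runs from the first `"sep"` to the last `hr` and swallows the link line too. The helper `extractAfterLastSep` is hypothetical. It matches each `"sep"/>` ... `<hr` span lazily and keeps the last one, so it only works as long as the wanted text really follows the last `"sep"` rule.

```javascript
// Trimmed copy of the sample markup from the question (emphasis markers dropped).
const html = [
  '   <hr />',
  '  <b>Text</b> Text<br />',
  '  <hr width=95% class="sep"/>',
  '  Text<a href="Text" target="_blank">Text</a>',
  '   <hr />',
  '  <b>Text</b> -Text<br />',
  '  <hr width=95% class="sep"/>',
  ' Text what i need.',
  '   <hr />',
].join('\n');

// 1. Original pattern: `.` stops at the newline right after `class="sep"/>`,
//    and the `m` flag only affects `^` and `$`, so there is no match at all.
const original = /"sep"(.*)hr/m.exec(html);

// 2. `[\s\S]*` crosses newlines but is greedy: it runs from the FIRST "sep"
//    to the LAST "hr", so the capture also contains the link line.
const greedy = /"sep"([\s\S]*)hr/m.exec(html)[1];

// 3. Hypothetical helper: collect every `"sep"/>` ... `<hr` span lazily and
//    return the trimmed text of the last one (null if there is none).
function extractAfterLastSep(markup) {
  const matches = [...markup.matchAll(/"sep"\/>([\s\S]*?)<hr/g)];
  return matches.length ? matches[matches.length - 1][1].trim() : null;
}

console.log(original);                   // null
console.log(greedy.includes('<a href')); // true
console.log(extractAfterLastSep(html));  // Text what i need.
```

Since ES2018, the `s` (dotAll) flag also makes `.` match newlines, so `/"sep"\/>(.*?)<hr/gs` is an equivalent pattern. Ending the match on `<hr` rather than plain `hr` also keeps it from stopping inside `href`.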

IMO, you're much better off taking a different approach, because regex isn't ideal for extracting data from HTML. A better method is to create a div, set its innerHTML property to the HTML string you have, and then use DOM traversal to find the text node you need.

Here's an example of what I mean: http://www.jsfiddle.net/W33n6/. It uses the following code to get the text:

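var div = document.createElement("div");
div.innerHTML = html;  // html holds the markup string from the question
var hrs = div.getElementsByTagName("hr");

// Stop at the first <hr> whose class attribute is exactly "sep" and
// show the text node that immediately follows it.
for (var i = 0; i < hrs.length; i++) {
    if (hrs[i].className == "sep") {
        document.body.innerHTML = hrs[i].nextSibling.nodeValue;
        break;
    }
}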

EDIT: Gumbo's version is a little stricter than mine: it finds the "sep" class even when the element has other classes too, and it makes sure the following node is a text node.

Andy E
thanks for the instruction and links!
guest
A:

Don't use a regular expression; use DOM methods instead:

// Collect every <hr> element in the document.
var elems = document.getElementsByTagName("hr");
for (var i=0; i<elems.length; ++i) {
    var elem = elems[i];
    // "sep" must be one of the element's classes, and the next node must be text.
    if (/(?:^|\s+)sep(?:\s|$)/.test(elem.className) &&
        elem.nextSibling && elem.nextSibling.nodeType === Node.TEXT_NODE) {
        var text = elem.nextSibling.nodeValue;
        break;
    }
}

This selects all `hr` elements, checks whether each one has the class `sep`, and grabs the next sibling node if it is a text node.
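The class test is the part that makes this stricter than comparing `className == "sep"`, and it can be tried on plain strings outside a browser. The sketch below pulls it into a hypothetical `hasSepClass` helper: `sep` has to appear as a whole, whitespace-separated class name, so `"wide sep"` passes while `"separator"` does not. In current browsers, `elem.classList.contains("sep")` performs the same check through the DOM.

```javascript
// Same test the answer applies to elem.className: "sep" must be a complete
// class name, preceded by the start of the string or whitespace and followed
// by whitespace or the end of the string.
function hasSepClass(className) {
  return /(?:^|\s+)sep(?:\s|$)/.test(className);
}

console.log(hasSepClass('sep'));       // true
console.log(hasSepClass('wide sep'));  // true
console.log(hasSepClass('sep thin'));  // true
console.log(hasSepClass('separator')); // false
console.log(hasSepClass('mysep'));     // false
```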

Gumbo
thanks for the instruction!
guest
`Node.TEXT_NODE` isn't defined in older versions of IE, but it's just a constant for the number 3, so you can use `nodeType === 3` instead.
Andy E