views:

91

answers:

5

I have this HTML in a variable:

<html>

    <head>
        .
        .
        anything
        .
        .
    </head>

    <body anything="">
        content
    </body>

</html>

or

<html>

    <head>
        .
        .
        anything
        .
        .
    </head>

    <body>
        content
    </body>

</html>

The result should be:

content
A: 

See here

fredley
That doesn't answer the question. Can you demonstrate?
Kobi
No. Certainly no more than that thread demonstrates.
fredley
Which is a good reason not to link there... But if you look hard enough, beyond the shiny unicode, you can find a real solution: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1737662#1737662
Kobi
A: 

Hi,

I believe you can load your HTML document into the .NET HTMLDocument object and then simply read HTMLDocument.body.innerHTML.

I am sure there is an even easier way with the newer XDocument as well.

And just to echo some of the comments above: regex is not the best tool for this, because HTML is not a regular language and some edge cases are difficult to handle.

Enjoy!

Doug
A: 

You can find examples that will not work, and no code will work with all possible text, but your particular example is not impossible to parse.

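// s is the HTML text string

// Take everything from '<body' up to the last '</body', then strip the rest of the opening tag.
alert(s.substring(s.indexOf('<body'), s.lastIndexOf('</body')).replace(/^[^>]+>/, ''));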
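
A slightly more defensive variant of the same index-based idea, as a rough sketch only. The name extractBodyByIndex is hypothetical. It assumes the tags may be written in any letter case, that either tag may be missing, and that no attribute value in the body tag contains a '>' character.

```javascript
// Hypothetical helper: returns the text between <body ...> and the last </body>,
// or null when either tag cannot be found.
function extractBodyByIndex(html) {
  // Search a lower-cased copy so <BODY> and <body> are both found.
  // Assumption: lower-casing does not change the string length (true for ASCII markup).
  var lower = html.toLowerCase();

  var open = lower.indexOf('<body');
  if (open === -1) return null;

  // End of the opening tag, skipping any attributes.
  var openEnd = html.indexOf('>', open);
  if (openEnd === -1) return null;

  var close = lower.lastIndexOf('</body');
  if (close === -1 || close < openEnd) return null;

  return html.slice(openEnd + 1, close);
}
```

Returning null instead of an empty or partial string makes it easier for the caller to tell "no body found" apart from "empty body".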
kennebec
A: 

Try this:

 <script>
        var yourVar = '<html> <head></head> <body anything=""> content</body></html>';
        // Splitting on the captured body tags yields [before, open tag, content, close tag, after].
        var getContent = yourVar.split(/(<body[^>]*>|<\/body>)/i)[2];
        alert(getContent);
    </script>

The getContent variable will then contain ' content' (with the leading space from the sample string; call trim() to remove it).

Meryl
A: 

Note that the string-based answers supplied above should work in most cases. The one major advantage offered by a regex solution is that you can more easily provide for a case-insensitive match on the open/close body tags. If that is not a concern to you, then there's no major reason to use regex here.

And for the people who see HTML and regex together and throw a fit: since you are not actually trying to parse HTML with this, it is something you can do with regular expressions. If, for some reason, a stray </body> appeared somewhere unexpected (for example inside a script or a comment), the result could be wrong, but aside from that, you have a sufficiently specific scenario that regular expressions are capable of doing what you want:

var strVal = yourStringValue; // optional: you can pass your own string variable straight to pattern.exec below
var pattern = /<body[^>]*>((.|[\n\r])*)<\/body>/im;
var array_matches = pattern.exec(strVal);

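After the above executes, array_matches[1] will hold whatever came between the opening <body ...> tag and the closing </body> tag. If the string contains no body element, pattern.exec returns null, so check array_matches before indexing into it.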
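
For illustration, the same idea can be wrapped in a small function with the null check built in. This is only a sketch: extractBodyByRegex and sample are hypothetical names, and the pattern uses [\s\S] in place of (.|[\n\r]) to match any character including line breaks.

```javascript
// Hypothetical helper: returns the text between the body tags, or null if there is no match.
function extractBodyByRegex(html) {
  // The i flag makes the tag names case-insensitive; [\s\S]* is greedy,
  // so the capture runs up to the last </body> in the string.
  var match = /<body[^>]*>([\s\S]*)<\/body>/i.exec(html);
  return match ? match[1] : null;
}

// Sample input with upper-case tags and line breaks inside the body.
var sample = '<html>\n<head></head>\n<BODY anything="">\n  content\n</BODY>\n</html>';
var body = extractBodyByRegex(sample);
```

The captured text keeps the whitespace around it, so call body.trim() if you only want the bare content.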

JGB146