Hi everyone.

Consider the following HTML code:

<div id='x'><div id='y'>Y content</div>X content</div>

I'd like to extract only the content of 'x'. However, its innerText property includes the content of 'y' as well. I tried iterating over its children and every property I could find, but they only return the inner tags.

How can I access through the IHTMLElement interface only the actual data of 'x'?

Thanks

+1  A: 

Use something like:

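function getText(el) {
    var txt = el.innerHTML;
    // replace() returns a new string, so the result must be assigned back.
    // The greedy pattern removes everything from the first "<" to the last ">".
    txt = txt.replace(/<(.)*>/g, "");
    return txt;
}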

Since innerHTML for 'x' returns

<div id='y'>Y content</div>X content

the function getText would return

X content

Maybe this'll help.
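
To see what that greedy pattern does, here is a small check on plain strings (no DOM needed). The function name stripGreedy and the second input are made up for illustration. The point is that text sitting between two tags gets swallowed along with the tags, so this only works when all of the element's own text comes after the last tag.

```javascript
// Same pattern as in getText, applied to a plain string.
function stripGreedy(html) {
    return html.replace(/<(.)*>/g, "");
}

// The markup from the question: everything up to the last ">" is removed.
var a = stripGreedy("<div id='y'>Y content</div>X content");

// Hypothetical markup with own text before, between and after tags:
// the match runs from the first "<" to the last ">", so "X two" is lost too.
var b = stripGreedy("X one<div id='y'>Y content</div>X two<br>X three");

console.log(a); // prints "X content"
console.log(b); // prints "X oneX three"
```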

justastefan
Well, yes, that would... But I really wanted to avoid using regex... Isn't there something more direct in the IE DOM?
VitalyB
A: 

Use the childNodes collection, which returns both child elements and text nodes. You need to QI (QueryInterface) for IHTMLDOMNode from the IHTMLElement to get at it.
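
In script terms the same idea looks roughly like the sketch below: walk childNodes and keep only the text nodes (nodeType 3), skipping element children such as 'y'. The name ownText and the plain-object stand-in for the DOM are assumptions of mine so that the snippet runs outside a browser. On a real page you would pass the element itself.

```javascript
// Collect only the text that belongs directly to `node`,
// ignoring any text inside child elements.
function ownText(node) {
    var TEXT_NODE = 3; // DOM nodeType value for text nodes
    var parts = [];
    for (var i = 0; i < node.childNodes.length; i++) {
        var child = node.childNodes[i];
        if (child.nodeType === TEXT_NODE) {
            parts.push(child.nodeValue);
        }
    }
    return parts.join("");
}

// Hypothetical stand-in for:
// <div id='x'><div id='y'>Y content</div>X content</div>
var x = {
    nodeType: 1,
    childNodes: [
        { nodeType: 1, nodeValue: null, childNodes: [{ nodeType: 3, nodeValue: "Y content" }] },
        { nodeType: 3, nodeValue: "X content" }
    ]
};

console.log(ownText(x)); // prints "X content"
```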

Sheng Jiang 蒋晟
Thanks :). That sort of worked. I'm pasting the final code in the question for others. It didn't completely work, though: the text inside the div is split by <br> tags into several #text nodes, so I have to put the pieces back together, and some string manipulation is needed regardless.
VitalyB
A: 

Here is the final code as suggested by Sheng (just a part of the sample, of course):

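// Assumes "using mshtml;" so the interface names resolve without the prefix.
mshtml.IHTMLElementCollection c = ((mshtml.HTMLDocumentClass)(wbBrowser.Document)).getElementsByTagName("div");
foreach (IHTMLElement div in c)
{
    if (div.className == "lyricbox")
    {
        // Cast (QI) to IHTMLDOMNode to reach childNodes, which holds elements and text nodes alike.
        IHTMLDOMNode divNode = (IHTMLDOMNode)div;

        IHTMLDOMChildrenCollection children = (IHTMLDOMChildrenCollection)divNode.childNodes;

        foreach (IHTMLDOMNode child in children)
        {
            // nodeValue holds the text of #text nodes; element children such as <br> have none.
            Console.WriteLine(child.nodeValue);
        }
    }
}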
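
As for the <br> problem: one way to put the pieces back together is to treat each <br> child as a line break while joining the text nodes. The sketch below shows the idea in JavaScript on a plain-object stand-in for the div. The name ownTextWithBreaks and the sample lines are made up for illustration. In the C# loop, nodeName on IHTMLDOMNode should serve the same purpose, though I have not tested that.

```javascript
// Join the element's own text nodes, turning <br> children into newlines.
function ownTextWithBreaks(node) {
    var out = "";
    for (var i = 0; i < node.childNodes.length; i++) {
        var child = node.childNodes[i];
        if (child.nodeType === 3) {
            // Text node: keep its text as is.
            out += child.nodeValue;
        } else if (child.nodeType === 1 && child.nodeName.toUpperCase() === "BR") {
            // <br> element: emit a line break instead of dropping it.
            out += "\n";
        }
        // Any other element child is skipped.
    }
    return out;
}

// Hypothetical stand-in for: <div class='lyricbox'>first line<br>second line</div>
var lyricbox = {
    nodeType: 1,
    nodeName: "DIV",
    childNodes: [
        { nodeType: 3, nodeName: "#text", nodeValue: "first line" },
        { nodeType: 1, nodeName: "BR", nodeValue: null, childNodes: [] },
        { nodeType: 3, nodeName: "#text", nodeValue: "second line" }
    ]
};

console.log(ownTextWithBreaks(lyricbox)); // prints the two lines on separate lines
```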
VitalyB
A: 

Since innerText always includes the text of nested elements, I guess there is no direct way. Maybe try solving the issue server-side by generating the content the following way:

<div id='x'><div id='y'>Y content</div>X content</div>
<div id='x-plain'>_plain X content_</div>

"Plain X content" stands for the content your C# code generates for the element. You can then read it via document.getElementById('x-plain').innerHTML.

justastefan
Not really relevant in my case but thanks :)
VitalyB