I am using PHP's DOMDocument class to parse an HTML file, with this code:

$dom = new DOMDocument();
@$dom->loadHTMLFile($file_path); // @ suppresses warnings about malformed HTML
$element = $dom->getElementById("my_id");

I use it to fetch the contents of the element with the ID "my_id". The problem is that the HTML document contains multiple elements with the same ID, and I want the data from all of them. The HTML:

<div id="my_id">
     phone number 123
</div>
<div id="my_id">
     address somewhere
</div>
<div id="my_id">
     date of birth
</div>

I know IDs are supposed to be unique, but this document has duplicates. In this case, will getElementById() return an array?

A:

No. getElementById() returns a single DOMElement, or null if no element has that ID. Methods that can return multiple nodes return a DOMNodeList instead, but that doesn't apply here.
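Here is a minimal sketch of that return type. It uses hypothetical one-element markup with a unique ID (`phone`) so that only the return value is being shown. It is not an example of the duplicate-ID case.

```php
<?php
// Sketch: getElementById() gives back one DOMElement, or null when no
// element carries the requested ID. The markup and IDs are hypothetical.
libxml_use_internal_errors(true); // collect parser warnings instead of printing them

$dom = new DOMDocument();
$dom->loadHTML('<div id="phone">phone number 123</div>');

$found   = $dom->getElementById('phone');      // a DOMElement, not an array
$missing = $dom->getElementById('no_such_id'); // null: nothing has this ID

var_dump($found instanceof DOMElement);
var_dump($missing);
echo trim($found->textContent), "\n";
```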

Furthermore, DOM will not recognize your IDs until you validate the document against a DTD or schema that declares the id attribute as an actual XML ID attribute, which is different from other attributes. That's why DOMAttr has an isId() method, and XML requires ID values to be unique. As VolkerK pointed out in the comments, when you use loadHTMLFile(), this happens automatically.
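The sketch below contrasts the two cases using hypothetical inline markup. In an XML document loaded without a DTD, a plain `id` attribute is not an ID, so getElementById() finds nothing. Marking the attribute by hand with DOMElement::setIdAttribute() changes that. With the HTML parser, `id` is treated as an ID without any extra step, which is the behaviour VolkerK describes.

```php
<?php
// Sketch: XML without a DTD vs. HTML, with respect to ID recognition.
// The element names and the ID value "a" are hypothetical.
$xml = new DOMDocument();
$xml->loadXML('<root><item id="a">first</item></root>');

// No DTD declares "id" as an ID, so it is an ordinary attribute here.
$before = $xml->getElementById('a');

// Declare the attribute as an ID by hand, after which the lookup succeeds.
$item = $xml->getElementsByTagName('item')->item(0);
$item->setIdAttribute('id', true);
$after = $xml->getElementById('a');

// The HTML parser (loadHTML()/loadHTMLFile()) treats "id" as an ID on its own.
libxml_use_internal_errors(true);
$html = new DOMDocument();
$html->loadHTML('<div id="a">first</div>');
$isId = $html->getElementsByTagName('div')->item(0)->getAttributeNode('id')->isId();

var_dump($before === null, $after !== null, $isId);
```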

See my answer to "Simplify PHP DOM XML parsing - how?" for more detailed information.

Gordon
Mostly true, but: "the HTML file". There _could_ be different rules for HTML ;-) And loadHTMLFile() causes the HTML DTD to be used, i.e. `id` _is_ an identifier.
VolkerK
@Volker I'm not sure `loadHTMLFile` will `validateOnParse` automatically and I left any reference to the load method out intentionally.
Gordon
@Gordon: I've tested it. getElementById() worked, and $e->getAttributeNode('id')->isId() returned true without anything further (not even a doctype in the HTML document).
VolkerK
@VolkerK Okay, but what do you suggest I change in my answer? `loadHTML` validating automatically doesn't change the fact that validation against a DTD has to happen at some point. That's explicitly stated in the manual for `getElementById()`.
Gordon
@Gordon: The "Furthermore, DOM will not recognize your IDs until..." part reads as if Harish needs to _do_ something to get getElementById() to work. In fact, when the document is loaded via loadHTMLFile(), `id` is always an identifier, and the document is invalid. But my main concern was that your first version referred only to XML, and "even that" didn't keep me from up-voting your answer ;-)
VolkerK
A: 

Nope. You'll find that the value of the element returned by getElementById() is undefined, though you will be able to tell that the element is a DIV.

gabe3886
A: 

Maybe an XPath query on the id attribute can help.
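Here is a minimal sketch of that idea using the question's markup inline. The XPath expression `//*[@id="my_id"]` compares the attribute's string value. It does not go through the ID table, so every duplicate matches. With a real file, you would call `$dom->loadHTMLFile($file_path)` instead of `loadHTML()`. The variable names are illustrative.

```php
<?php
// Sketch: collect the text of every element whose id attribute is "my_id".
// XPath matches on the attribute value, so duplicate IDs are all returned.
libxml_use_internal_errors(true); // keep malformed-HTML warnings out of the output

$html = <<<'HTML'
<div id="my_id">
     phone number 123
</div>
<div id="my_id">
     address somewhere
</div>
<div id="my_id">
     date of birth
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath  = new DOMXPath($dom);
$values = [];
foreach ($xpath->query('//*[@id="my_id"]') as $node) {
    $values[] = trim($node->textContent); // drop the surrounding whitespace
}

print_r($values);
```

The results come back in document order. To restrict the match to `<div>` elements, you could use `//div[@id="my_id"]` instead.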

Bernd Ott
A: 

If there's absolutely no way you (or somebody else) can fix the incoming data (which, as has been pointed out, is the only really right thing to do), this might be a case where SimpleHTMLDOM's more lenient parsing turns out to be fruitful.

I haven't tested how it deals with this, but I imagine that

foreach ($html->find('div[id=my_id]') as $element)
 echo "Found ".$element->id."<br>";

works as needed.
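If you would rather not add a third-party library, a similar loop can be written with the built-in DOMDocument alone. The sketch below walks every `<div>` and keeps those whose `id` attribute equals "my_id". This is a swapped-in native approach, not SimpleHTMLDOM's API. The markup, including the extra `id="other"` element, is hypothetical.

```php
<?php
// Sketch: a built-in alternative to the SimpleHTMLDOM loop, using only
// DOMDocument. getAttribute() returns "" when the attribute is absent.
libxml_use_internal_errors(true); // silence warnings about the duplicate IDs

$dom = new DOMDocument();
$dom->loadHTML('<div id="my_id">phone number 123</div>'
    . '<div id="other">ignored</div>'
    . '<div id="my_id">address somewhere</div>');

$matches = [];
foreach ($dom->getElementsByTagName('div') as $div) {
    if ($div->getAttribute('id') === 'my_id') {
        $matches[] = trim($div->textContent);
    }
}

foreach ($matches as $text) {
    echo "Found: ", $text, "\n";
}
```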

Pekka
`$dom->getElementById("my_id")->nodeName` gave me the node name "div", but other functions were not working... I will try the above code.
Harish Kurup