If it is a fully fledged valid xhtml document you shouldn't use loadhtml() but load()/loadxml().
Given the example xhtml document
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>xhtml test</title>
</head>
<body>
<h1>A Table</h1>
<table>
<tr><th>A</th><th>O</th><th>U</th></tr>
<tr><td>Ä</td><td>Ö</td><td>Ü</td></tr>
<tr><td>ä</td><td>ö</td><td>ü</td></tr>
</table>
</body>
</html>
the script
<?php
$raw2 = 'test.html';
$dom = new DOMDocument();
$dom->load($raw2);
$xpath = new DOMXPath($dom);
var_dump($xpath->registerNamespace('h', 'http://www.w3.org/1999/xhtml'));
$query = '//h:td/text()';
$nodes = $xpath->query($query);
foreach($nodes as $node) {
foo($node->wholeText);
}
function foo($s) {
for($i=0; $i<strlen($s); $i++) {
printf('%02X ', ord($s[$i]));
}
echo "\n";
}
prints
bool(true)
C3 84
C3 96
C3 9C
C3 A4
C3 B6
C3 BC
i.e. the output/strings are utf-8 encoded