To start, I have an array of XML file names. Each file needs to be checked for certain 'unrecognized' hexadecimal characters, which should be replaced with normal UTF-8 text or some kind of placeholder.
I've tried iterating through the files and replacing the hex codes using both str_replace and preg_replace, with no luck. My underlying problem is that I'm receiving errors about 'non-UTF characters' when trying to open these files with SimpleXML.
Here's what I have so far:
class HexadecimalConverter {
    public $filenames = array();

    public function __construct($filenames) {
        $this->filenames = $filenames;
        $this->removeHex();
    }

    public function removeHex() {
        foreach ($this->filenames as $key => $value) {
            $contents = file_get_contents($value);
            $contents = preg_replace("/\x96/", '–', $contents);
            $contents = preg_replace("/\x97/", '—', $contents);
            $contents = preg_replace("/\x85/", "...", $contents);
            $contents = preg_replace("/\xBA/", "", $contents);
            file_put_contents($value, $contents);
        }
    }
}
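One thing worth ruling out with the class above: the literal dash characters in the replacement strings are only UTF-8 if the PHP source file itself is saved as UTF-8. If the editor saves it as Windows-1252, the em dash literal is the single byte 0x97 again, so the replacement changes nothing. The sketch below avoids that by writing the replacements as explicit UTF-8 byte escapes. It assumes the stray bytes are Windows-1252 punctuation (0x96 en dash, 0x97 em dash, 0x85 ellipsis), which fits those byte values but is not confirmed for these files. The function name replaceCp1252Punctuation is made up for illustration.

```php
<?php
// Hypothetical helper, not part of the original class.
// Assumes the input is single-byte Windows-1252 text, not already UTF-8.
function replaceCp1252Punctuation(string $bytes): string
{
    // strtr() with an array performs all replacements in one pass over the
    // input, so bytes it inserts are never scanned again.
    return strtr($bytes, [
        "\x96" => "\xE2\x80\x93", // en dash, U+2013, encoded as UTF-8
        "\x97" => "\xE2\x80\x94", // em dash, U+2014, encoded as UTF-8
        "\x85" => "...",          // same placeholder the original code uses
        "\xBA" => "",             // dropped, as in the original code
    ]);
}
```

Inside removeHex() this could stand in for the four preg_replace calls, for example $contents = replaceCp1252Punctuation($contents);. Running it on a file that is already valid UTF-8 could corrupt multibyte characters, because 0x85, 0x96, 0x97 and 0xBA also occur as UTF-8 continuation bytes.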
Here is the error I'm trying to fix:
Warning: simplexml_load_file() [function.simplexml-load-file]: ./04R_P455_S1157.xml:5: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0x97 0x0D 0x0A 0x69 in C:\xampp\htdocs\hint_updater\libraries\hint_updater_classes.php on line 130
Still no luck. I've tried everything suggested in this thread, but preg_replace doesn't appear to be replacing all instances of the hex codes.
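Since the parser error reports 0x97 as invalid UTF-8, another angle is to transcode each whole file instead of patching individual bytes. The following is a minimal sketch under two assumptions that nothing in this thread confirms: that the files are really Windows-1252, and that the mbstring extension is enabled. The function name toUtf8IfNeeded is hypothetical.

```php
<?php
// Hypothetical helper: returns $bytes unchanged when it is already valid
// UTF-8; otherwise treats it as Windows-1252 (an assumption) and transcodes.
function toUtf8IfNeeded(string $bytes): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes; // already valid, so avoid double-encoding
    }
    return mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');
}

// In-memory example; 0x97 is the byte named in the parser error.
$fixed = toUtf8IfNeeded("hint \x97 text");
var_dump(mb_check_encoding($fixed, 'UTF-8')); // bool(true)
```

Applied per file, this would be file_put_contents($value, toUtf8IfNeeded(file_get_contents($value))); before calling simplexml_load_file(). The validity check makes a second run a no-op, which the byte-by-byte replacement does not guarantee. If a file carries an XML declaration naming some other encoding, that declaration would also need to agree with the converted content.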