Hello,

When I try to open a .log file created by a game in PHP, I get a bunch of output like this:

ÿþ*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*� �
K�2� �E�n�g�i�n�e� �s�t�a�r�t� �u�p�.�.�.� �
[�2�0�0�9�/�2�2�/�0�9�]� �
[�1�6�:�0�7�:�3�3�]� �
[�0�.�1�.�4�6�.�0�]� �
[�0�]� �

I have no idea why. My code is:

$file = trim($_GET['id']);
$handle = @fopen($file, "a+");

if ($handle) {
    print "<table>";
    while (!feof($handle)) {
        $buffer = stream_get_line($handle, 10000, "\n");
        echo "<tr><td width=10>" . __LINE__ . "</td><td>" . $buffer . "</td></tr>";
    }
    print "</table>";

    fclose($handle);
}

I'm using stream_get_line because it is apparently better for large files?

+1  A: 

You might be running into a UTF-8 Byte Order Mark: http://en.wikipedia.org/wiki/Byte-order%5Fmark Try reading it like so:

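<?php
// Reads past the UTF-8 BOM if it is there.
function fopen_utf8($filename, $mode) {
    $file = @fopen($filename, $mode);
    if ($file === false)
        return false; // the file could not be opened
    $bom = fread($file, 3);
    if ($bom != b"\xEF\xBB\xBF")
        rewind($file); // no BOM: go back to the start (rewind() takes only the handle)
    else
        echo "bom found!\n";
    return $file;
}
?>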

From: http://us3.php.net/manual/en/function.fopen.php#78308

lod3n
It's not UTF-8, it's UTF-16LE. The UTF-16LE BOM is right there in the beginning of that data.
Michael Madsen
+7  A: 

PHP doesn't really know much about encodings. In particular, it knows nothing about the encoding of your file.

The data looks like UTF-16LE, so you'll need to convert it into something you can handle. Alternatively, since you're just printing, you can convert the entire script to output its HTML as UTF-16LE as well.

I would probably prefer converting to UTF-8 and using that as the page encoding, so you're sure no characters are lost. Take a look at iconv, assuming it's available (a PHP extension is required on Windows, I believe).
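
A minimal sketch of that conversion, assuming the iconv extension is loaded. The helper name utf16le_to_utf8 and the sample bytes are made up for illustration; for a real log you would pass in the file contents read as raw bytes.

```php
<?php
// Hypothetical helper (not from the original post): converts a UTF-16LE
// byte string to UTF-8 using iconv().
function utf16le_to_utf8($bytes)
{
    $utf8 = iconv('UTF-16LE', 'UTF-8', $bytes);
    if ($utf8 === false) {
        // iconv() returns false when the input cannot be converted.
        throw new RuntimeException('Input is not valid UTF-16LE');
    }
    return $utf8;
}

// "K2" in UTF-16LE: every ASCII character is followed by a NUL byte,
// which is why the raw log shows a stray symbol after each letter.
$sample = "K\x00" . "2\x00";
echo utf16le_to_utf8($sample), "\n"; // prints "K2"
```

If the page is then served with a UTF-8 charset, the converted text should display without the extra symbols.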

Note that regardless of what you do, you should strip the first two bytes of the file, assuming the encoding is always the same. In the data you're showing, these bytes (displayed as ÿþ) are the byte order mark, which tells us the file's encoding (UTF-16LE, as I mentioned earlier).
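
Stripping the mark could look roughly like this. It is only a sketch: the helper name strip_utf16le_bom is invented here, and it assumes the data is still the raw, unconverted bytes of the file.

```php
<?php
// Hypothetical helper (not from the original post): removes a leading
// UTF-16LE byte order mark (the two bytes FF FE) if one is present.
function strip_utf16le_bom($bytes)
{
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        // Cast guards against older PHP versions returning false
        // when nothing follows the mark.
        return (string) substr($bytes, 2);
    }
    return $bytes; // no mark found: leave the data untouched
}

// FF FE followed by "K" encoded as UTF-16LE.
$raw = "\xFF\xFE" . "K\x00";
var_dump(strip_utf16le_bom($raw) === "K\x00");
```

Checking for the mark before removing it keeps the helper safe to call on files that happen to have none.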

However, seeing as how it appears to be plain text, and all you're doing is printing the data, consider just opening it in a plain old text editor (that supports Unicode). Not knowing your operating system, I'm hesitant to suggest a specific one, but if you're on Windows and the file is relatively small, Notepad can do it.

As a side note, __LINE__ will not give you the line number of the file you're reading; it gives the line number of the currently executing line of your script, so every row will show the same number.
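
To number the lines of the log itself, a counter of your own is needed. A rough sketch follows, assuming the text has already been converted to UTF-8; the function name render_numbered_rows is hypothetical, and the htmlspecialchars() call is an addition that was not part of the original code.

```php
<?php
// Hypothetical helper (not from the original post): renders each line of
// $text as a table row, numbered with a counter instead of __LINE__.
function render_numbered_rows($text)
{
    // Split on any common line ending after dropping trailing newlines.
    $lines = preg_split('/\r\n|\r|\n/', rtrim($text, "\r\n"));

    $html = "<table>\n";
    $lineNumber = 0; // counts lines of the data, not lines of this script
    foreach ($lines as $line) {
        $lineNumber++;
        $html .= '<tr><td width="10">' . $lineNumber . '</td><td>'
            . htmlspecialchars($line, ENT_QUOTES, 'UTF-8')
            . "</td></tr>\n";
    }
    return $html . "</table>\n";
}

echo render_numbered_rows("K2 Engine start up...\n[0]\n");
```

Escaping each line keeps any markup characters in the log from being interpreted as HTML.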

Michael Madsen