Hello everyone, Masters of Web Development :) I have a PHP script that fetches the last 10 songs played in my Winamp. The script lives in a file (let's call it "lastplayed.php") that is included in my site with the PHP include function, inside a "div". My site uses UTF-8 encoding. The problem is that some song titles are in Windows-1251 encoding, and on my site they display like "������"... Is there any known way to tell this div, with "lastplayed.php" included in it, to use windows-1251 encoding? Or any other suggestions?

P.S.: The file with the fetching script, a.k.a. "lastplayed.php", is saved as UTF-8, but if it is ANSI the result is the same. I also tried putting a meta tag with windows-1251 inside the head tag, but again nothing happened.

P.P.S: Script that fetches the Winamp's data (lastplayed.php):

<?php
/******
* You may use and/or modify this script as long as you:
* 1. Keep my name & webpage mentioned
* 2. Don't use it for commercial purposes
*
* If you want to use this script without complying to the rules above, please contact me first at: [email protected]
* 
* Author: Martijn Korse
* Website: http://devshed.excudo.net
*
* Date:  08-05-2006
***/

/**
 * version 2.0
 */
class Radio
{
    var $fields = array();
    var $fieldsDefaults = array("Server Status", "Stream Status", "Listener Peak", "Average Listen Time", "Stream Title", "Content Type", "Stream Genre", "Stream URL", "Current Song");
    var $very_first_str;
    var $domain, $port, $path;
    var $errno, $errstr;
    var $trackLists = array();
    var $isShoutcast;
    var $nonShoutcastData = array(
        "Server Status"  => "n/a",
        "Stream Status"  => "n/a",
        "Listener Peak"  => "n/a",
        "Average Listen Time" => "n/a",
        "Stream Title"  => "n/a",
        "Content Type"  => "n/a",
        "Stream Genre"  => "n/a",
        "Stream URL"  => "n/a",
        "Stream AIM"  => "n/a",
        "Stream IRC"  => "n/a",
        "Current Song"  => "n/a"
        );
    var $altServer = False;

    function Radio($url)
    {
     $parsed_url = parse_url($url);
     $this->domain = isset($parsed_url['host']) ? $parsed_url['host'] : "";
     $this->port = !isset($parsed_url['port']) || empty($parsed_url['port']) ? "80" : $parsed_url['port'];
     $this->path = empty($parsed_url['path']) ? "/" : $parsed_url['path'];

     if (empty($this->domain))
     {
      $this->domain = $this->path;
      $this->path = "";
     }

     $this->setOffset("Current Stream Information");
     $this->setFields();  // setting default fields

     $this->setTableStart("<table border=0 cellpadding=2 cellspacing=2>");
     $this->setTableEnd("</table>");
    }

    function setFields($array=False)
    {
     if (!$array)
      $this->fields = $this->fieldsDefaults;
     else
      $this->fields = $array;
    }
    function setOffset($string)
    {
     $this->very_first_str = $string;
    }
    function setTableStart($string)
    {
     $this->tableStart = $string;
    }
    function setTableEnd($string)
    {
     $this->tableEnd = $string;
    }

    function getHTML($page=False)
    {
     if (!$page)
      $page = $this->path;
     $contents = "";
     $domain = (substr($this->domain, 0, 7) == "http://") ? substr($this->domain, 7) : $this->domain;


     if (@$fp = fsockopen($domain, $this->port, $this->errno, $this->errstr, 2))
     {
      fputs($fp, "GET ".$page." HTTP/1.1\r\n".
       "User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)\r\n".
       "Accept: */*\r\n".
       "Host: ".$domain."\r\n\r\n");

      $c = 0;
      while (!feof($fp) && $c <= 20)
      {
       $contents .= fgets($fp, 4096);
       $c++;
      }

      fclose ($fp);

      preg_match("/(Content-Type:)(.*)/i", $contents, $matches);
      if (count($matches) > 0)
      {
       $contentType = trim($matches[2]);
       if ($contentType == "text/html")
       {
        $this->isShoutcast = True;
        return $contents;
       }
       else
       {
        $this->isShoutcast = False;

        $htmlContent = substr($contents, 0, strpos($contents, "\r\n\r\n"));

        $dataStr = str_replace("\r", "\n", str_replace("\r\n", "\n", $contents));
        $lines = explode("\n", $dataStr);
        foreach ($lines AS $line)
        {
         if ($dp = strpos($line, ":"))
         {
          $key = substr($line, 0, $dp);
          $value = trim(substr($line, ($dp+1)));
          if (preg_match("/genre/i", $key))
           $this->nonShoutcastData['Stream Genre'] = $value;
          if (preg_match("/name/i", $key))
           $this->nonShoutcastData['Stream Title'] = $value;
          if (preg_match("/url/i", $key))
           $this->nonShoutcastData['Stream URL'] = $value;
          if (preg_match("/content-type/i", $key))
           $this->nonShoutcastData['Content Type'] = $value;
          if (preg_match("/icy-br/i", $key))
           $this->nonShoutcastData['Stream Status'] = "Stream is up at ".$value."kbps";
          if (preg_match("/icy-notice2/i", $key))
          {
           $this->nonShoutcastData['Server Status'] = "This is <span style=\"color: red;\">not</span> a Shoutcast server!";
           if (preg_match("/ultravox/i", $value))
            $this->nonShoutcastData['Server Status'] .= " But an <a href=\"http://ultravox.aol.com/\" target=\"_blank\">Ultravox</a> Server";
           $this->altServer = $value;
          }
         }
        }
        return nl2br($htmlContent);
       }
      }
      else
       return $contents;
     }
     else
     {
      return False;
     }
    }

    function getServerInfo($display_array=null, $very_first_str=null)
    {
     if (!isset($display_array))
      $display_array = $this->fields;
     if (!isset($very_first_str))
      $very_first_str = $this->very_first_str;

     if ($html = $this->getHTML())
     {
       // parsing the contents
      $data = array();
      foreach ($display_array AS $key => $item)
      {
       if ($this->isShoutcast)
       {
        $very_first_pos = stripos($html, $very_first_str);
        $first_pos = stripos($html, $item, $very_first_pos);
        $line_start = strpos($html, "<td>", $first_pos);
        $line_end = strpos($html, "</td>", $line_start) + 4;
        $difference = $line_end - $line_start;
        $line  = substr($html, $line_start, $difference);
        $data[$key] = strip_tags($line);
       }
       else
       {
        $data[$key] = $this->nonShoutcastData[$item];
       }
      }
      return $data;
     }
     else
     {
      return $this->errstr." (".$this->errno.")";
     }
    }

    function createHistoryArray($page)
    {
     if (!in_array($page, $this->trackLists))
     {
      $this->trackLists[] = $page;
      if ($html = $this->getHTML($page))
      {
       $fromPos = stripos($html, $this->tableStart);
       $toPos  = stripos($html, $this->tableEnd, $fromPos);
       $tableData = substr($html, $fromPos, ($toPos-$fromPos));
       $lines  = explode("</tr><tr>", $tableData);
       $tracks = array();
       $c = 0;
       foreach ($lines AS $line)
       {
        $info = explode ("</td><td>", $line);
        $time = trim(strip_tags($info[0]));
        if (substr($time, 0, 9) != "Copyright" && !preg_match("/Tag Loomis, Tom Pepper and Justin Frankel/i", $info[1]))
        {
         $this->tracks[$c]['time'] = $time;
         $this->tracks[$c++]['track'] = trim(strip_tags($info[1]));
        }
       }
       if (count($this->tracks) > 0)
       {
        unset($this->tracks[0]);
        if (isset($this->tracks[1]))
         $this->tracks[1]['track'] = str_replace("Current Song", "", $this->tracks[1]['track']);
       }
      }
      else
      {
       $this->tracks[0] = array("time"=>$this->errno, "track"=>$this->errstr);
      }
     }
    }
    function getHistoryArray($page="/played.html")
    {
     if (!in_array($page, $this->trackLists))
      $this->createHistoryArray($page);
     return $this->tracks;
    }
    function getHistoryTable($page="/played.html", $trackColText=False, $class=False)
    {
     $title_utf8 = mb_convert_encoding($trackArr ,"utf-8" ,"auto");

     if (!in_array($page, $this->trackLists))
      $this->createHistoryArray($page);
     if ($trackColText)
      $output .= "
      <div class='lastplayed_top'></div>
      <div".($class ? " class=\"".$class."\"" : "").">";
     foreach ($this->tracks AS $title_utf8)
      $output .= "<div style='padding:2px 0;'>".$title_utf8['track']."</div>";
     $output .= "</div><div class='lastplayed_bottom'></div>
     <div class='lastplayed_title'>".$trackColText."</div>
     \n";
     return $output;
    }
}

 // this is needed for those with a php version < 5
 // the function is copied from the user comments @ php.net (http://nl3.php.net/stripos)
if (!function_exists("stripos"))
{
    function stripos($haystack, $needle, $offset=0)
    {
     return strpos(strtoupper($haystack), strtoupper($needle), $offset);
    }
}
?>

And the calling script outside the lastplayed.php:

include "lastplayed.php";
$radio = new Radio($ip.":".$port);
echo $radio->getHistoryTable("/played.html", "<b>Last played:</b>", "lastplayed_content");

A: 

If all of your source data is in windows-1251, you can use something like:

$title_utf8 = mb_convert_encoding($title, "utf-8", "Windows-1251");

and put that converted data in your HTML stream.

Since I'm only looking at docs, I'm not 100% sure that the source encoding alias is correct; you may want to try CP1251 if Windows-1251 doesn't work.

If your source data isn't reliably in 1251, you'll have to come up with a heuristic to guess, and use the same conversion method. mb_detect_encoding may help you.
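
One simple heuristic can be sketched as follows. Note that it swaps in mb_check_encoding rather than mb_detect_encoding: a string that is already valid UTF-8 is left alone, and everything else is assumed to be Windows-1251. The helper name to_utf8_or_1251 is hypothetical, and the "everything else is 1251" rule is an assumption that only holds if no third encoding shows up in your titles.

```php
<?php
// Hypothetical helper (not part of the original script): returns $text as UTF-8.
// Assumption: any string that is not already valid UTF-8 is Windows-1251.
function to_utf8_or_1251($text)
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8 (pure ASCII also ends up here)
    }
    return mb_convert_encoding($text, 'UTF-8', 'Windows-1251');
}

$raw = "\xCF\xF0\xE8\xE2\xE5\xF2"; // the bytes of "Привет" in Windows-1251
echo to_utf8_or_1251($raw), "\n";
```

The check works because Cyrillic text in Windows-1251 almost never forms valid UTF-8 byte sequences, so the two cases rarely collide on real titles.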

You cannot change the encoding of just part of an HTML document, but you can certainly convert everything to UTF-8 easily enough.

The newer ID3 implementations have an encoding marker in their text frames:

$00 – ISO-8859-1 (ASCII).
$01 – UCS-2 in ID3v2.2 and ID3v2.3, UTF-16 encoded Unicode with BOM. 
$02 – UTF-16BE encoded Unicode without BOM in ID3v2.4 only.
$03 – UTF-8 encoded Unicode in ID3v2.4 only.
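
If you ever read the ID3 frames yourself, the markers above could be handled roughly like this. This is only a sketch: id3_text_to_utf8 is a hypothetical helper, $frame is assumed to be the raw body of a text frame (one marker byte followed by the text bytes), and treating $00 text as Windows-1251 instead of real ISO-8859-1 is an assumption based on taggers that write the system code page there.

```php
<?php
// Hypothetical helper: decode the body of an ID3v2 text frame to UTF-8.
// $frame  = one encoding-marker byte followed by the text bytes.
// $legacy = encoding assumed for $00 frames (an assumption, see above).
function id3_text_to_utf8($frame, $legacy = 'Windows-1251')
{
    if (strlen($frame) < 2) {
        return ''; // marker only, or nothing at all
    }
    $marker = ord($frame[0]);
    $text = substr($frame, 1);

    if ($marker === 0) {
        $from = $legacy;
    } elseif ($marker === 1) {
        // UTF-16 with BOM: the BOM decides the byte order, then gets stripped
        $bom = substr($text, 0, 2);
        if ($bom === "\xFF\xFE") {
            $from = 'UTF-16LE';
        } elseif ($bom === "\xFE\xFF") {
            $from = 'UTF-16BE';
        } else {
            return false; // $01 without a BOM: refuse to guess
        }
        $text = (string) substr($text, 2);
    } elseif ($marker === 2) {
        $from = 'UTF-16BE';
    } elseif ($marker === 3) {
        $from = 'UTF-8';
    } else {
        return false; // unknown marker
    }

    // Drop the optional trailing null terminator(s) after conversion
    return rtrim(mb_convert_encoding($text, 'UTF-8', $from), "\0");
}

echo id3_text_to_utf8("\x01\xFF\xFEH\x00i\x00"), "\n"; // prints "Hi"
```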

Is it possible that your content is in UTF-16?

Based on the code you've posted, it's not clear how $trackArr is defined, as it's not referenced elsewhere. It looks like you have several problems.

$title_utf8 = mb_convert_encoding($trackArr ,"utf-8" ,"auto");

"auto" expands to a list of encodings that do not include Windows-1251, so I'm not sure why you've used it. You really should use "Windows-1251". I have tried using "Windows-1251,utf-16" on a mac with PHP installed, but autodetect fails to find a suitable encoding against a relatively short string, so it looks like you're going to have to be the one to guess.

But that code doesn't look like it has any reason to exist anyway, as you overwrite the values with your iteration:

    foreach ($this->tracks AS $title_utf8)
            $output .= "<div style='padding:2px 0;'>".$title_utf8['track']."</div>";

In each iteration, the variable $title_utf8 is assigned to the current track. What you probably want is something more like:

    foreach ($this->tracks AS $current_track)
            $output .= "<div style='padding:2px 0;'>".mb_convert_encoding($current_track['track'], "utf-8", "Windows-1251")."</div>";

mb_convert_encoding takes a string as the first argument, not an array or object, so you need to apply this encoding on each string that is not utf-8.
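
A self-contained sketch of that per-string conversion is shown here. The $tracks array is made-up sample data shaped like $this->tracks in the question's Radio class, and the htmlspecialchars call is an extra precaution I've added, not something the original script does.

```php
<?php
// Hypothetical sample data, shaped like $this->tracks in the Radio class.
$tracks = array(
    1 => array('time' => '12:01:00', 'track' => "\xCF\xF0\xE8\xE2\xE5\xF2"), // Windows-1251 bytes
    2 => array('time' => '11:57:30', 'track' => 'Plain ASCII title'),
);

$output = ''; // initialise before appending with .=
foreach ($tracks as $current_track) {
    // Convert the string inside the array element, not the array itself
    $title = mb_convert_encoding($current_track['track'], 'UTF-8', 'Windows-1251');
    $output .= "<div style='padding:2px 0;'>"
        . htmlspecialchars($title, ENT_QUOTES, 'UTF-8')
        . "</div>\n";
}
echo $output;
```

ASCII titles pass through the conversion unchanged, so it is safe to run every title through it as long as none of them is already UTF-8.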

JasonTrue
Thank you JasonTrue, this makes sense, but obviously the encoding I want isn't Windows-1251... now I have to figure out what the encoding of that god damned fetched data is!... Damned... But I tried your code, and it doesn't seem to work, or I'm doing something wrong...
Spoonk
Perhaps you need to dig deeper into the content. I've added a note about the encoding markers that ID3 supports. I can't guarantee my code works since I don't use PHP much, but you can certainly look up the documentation for the methods I used.
JasonTrue
Most probably you are right. I looked at the configuration of my Winamp's MP3 plugin. Quote: "ID3 Tag Reading - System Language", and "ID3 Tag Writing - Unicode (UTF-16)"... The other options to choose from are "System Language" and "latin-1"... And after I change these options, nothing happens... maybe I need to reload the whole playlist and then check the site for any change?!... Hell.. I didn't know this was going to be such a problem...
Spoonk
Actually, when I open my "lastplayed.php" directly and manually tell the browser that the encoding is Windows-1251, all of the titles are displayed properly. When this file is included inside my UTF-8 based site, the problem appears. But the important thing here is that windows-1251 obviously is the correct encoding for my song titles; maybe Winamp sends the data in this encoding? Damned... I'm totally lost now... :)) I don't know what to think :)
Spoonk
Most likely, your CD ripping software or whatever originally encoded your files used CP_ACP (the system ANSI code page) unless it was reasonably Unicode-savvy. Based on the configuration you describe, it sounds like Winamp will assume CP_ACP unless the encoding is marked explicitly; when it creates a new ID3 record for a new item, it will explicitly mark it as UTF-16. Your code should (probably) handle the implicit case the same way and be aware of the explicit encoding markers in the ID3 data. "Windows-1251, UTF-16" as the "from-encoding" parameter to mb_convert_encoding may suffice?
JasonTrue
Can you post the relevant code in your question that shows how lastplayed.php is retrieving the ID3 tag, and how it is writing the data to the response stream?
JasonTrue
Ok, I am editing my original post with it.
Spoonk
It looks like you're not using the mb_convert_encoding with appropriate arguments. I've edited my answer with details.
JasonTrue
A: 

Just to let you know that the latest version supports character encoding/decoding :-)

Martijn Korse