ansaurus

Question

Answer 1

A:

Add a question mark to your pattern to make it non-greedy (there's also no need for the 's').

preg_match( "/charset=\"(.+?)\"/", $RemoteContent, $RemoteEncoding );
echo $RemoteEncoding[ 1 ];

Note that this won't handle charset = "..." or charset='...' and many other combinations.
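As a rough sketch of how the variants mentioned above could be tolerated, the pattern can allow optional whitespace around the equals sign and an optional single or double quote. The function name find_charset is hypothetical and not part of the original answer; the character class is an assumption about what a charset name may contain.

```php
<?php
// Hypothetical helper (not from the original answer): finds the first
// charset declaration in a chunk of HTML, tolerating optional whitespace
// around "=" and an optional single or double quote before the value.
function find_charset(string $html): ?string
{
    // [\w.:-]+ is an assumed approximation of the characters in a charset name.
    if (preg_match('/charset\s*=\s*["\']?([\w.:-]+)/i', $html, $m)) {
        return $m[1];
    }
    return null; // no charset declaration found
}
```

This still only looks at raw text, so it can match the word "charset" inside a script or comment; treat it as a best-effort fallback.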

stereofrog 2010-04-15 10:59:39

That's what I needed. The only issue with your regex is that you allowed for a ["] after the [=] where there is none. After taking that out along with its backslash, it worked as required with a few examples. I'll keep your note in mind as I look at the other suggestions as well. Thank you.

Yallaa 2010-04-15 12:44:48

Answer 2

A:

Take a look at Simple HTML Dom Parser. With it, you can easily find the charset from the head without resorting to cumbersome regexes. But as David already commented, you should also examine the headers for the same information and prioritize it if found.

Tested example:

require_once 'simple_html_dom.php';

$source = file_get_contents('http://www.google.com');
$dom = str_get_html($source);
$meta = $dom->find('meta[http-equiv=content-type]', 0);
$src_charset = substr($meta->content, stripos($meta->content, 'charset=') + 8);

foreach ($http_response_header as $header) {
    @list($name, $value) = explode(':', $header, 2);
    if (strtolower($name) == 'content-type') {
        $hdr_charset = substr($value, stripos($value, 'charset=') + 8);
        break;
    }
}

var_dump(
    $hdr_charset,
    $src_charset
);
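The example above assumes that both the meta tag and the header actually contain "charset="; when they don't, stripos() returns false and substr() yields an unrelated slice of the string. A minimal sketch of a more defensive variant, which also applies the header-first priority described earlier, might look like this. Both function names are hypothetical, and lowercasing the result is an assumption made for easier comparison.

```php
<?php
// Hypothetical helper: extracts the charset parameter from a Content-Type
// value such as "text/html; charset=UTF-8". Returns null when absent.
function charset_from_content_type(?string $contentType): ?string
{
    if ($contentType === null) {
        return null;
    }
    // Accept optional whitespace around "=" and an optional quote.
    if (preg_match('/charset\s*=\s*["\']?([^"\';\s]+)/i', $contentType, $m)) {
        return strtolower($m[1]); // assumption: normalize to lowercase
    }
    return null;
}

// Hypothetical helper: prefers the HTTP header value and falls back to the
// content attribute of the meta tag only when the header names no charset.
function pick_charset(?string $headerValue, ?string $metaContent): ?string
{
    return charset_from_content_type($headerValue)
        ?? charset_from_content_type($metaContent);
}
```

With the variables from the example, $value from the header loop and $meta->content (when $meta is not null) would be the two arguments to pick_charset().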

nikc 2010-04-15 11:00:42

I also downloaded Simple HTML Dom Parser and am looking into that as well. Thank you for the suggestion.

Yallaa 2010-04-15 12:38:51

Detect remote charset in php