Hi,
What about something like this :
$input = 'sometext<meta http-equiv="Content-type" content="text/html; charset=utf-8" />someothertext';
$output = preg_replace('#<meta http-equiv="Content-type" content="text/html; charset=(utf-8)" />#',
    '<meta http-equiv="Content-type" content="text/html; charset=ISO-8859-1" />',
    $input);
var_dump($output);
Which simply replaces the first string with the second one, giving you :
string 'sometext<meta http-equiv="Content-type" content="text/html; charset=ISO-8859-1" />someothertext' (length=95)
Of course, this is considering the input meta is always the same, always written the same way, with attributes in the same order and all that.
A regex a bit more forgiving might be :
$output = preg_replace('#<meta\s+http-equiv="Content-type"\s+content="text/html;\s+charset=(utf-8)"\s+/>#',
    '<meta http-equiv="Content-type" content="text/html; charset=ISO-8859-1" />',
    $input);
Of course, that is still not really forgiving ^^
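For what it's worth, a somewhat more tolerant pattern could look like the sketch below. It still assumes http-equiv comes before content, but it ignores case, accepts single or double quotes, and makes the whitespace and the trailing slash optional. The helper name replaceCharset and the sample input are made up for the example.

```php
<?php
// Hypothetical helper (not part of the original answer): swaps the charset of a
// Content-type meta tag while tolerating case, quoting and spacing differences.
// Assumption: the http-equiv attribute still comes before the content attribute.
function replaceCharset(string $html, string $newCharset): string
{
    $pattern = '#(<meta\s+http-equiv=(["\'])Content-type\2\s+content=(["\'])text/html;\s*charset=)[\w-]+(\3\s*/?>)#i';

    // Keep everything around the charset value and replace only the value itself.
    // preg_replace_callback() returns null on error; fall back to the input then.
    return preg_replace_callback(
        $pattern,
        function (array $m) use ($newCharset): string {
            return $m[1] . $newCharset . $m[4];
        },
        $html
    ) ?? $html;
}

$input = "sometext<META  HTTP-EQUIV='content-type' content='text/html;charset=UTF-8'>someothertext";
var_dump(replaceCharset($input, 'ISO-8859-1'));
```

A callback is used instead of a replacement string so that the new charset can never be mistaken for part of a back-reference such as $1.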
But, if you know the meta used as input will always be the same, you don't need a regex ; str_replace will do the job just fine, I suppose...
Something like this :
$output = str_replace('<meta http-equiv="Content-type" content="text/html; charset=utf-8" />',
    '<meta http-equiv="Content-type" content="text/html; charset=ISO-8859-1" />',
    $input);
var_dump($output);
Which gets you the same output :
string 'sometext<meta http-equiv="Content-type" content="text/html; charset=ISO-8859-1" />someothertext' (length=95)
EDIT after the comments and the edit to the OP
*(Yeah, I've seen another answer, based on str_replace, has been accepted... still, maybe this will be useful)*
If you really want to manipulate HTML that is not "fixed", over which you have no control, it might be better to not use regex at all, but some tool made exactly for that.
For instance, the bundled class DOMDocument, and its DOMDocument::loadHTML method, can probably help ; maybe coupled with some XPath queries -- even if it kinda feels like heavy artillery ^^
For more information, you can take a look at this answer I gave to another question a few days ago...
And, in your case, something like this would probably do :
$input = <<<HTML
<html>
<head>
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
<title>Test</title>
</head>
<body>
<p>Hello, world!</p>
</body>
</html>
HTML;
// Parse the HTML into a DOM tree
$dom = new DOMDocument();
$dom->loadHTML($input);
$xpath = new DOMXPath($dom);
// Locate the meta element that declares the content-type
$metas = $xpath->query('//meta[@http-equiv="Content-type"]');
if ($metas->length > 0) {
    $meta = $metas->item(0);
    $attribute = $meta->getAttribute('content');
    // Only touch the attribute if it really describes an HTML document
    if (strpos($attribute, 'text/html') === 0) {
        $meta->setAttribute('content', 'text/html; charset=ISO-8859-1');
    }
}
echo $dom->saveHTML();
The most interesting parts are :
- You are using a DOM parser, with standard DOM methods
- You can do XPath queries to locate exactly the element you need
The resulting HTML will look like this :
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<meta http-equiv="Content-type" content="text/html; charset=ISO-8859-1">
<title>Test</title>
</head>
<body>
<p>Hello, world!</p>
</body>
</html>
Maybe a bit heavier, and requires more code... But, with that, it should always work (well, as long as the HTML used as input is not too messed up, I suppose).
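About the "not too messed up" part : loadHTML emits PHP warnings on invalid markup, and the XPath query above only matches the exact spelling "Content-type". Here is a minimal sketch of one way to deal with both, assuming a made-up, slightly broken input ; libxml_use_internal_errors and the XPath translate() function are standard, the rest is just illustration.

```php
<?php
// Made-up, slightly broken input: unclosed <p>, upper-case tag and attribute names.
$html = '<html><head><META HTTP-EQUIV="Content-Type" content="text/html; charset=utf-8">'
      . '<title>Test</title></head><body><p>Hello, world!</body></html>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Discard the collected errors and restore the previous error handling mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
// translate() lower-cases the attribute value, so that "Content-Type" and
// "Content-type" both match.
$query = '//meta[translate(@http-equiv, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz") = "content-type"]';
$metas = $xpath->query($query);
if ($metas->length > 0) {
    $metas->item(0)->setAttribute('content', 'text/html; charset=ISO-8859-1');
}

echo $dom->saveHTML();
```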
And it will work for anything else in the document ;-)
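To illustrate that, here is a small sketch that reuses the same DOMDocument / DOMXPath pair on other parts of a document, here the title and the links. The markup and the variable names are made up for the example.

```php
<?php
// Made-up markup for the example.
$html = '<html><head><title>Test</title></head>'
      . '<body><p><a href="/a">A</a> and <a href="/b">B</a></p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Change the text of the <title> element, if there is one.
$titles = $xpath->query('//title');
if ($titles->length > 0) {
    $titles->item(0)->nodeValue = 'New title';
}

// Collect the href attribute of every link in the document.
$hrefs = [];
foreach ($xpath->query('//a[@href]') as $link) {
    $hrefs[] = $link->getAttribute('href');
}

echo $dom->saveHTML();
print_r($hrefs);
```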
Maybe it's a bit too much in your case, but, with some luck, you will remember this the day you have to parse some HTML, and won't end up fighting with/against any kind of mutant regex ^^
Oh, and, of course : changing the meta content-type will not change the real encoding of your content : you'll still have to do that yourself, if necessary (for instance, see iconv or utf8_decode ; note that utf8_decode is deprecated as of PHP 8.2, and mb_convert_encoding can be used instead).
You might also need to change the HTTP Content-type header : when that header declares a charset, browsers give it precedence over the meta.
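As a rough sketch of those last two points, assuming the content really is UTF-8 to begin with and only contains characters that exist in ISO-8859-1 (the sample string is made up) :

```php
<?php
// "café" written with explicit UTF-8 bytes, so that the example does not depend
// on the encoding of this source file.
$utf8 = "caf\xC3\xA9";

// Convert the actual bytes from UTF-8 to ISO-8859-1.
// iconv() returns false if a character cannot be represented in the target charset.
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);

// Announce the same charset in the HTTP header (only possible before any output is sent).
if (!headers_sent()) {
    header('Content-Type: text/html; charset=ISO-8859-1');
}

var_dump(strlen($utf8));    // int(5): "é" takes two bytes in UTF-8
var_dump(strlen($latin1));  // int(4): "é" takes one byte in ISO-8859-1
```

With real pages, you would convert the whole output the same way, and check the return value of iconv before sending anything.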