I have an HTML document that contains hundreds of special characters (such as em dashes, smart apostrophes, e with a grave accent, etc.) that I would like to convert to their HTML entity equivalents.

For example, my document contains an "em dash" (—), which I would like to convert to:

 &mdash;

Of course, my HTML document also contains HTML tags. I do not want to convert the characters that make up the tags (such as "<" or ">") to their HTML equivalents.

Is there any tool (PHP script, web application, desktop application, etc.) where I can upload my HTML document and get the same document back, modified to use HTML equivalents where needed?

I have many documents, with many special characters. I would like to avoid having to use "find and replace" (for each special character) as a solution, since that would take too long.

A: 

If you still want to do this:

Create a list of special characters with their respective codes.

For example:

$htmlNumbers = array(
    array( "char" => "—", "code" => "&#8212;" ),
    array( "char" => "@", "code" => "&#64;" ),
    // ... add further characters here
);

Now read the HTML content from the file and replace each character with its code using str_replace:

$html = file_get_contents("index.html");

for( $i=0; $i<count( $htmlNumbers ); $i++ ) {                    
    $html = str_replace( $htmlNumbers[$i]['char'] , $htmlNumbers[$i]['code'], $html );
}

echo $html;

You can then save the output to an HTML file, for example with file_put_contents.
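As a possible alternative to maintaining the list by hand, PHP ships a built-in character-to-entity map that can be used the same way. The following is only a sketch: the function name entities_except_markup is hypothetical, the input is assumed to be UTF-8, and it produces named entities (such as &mdash;) rather than the numeric codes used above.

```php
<?php
// Hypothetical helper, not part of the answer above. Assumes UTF-8 input.
function entities_except_markup(string $html): string
{
    // Built-in map of characters to named entities (HTML 4.01 set).
    // ENT_NOQUOTES leaves both quote characters out of the map.
    $table = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES, 'UTF-8');

    // Drop the markup characters so tags and existing entities are untouched.
    unset($table['<'], $table['>'], $table['&']);

    // strtr() with an array applies all replacements in a single pass.
    return strtr($html, $table);
}

// Usage: U+00E9 is e-acute, U+2014 is the em dash.
echo entities_except_markup("<p>caf\u{e9} \u{2014} ok</p>"), "\n";
```

Because "<", ">" and "&" are removed from the map, this variant never alters tags, but it also leaves a bare "&" in the text as it is.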

NAVEED
Read OP's question: "I would like to avoid having to use "find and replace" (for each special character) as a solution... would take too long."
MartyIX
A: 

You could use something like:

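<?php
// Read the file as text (include would also execute any PHP inside it).
$content = file_get_contents('test.html');
// Protect the tag delimiters with placeholders before encoding.
$new = str_replace('<', '$start$', $content);
$new = str_replace('>', '$end$', $new);
// ENT_NOQUOTES keeps attribute quotes intact; false avoids re-encoding existing entities.
$new = htmlentities($new, ENT_NOQUOTES, 'UTF-8', false);
// Restore the tag delimiters.
$new = str_replace('$start$', '<', $new);
$new = str_replace('$end$', '>', $new);
echo $new;
?>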

Then just change test.html to whichever file you want to convert.

Edit: this is the same thing, automated for every HTML file in the same directory:

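<?php
foreach (glob('*.html') as $file) {
    // Read the file as text (include would also execute any PHP inside it).
    $content = file_get_contents($file);
    // Protect the tag delimiters with placeholders before encoding.
    $new = str_replace('<', '$start$', $content);
    $new = str_replace('>', '$end$', $new);
    // ENT_NOQUOTES keeps attribute quotes intact; false avoids re-encoding existing entities.
    $new = htmlentities($new, ENT_NOQUOTES, 'UTF-8', false);
    // Restore the tag delimiters.
    $new = str_replace('$start$', '<', $new);
    $new = str_replace('$end$', '>', $new);
    // Overwrite the original file with the converted content.
    file_put_contents($file, $new);
}
echo 'done';
?>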
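Note that the placeholder approach still encodes special characters that appear inside tags, for example in attribute values. If only the text between tags should be touched, one option is to split the document on tags first. This is a rough sketch under stated assumptions: the function name encode_text_nodes is hypothetical, the input is assumed to be UTF-8, and the pattern <[^>]*> is a simplification that does not handle comments, script blocks, or attribute values containing ">".

```php
<?php
// Hypothetical helper, not part of the answer above. Assumes UTF-8 input.
function encode_text_nodes(string $html): string
{
    // Split on tags and keep the tags themselves in the result.
    // With one capture group, even indexes hold text and odd indexes hold tags.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Encode text only; the final false leaves existing entities as they are.
            $parts[$i] = htmlentities($part, ENT_NOQUOTES, 'UTF-8', false);
        }
    }

    return implode('', $parts);
}
```

For real-world documents with scripts or comments, a proper HTML parser would likely be a safer choice than a regular expression.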
ldb358
A: 
// ENT_NOQUOTES leaves the quotes around attribute values unescaped.
$new = str_replace(array('&lt;', '&gt;'), array('<', '>'), htmlentities($old, ENT_NOQUOTES, 'UTF-8'));
Tom