Hello,
How can I extract all the text from an HTML file?
I want to extract all of the text: the values of alt attributes, the contents of `<p>` tags, and so on.
However, I don't want to extract the text between style and script tags.
Thanks
Right now I have the following code:
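    <?php
    // Replace every run of non-letter characters with a single space.
    function clean($in) {
        return preg_replace("/[^a-z]+/i", " ", $in);
    }

    // Strip the tags, lowercase the text, and reduce it to space-separated words.
    $string = trim(clean(strtolower(strip_tags($html_content))));
    // PREG_SPLIT_NO_EMPTY avoids counting an empty "word" when there is no text at all.
    $arr = preg_split("/ /", $string, -1, PREG_SPLIT_NO_EMPTY);
    // Count how often each word occurs.
    $count = array_count_values($arr);
    foreach ($count as $value => $freq) {
        echo $value . "---" . $freq . "<br>";
    }
    ?>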
This works well, but it also returns the contents of script and style tags, which I don't want. The other problem is that I'm not sure whether it retrieves attribute values such as alt, since the strip_tags function might remove all HTML tags together with their attributes.
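One direction that might work is parsing the document instead of stripping tags. Below is a minimal sketch, assuming PHP's bundled DOM extension is available. `extract_text()` is a hypothetical helper name, not a built-in. It collects every text node that is not inside a script or style element, plus the value of every alt attribute. Two assumptions to be aware of: `loadHTML()` rejects an empty string on PHP 8, and it guesses the encoding when the page does not declare a charset.

```php
<?php
// Hypothetical helper: visible text plus alt values, without <script>/<style> contents.
function extract_text(string $html): string
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Text nodes with no <script> or <style> ancestor, united with all alt attributes.
    $nodes = $xpath->query(
        '//text()[not(ancestor::script) and not(ancestor::style)] | //@alt'
    );

    $parts = [];
    foreach ($nodes as $node) {
        $text = trim($node->nodeValue);
        if ($text !== '') {
            $parts[] = $text;
        }
    }

    // Join with spaces so words from neighbouring nodes do not run together.
    return implode(' ', $parts);
}
```

The result could then replace the `strip_tags()` call in the word count, for example `trim(clean(strtolower(extract_text($html_content))))`. Other attributes (title, for instance) could be added to the XPath union in the same way as `//@alt`.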
Thanks