Given a body of HTML, is there a function someone has already written that will automatically extract, say, the top 10 keywords from it, excluding any HTML tags (i.e. counting only the plain text)?
It should ignore common words like "and", "is", "but", etc., and list the most frequent uncommon words.
Example input:
Mary had a <strong>snow</strong> lamb. <img src=lamb.jpg /> The <i>lamb</i> was snow white, it lay in the snow all white.
Output:
Snow (3)
White (2)
Lamb (2)
A jQuery solution is fine!
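To make the request concrete, here is a minimal sketch of the kind of function described above, in plain JavaScript. The name `topKeywords`, the `STOP_WORDS` list, and the regex-based tag stripping are all assumptions made for illustration, not an existing library API. The stop-word list is deliberately tiny and would need extending for real text.

```javascript
// Hypothetical helper for illustration only: the function name, the
// stop-word list, and the regex tag stripping are assumptions.
const STOP_WORDS = new Set([
  "a", "an", "and", "the", "is", "was", "but", "it", "in", "of", "to",
  "had", "all", "or", "on", "at", "for", "with", "as", "by", "be", "are",
]);

function topKeywords(html, limit = 10) {
  // Drop <script>/<style> blocks with their contents, then every remaining tag.
  const text = html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]*>/g, " ");

  // Lowercase the text and pull out runs of letters (and apostrophes).
  const words = text.toLowerCase().match(/[a-z']+/g) || [];

  // Count every word that is not a stop word.
  const counts = new Map();
  for (const word of words) {
    if (word.length < 2 || STOP_WORDS.has(word)) continue;
    counts.set(word, (counts.get(word) || 0) + 1);
  }

  // Most frequent first; ties are broken alphabetically.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit)
    .map(([word, count]) => ({ word, count }));
}

const sample =
  "Mary had a <strong>snow</strong> lamb. <img src=lamb.jpg /> " +
  "The <i>lamb</i> was snow white, it lay in the snow all white.";

// Prints "snow (3)", "lamb (2)", "white (2)".
for (const { word, count } of topKeywords(sample, 3)) {
  console.log(`${word} (${count})`);
}
```

Stripping tags with a regex is only a rough approximation and does not decode HTML entities. In a browser, the plain text could instead be taken from the DOM, for example with jQuery's `.text()` on an element, and passed through the same counting step.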