I am receiving a string of HTML and want to count the number of words and characters in it. For example, given:

var HTML =  '<p>BODY&nbsp;Article By Archie(Used By Story Tool)</p>';

I want to get the number of words and characters.

When rendered, the HTML above looks like:

BODY Article By Archie(Used By Story Tool)

IMPORTANT

  1. I want to ignore HTML tags when counting words or characters.
  2. Entities such as **&nbsp;** should not be counted either.
  3. For the current example, the words and characters to count are those of:
    BODY Article By Archie(Used By Story Tool)

Please help.
Thank you.

+2  A: 
  1. Use a hidden HTML element that can render text, such as a span or p.

  2. Assign the string to the innerHTML of the hidden element.

  3. Count the characters using the length property of innerText/textContent.

To get the word count, you can:

  1. Split the innerText/textContent on whitespace.

  2. Get the length of the returned array.

rahul
Instead of `split`, I'd recommend using RegExp to take into account non-space breaks - for example, `text.match(/\b\w/g).length`
K Prime
Note that split() can also take a regular expression as an argument: https://developer.mozilla.org/En/Core_JavaScript_1.5_Reference/Objects/String/Split#Parameters
Annie
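The two comments above can be compared in a short sketch. This is only an illustration: the function names countWordsByMatch and countWordsBySplit are made up here, and the input is assumed to be the plain text already read from textContent (where &nbsp; has become the character \u00a0).

```javascript
// Plain text as it would come out of textContent;
// \u00a0 is the character that &nbsp; decodes to.
const text = "BODY\u00a0Article By Archie(Used By Story Tool)";

// Count word starts: \b\w matches the first word character after a boundary.
// match() returns null when nothing matches, so fall back to an empty array.
function countWordsByMatch(s) {
  return (s.match(/\b\w/g) || []).length;
}

// Split on runs of separators (split() accepts a RegExp), then drop the
// empty strings produced when the text starts or ends with a separator.
function countWordsBySplit(s) {
  return s.split(/[\s.(),]+/).filter(Boolean).length;
}

console.log(countWordsByMatch(text)); // 8
console.log(countWordsBySplit(text)); // 8
```

Without the `|| []` guard, match() on a string with no word characters would throw when reading .length of null. Without the filter, the trailing ")" leaves one empty string at the end of the split result.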
Can you also please tell me how to identify numbers (digits) in the same example, i.e. e.innerHTML = '<p>Super By My 4 Story</p>'; ? I want to exclude digits from the word count.
dexter
A: 

Algorithm:

  • Sweep through the entire HTML string
  • Perform regex replaces
    • replace /<[^>]*>/g (anything that sits within <>) with nothing; a greedy <.*> would swallow everything from the first < to the last >
    • replace /&nbsp;/g with a space, so that the words on either side do not run together
  • Tip: both can be done with the replace function in JavaScript. Hunt on w3schools.com.

Now you have the clutter out!

Then perform a simple word/character count.
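
The replace steps above might look roughly like this. It is a minimal sketch, not a full HTML parser: stripHtml is a hypothetical helper name, and the choices to use a non-greedy tag pattern and to turn tags and &nbsp; into spaces are assumptions of this sketch. Other entities (such as &amp;), comments, and script contents are not handled.

```javascript
// Hypothetical helper: reduce an HTML string to plain text with regex replaces.
function stripHtml(html) {
  return html
    .replace(/<[^>]*>/g, " ")  // each tag becomes a space so neighbouring words don't fuse
    .replace(/&nbsp;/g, " ")   // the non-breaking-space entity becomes a plain space
    .replace(/\s+/g, " ")      // collapse runs of whitespace
    .trim();                   // drop the spaces left by the outer tags
}

const html = "<p>BODY&nbsp;Article By Archie(Used By Story Tool)</p>";
const plain = stripHtml(html);

console.log(plain);        // BODY Article By Archie(Used By Story Tool)
console.log(plain.length); // 42
```

The word count can then be taken from `plain` in the same way as from textContent.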

EFreak
+2  A: 

To give an example for adamantium's suggestion:

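var e = document.createElement("span");
e.innerHTML = '<p>BODY&nbsp;Article By Archie(Used By Story Tool)</p>';
// use textContent, falling back to innerText where textContent is unavailable
var text = e.textContent || e.innerText;

var characterCount = text.length;
// split on runs of word-stop characters, then drop the empty strings
// left behind when the text starts or ends with one (here, the final ")")
var wordCount = text.split(/[\s\.\(\),]+/).filter(Boolean).length;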

Update: Added other word-stop characters

Jon Benedicto
Thanks for replying. I have the following query: words can be separated by a blank space, full stop, comma, opening round bracket (, closing round bracket ), dot ., etc. How can that be done? **The above code gives a word count of 9; it should be 8, I guess.** Please reply.
dexter
I've updated to have the other word-stop characters.
Jon Benedicto
Your earlier example worked with K Prime's idea of using .match() with a RegExp. Anyway, thanks for the reply.
dexter
Can you also please tell me how to identify numbers (digits) in the same example, i.e. e.innerHTML = '<p>Super By My 4 Story</p>'; ? I want to exclude digits from the word count.
dexter
To eliminate numeric sections, you'd use something like: /\s+[0-9]+\s+|[\s\.\(\),]+/
Jon Benedicto
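
As a rough illustration of the digit question: the separator pattern from the last comment works when a number sits between two spaces, but a number at the very start or end of the text (or two numbers in a row) would still be counted. An alternative sketch is to split first and then filter out tokens made only of digits. The name countWordsIgnoringNumbers is hypothetical, and this is just one possible approach.

```javascript
// The separator suggested above: a standalone number, together with the
// whitespace around it, is treated as part of the delimiter.
const separators = /\s+[0-9]+\s+|[\s.(),]+/;
console.log("Super By My 4 Story".split(separators).length); // 4

// Alternative sketch: split on word-stop characters, then discard empty
// strings and tokens that consist only of digits.
function countWordsIgnoringNumbers(text) {
  return text
    .split(/[\s.(),]+/)
    .filter(function (token) {
      return token !== "" && !/^[0-9]+$/.test(token);
    })
    .length;
}

console.log(countWordsIgnoringNumbers("Super By My 4 Story")); // 4
console.log(countWordsIgnoringNumbers("4 Story 10 20"));       // 1
```

Tokens that mix letters and digits, such as "Story4", are still counted as words in this sketch.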