Your problem can be complicated. If there is a <div> that contains 2 words of its own, plus a <p> inside it with 200 words, do you count the <div> as having 202 words, or do you count the <p> as having 200 words and therefore as the biggest?
If the <p> has borders on all four sides, it makes sense to say the answer is the <p> with 200 words. If it has no border, it makes more sense to say it is the <div> with 202 words.
You can try writing a function that traverses down from a node and, whenever it reaches a block element with borders on all four sides, leaves that element's words out of the count.
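Here is a minimal sketch of that traversal in JavaScript. countWords and hasFourBorders are hypothetical names. The "boxed" test is passed in as a predicate so the counting logic can run outside a browser. hasFourBorders is one assumed definition of "block element with 4 borders", based on getComputedStyle.

```javascript
// Count whitespace-separated words under `node`, skipping the whole subtree
// of any element for which isBoxed(el) returns true.
function countWords(node, isBoxed) {
  let total = 0;
  for (const child of node.childNodes) {
    if (child.nodeType === 3) {
      // Text node: split on whitespace and drop empty pieces.
      total += child.nodeValue.split(/\s+/).filter(Boolean).length;
    } else if (child.nodeType === 1 && !isBoxed(child)) {
      // Element node that is not boxed: recurse into it.
      total += countWords(child, isBoxed);
    }
  }
  return total;
}

// Assumed predicate: a display:block element whose four computed border
// widths are all non-zero (the computed width is 0 when the style is none).
function hasFourBorders(el) {
  const s = getComputedStyle(el);
  return s.display === "block" &&
    ["Top", "Right", "Bottom", "Left"].every(
      (side) => parseFloat(s["border" + side + "Width"]) > 0);
}
```

In a page you might call countWords(someDiv, hasFourBorders) for each candidate element and keep the one with the highest count.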
Things get more complicated if there are floated <div>s that are set to display:inline to work around an IE 6 bug (the doubled float margin), or if there are borders whose color is the same as the background color of the containing <div>.
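Both cases can be partly handled by reading computed styles rather than the stylesheet. Under CSS 2.1, a float's display computes to block even when it is declared display:inline, so the IE 6 hack does not change what getComputedStyle reports. For borders that blend in, one assumed heuristic is to compare each border color with the first non-transparent background behind the element. The function names and the white-canvas fallback below are assumptions, not a standard API.

```javascript
// Computed background values that mean "nothing painted here".
const TRANSPARENT = ["transparent", "rgba(0, 0, 0, 0)"];

// Background color painted behind el: the first ancestor with a
// non-transparent background. Assumes a white canvas if none is found.
function backgroundBehind(el, getStyle) {
  for (let a = el.parentElement; a; a = a.parentElement) {
    const bg = getStyle(a).backgroundColor;
    if (!TRANSPARENT.includes(bg)) return bg;
  }
  return "rgb(255, 255, 255)";
}

// True if el is display:block and all four borders have a width, a visible
// style, and a color different from the background behind el.
// getStyle defaults to the browser's getComputedStyle; pass a stub elsewhere.
function hasVisibleFourBorders(el, getStyle = (e) => getComputedStyle(e)) {
  const s = getStyle(el);
  if (s.display !== "block") return false;
  const bg = backgroundBehind(el, getStyle);
  return ["Top", "Right", "Bottom", "Left"].every((side) => {
    const style = s["border" + side + "Style"];
    return parseFloat(s["border" + side + "Width"]) > 0 &&
      style !== "none" && style !== "hidden" &&
      s["border" + side + "Color"] !== bg;
  });
}
```

Comparing color strings works because computed colors are serialized in a consistent rgb()/rgba() form, but a border that is merely close to the background color would still count as visible.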
If you don't care about inner elements having borders, one approach is simply to look at the immediate children of <body> and find out how many characters each one contains (the sum of the text under all its descendants, probably using innerText, or innerHTML with all the tags stripped).
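Here is a sketch of that approach, assuming a browser where innerText is available. innerText already returns the text of all descendants without tags, so no stripping is needed. biggestChildByText is a hypothetical helper name. Ignoring whitespace when counting characters is my own choice, so that indentation does not count.

```javascript
// Among the direct element children of `parent`, return the one with the
// most non-whitespace text, or null if there are no children.
function biggestChildByText(parent) {
  let best = null;
  let bestLen = -1;
  for (const child of parent.children) {
    const len = (child.innerText || "").replace(/\s+/g, "").length;
    if (len > bestLen) {
      best = child;
      bestLen = len;
    }
  }
  return best;
}
```

In a page this would be called as biggestChildByText(document.body).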
If you are looking for the content section, you might also look for the element with the biggest area (width × height). That fails, though, when there is a long, narrow sidebar or ad column on the left or right and the content area is wide but really short.
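Here is a sketch of the area heuristic using getBoundingClientRect(). biggestByArea is a hypothetical name.

```javascript
// Return the element with the largest rendered area (width × height) among
// `elements`, or null for an empty list.
function biggestByArea(elements) {
  let best = null;
  let bestArea = -1;
  for (const el of elements) {
    const r = el.getBoundingClientRect();
    const area = r.width * r.height;
    if (area > bestArea) {
      best = el;
      bestArea = area;
    }
  }
  return best;
}
```

For example, a 200 × 3000 sidebar (600,000 px²) beats an 800 × 300 content area (240,000 px²), which is exactly the failure case described above. If you pass every element on the page, for example document.querySelectorAll("body *"), an outer wrapper will also tend to win. In practice you would probably restrict the candidates.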