ansaurus

Question

Answer 1

A:

I'm not sure what you're asking, but you can use a DOM parser and select all text nodes with the XPath expression `//text()`, then extract them if that's what you're after. After that you can enumerate them or process them however you like.
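A minimal sketch of that idea in Python, offered as an illustration rather than the answerer's own code. It swaps the XPath query for the standard library's `html.parser` so that it runs without extra packages, and, unlike a bare `//text()`, it skips the contents of `<script>` and `<style>`. The names `TextNodeCollector` and `extract_text_nodes` are made up for this example.

```python
from html.parser import HTMLParser


class TextNodeCollector(HTMLParser):
    """Collects every non-empty text node, skipping <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.text_nodes = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.text_nodes.append(text)


def extract_text_nodes(html):
    """Return the stripped text nodes of an HTML string, in document order."""
    parser = TextNodeCollector()
    parser.feed(html)
    parser.close()
    return parser.text_nodes


if __name__ == "__main__":
    page = "<html><body><h1>Hi</h1><p>One <b>two</b> three</p></body></html>"
    for number, text in enumerate(extract_text_nodes(page), start=1):
        print(number, text)
```

With a full DOM library that supports XPath, the same list would come from a single `//text()` query.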

meder 2010-06-13 22:58:22

Thanks for your reply. I looked into the DOM and am still trying to understand it completely. I have taken a different approach, though: I am defining some features for web pages. Since I don't have much knowledge of HTML, I thought that people with more experience could suggest something about the inner structure of HTML that says something about where the content is located. I was planning to train part of the system on these features.

ravi 2010-06-14 21:36:33

Answer 2

A:

Are you asking whether there is a particular HTML tag that is always used to contain any text that is on the page? In other words, whenever you see text on a web page, is it always contained in some particular tag? If that's the question, the answer is no. Text can appear in any tag. (Well, anywhere between <body> and </body>, but the same is true of all other non-text content.)

David Zaslavsky 2010-06-13 23:03:45

Thank you for your answer, yes I am asking this. So <"body"> and <"/body"> tags are like the main tag and everything is defined in between this.

ravi 2010-06-14 21:26:11

It's `<body>` and `</body>`, no quotes, but yes, all the text does appear between those two tags. There is also a `<head>` tag (and closing tag `</head>`), but things between those tags don't directly appear in the web page.
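As a rough illustration of the `<head>` versus `<body>` split, here is a small Python sketch using the standard library's `html.parser`. It assumes the page writes its `<body>` tags explicitly (HTML allows omitting them, and this parser does not infer them). The class name `BodyTextCollector` and its attributes are invented for the example.

```python
from html.parser import HTMLParser


class BodyTextCollector(HTMLParser):
    """Separates text found inside <body> from text found elsewhere."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.body_text = []      # text that is part of the visible page
        self.outside_body = []   # e.g. the <title> inside <head>

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        target = self.body_text if self.in_body else self.outside_body
        target.append(text)


if __name__ == "__main__":
    collector = BodyTextCollector()
    collector.feed(
        "<html><head><title>My page</title></head>"
        "<body><p>Hello</p></body></html>"
    )
    collector.close()
    print("body:", collector.body_text)
    print("outside body:", collector.outside_body)
```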

David Zaslavsky 2010-06-14 22:22:37

Thank you very much for the reply. I am now studying the definitions of some of the tags to see whether I can utilize their properties. Also, thanks for suggesting that I update the question.

ravi 2010-06-14 23:40:09

Answer 3

A:

If you are making the page then you can put all text between <span> tags (note that span tags can contain any other content too). If it isn't your page then good luck - the text can be nearly anywhere.

slugster 2010-06-13 23:07:36

Was this a random down vote? Or is there actually something wrong with this answer, especially compared to other answers when the OP's intent is a little unclear?

slugster 2010-06-14 00:24:46

thank you for your answer.

ravi 2010-06-14 21:37:54

+1 for "the text can be nearly anywhere"

Stevko 2010-06-15 21:24:56

Answer 4

A:

HTML5 offers some content-area-specific tags... ?

Eh, revisiting this and reading your response above, it sounds like XSLT could be a possibility...

Imagine this situation: you have an XML document with custom tags, defined by you, which contain chunks of information, e.g.:

<Item>
  <GenericText>Hello, </GenericText>
  <AdminText>Admin, check the <a href="#">latest logs here</a></AdminText>
  <UserText>User, please continue to look at my web page.</UserText>
</Item>

With XSLT, transformed on your server using a technology like PHP, you can write logic to control which tags are displayed when. You could also insert valid, standard HTML tags inside your custom XML tags; written correctly, your XSLT will just handle them as a chunk of XHTML.
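To make the "which tags are displayed when" idea concrete, here is a small sketch. It is not XSLT: it uses plain Python with the standard library's `xml.etree.ElementTree` as a stand-in for the selection logic a stylesheet would express. The `role` parameter and the function name `render_item` are hypothetical, and this version flattens the embedded `<a>` element to its text instead of passing it through as XHTML.

```python
import xml.etree.ElementTree as ET

# The sample document from above, unchanged.
ITEM_XML = """<Item>
<GenericText>Hello, </GenericText>
<AdminText>Admin, check the <a href="#">latest logs here</a></AdminText>
<UserText>User, please continue to look at my web page.</UserText>
</Item>"""


def render_item(xml_source, role):
    """Return the text a visitor with the given role should see.

    GenericText is always shown; AdminText is shown only when role is
    "admin", and UserText for every other role.
    """
    role_tag = "AdminText" if role == "admin" else "UserText"
    wanted = {"GenericText", role_tag}

    item = ET.fromstring(xml_source)
    parts = []
    for child in item:
        if child.tag in wanted:
            # itertext() also yields the text of nested elements such as <a>.
            parts.append("".join(child.itertext()))
    return "".join(parts)


if __name__ == "__main__":
    print(render_item(ITEM_XML, "admin"))
    print(render_item(ITEM_XML, "user"))
```

A real XSLT stylesheet would express the same choice with templates and conditions, and could copy the inner markup through unchanged.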

This would form the basis of a formal approach to developing pages, templates and such, most likely - so if you don't have access to do this, or lack the prowess, then it will be of little help.

Danjah 2010-06-13 23:11:11

Thank you for your reply. Not necessarily HTML5, as I will be working with any kind of web page. But your suggestion will be very helpful when an HTML5 web page is encountered. Thanks.

ravi 2010-06-14 21:42:45

Answer 5

+1 A:

It's not a question of a '<text>' tag or anything--the question is what kind of text is it? If it's a header, it's <h1> or <h2> or whatever--if it's a paragraph it's <p>, if it's a list it's either <ul> or <ol> with <li>s in between.
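One way to act on this is to label each piece of text by the nearest enclosing tag that says what kind of text it is. The sketch below does that with Python's standard `html.parser`. The mapping `KIND_BY_TAG` and the label strings are assumptions chosen for this example, and only cover the tags mentioned above; anything else is reported as "other".

```python
from html.parser import HTMLParser

# Hypothetical mapping from tag name to a "kind of text" label.
KIND_BY_TAG = {
    "h1": "heading",
    "h2": "heading",
    "h3": "heading",
    "p": "paragraph",
    "li": "list item",
}


class TextKindCollector(HTMLParser):
    """Records (kind, text) pairs based on the enclosing tags."""

    def __init__(self):
        super().__init__()
        self._stack = []  # currently open tags, outermost first
        self.items = []

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; this also discards
        # unclosed tags such as <br> that were opened inside it.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        kind = "other"
        for tag in reversed(self._stack):  # innermost tag first
            if tag in KIND_BY_TAG:
                kind = KIND_BY_TAG[tag]
                break
        self.items.append((kind, text))


def classify_text(html):
    """Return a list of (kind, text) pairs in document order."""
    parser = TextKindCollector()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    sample = "<body><h1>Title</h1><p>Some <b>bold</b> text</p></body>"
    for kind, text in classify_text(sample):
        print(kind, "->", text)
```

Pairs like these could serve as simple per-text features, though real pages often put text in `<div>` or `<span>` elements that carry no such meaning.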

D_N 2010-06-13 23:13:17

Thank you very much for your answer. I will look into what the common HTML tags are and what they are used for, following the examples you have given.

ravi 2010-06-14 21:40:55

Answer 6

+1 A:

The answer to your question is, 'NO'. You can't. See boilerpipe for one example of the level of complexity involved in trying to find 'the main text' in a web page.

bmargulies 2010-06-15 21:04:36

Answer 7

A:

Have you looked at the Semantic Web?

It may help you understand the limitations of interpreting meaning from html tags.

Stevko 2010-06-15 21:24:06


HTML tag for identifying text
