I am creating a very simple CMS for my organisation.

My strategy is to embed editable content between tags called <editable>. However, to hide these from the browser, I am commenting them out. So an example of an editable region looks like this:

<!-- <editable name="news_item"> Today's news is ... </editable> -->

With the content "Today's news is ... " being picked up by the CMS and made editable in the online HTML editor.

I would like to be able to "grab" the name attribute's value as well as the content contained within the tags.

Is there a simple way to do this with XPath or XQuery type things, or is regex the best way to go (esp. given that the regex will not need much fault tolerance, since I know exactly what the XML will be, because I will be writing the code that generates it)?

+1  A: 

I'm pretty sure that you'd need to manually parse it via regex or another method. Comments aren't seen as DOM elements as far as I'm aware.

Matt Huggins
Comments are DOM nodes. It's just that their contents aren't parsed as XML.
Ionuț G. Stan
+1  A: 

The whole point of a comment is that the DOM will not parse the content. So the whole comment is just text.

I'd be inclined to use RegEx in this case.

However, if you are certain the content is HTML, you could create a DOM element (say a DIV) and assign the comment text to its innerHTML. Then you could examine the DOM created under that element. Once you have acquired what you need, you could drop the DIV element, which you would never have added to the current document.
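
A minimal sketch of the regex route, in Python. It assumes the generator always emits exactly the shape shown in the question (a comment wrapping a single <editable> tag with a double-quoted name attribute); the names EDITABLE_RE and find_editables are made up for illustration.

```python
import re

# Assumed shape: <!-- <editable name="...">content</editable> -->
EDITABLE_RE = re.compile(
    r'<!--\s*<editable\s+name="(?P<name>[^"]*)"\s*>'
    r'(?P<content>.*?)'
    r'</editable>\s*-->',
    re.DOTALL,  # let the content span several lines
)


def find_editables(html):
    """Return (name, content) pairs for every commented-out editable region."""
    return [
        (match.group("name"), match.group("content").strip())
        for match in EDITABLE_RE.finditer(html)
    ]


page = """<p>Intro</p>
<!-- <editable name="news_item"> Today's news is ... </editable> -->
<!-- an ordinary comment -->"""

print(find_editables(page))
```

Ordinary comments are skipped because they do not match the pattern. If the generated markup ever varies (single quotes, extra attributes, nested editable regions), the pattern would need to change or a parser would be the safer choice.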

AnthonyWJones
You could also use display:none on the div so it doesn't take up space or display its content, and then just leave it there with the data inside. That should work unless you run into browser compatibility issues.
teh_noob
+2  A: 

Most parsers are able to get comments without a problem. They probably will not parse them into a DOM structure, but you can do that manually once you have the actual comments.

This is an example using BeautifulSoup with Python:

>>> from BeautifulSoup import BeautifulSoup, Comment
>>> html_document = """
... <html>
... <head>
... </head>
... <body>
... <h1>My Html Document</h1>
... <!-- This is a normal comment. -->
... <p>This is some more text.</p>
... <!-- <editable name="news_item">Today's news is Paolo Rocks!</editable> -->
... <p>Yet More Content</p>
... </body>
... </html>
... """
>>> soup = BeautifulSoup(html_document)
>>> comments = soup.findAll(text=lambda text:isinstance(text,Comment))
>>> comments
[u' This is a normal comment. ', u' <editable name="news_item">Today\'s news is
Paolo Rocks!</editable> ']
>>> for comment in comments:
...     editable = BeautifulSoup(comment).find('editable')
...     if editable is not None:
...             print editable['name'], editable.contents
...
news_item [u"Today's news is Paolo Rocks!"]
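
The session above uses the Python 2 print statement and the old `from BeautifulSoup import ...` style. As a rough alternative that needs only the Python 3 standard library, the same two steps (collect the comments, then parse each one) might be sketched like this. The names CommentCollector and extract_editables are hypothetical, and the sketch assumes each editable comment contains well-formed XML with plain-text content.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class CommentCollector(HTMLParser):
    """Collects the raw text of every HTML comment in a document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


def extract_editables(html):
    """Map each editable region's name to its text content."""
    collector = CommentCollector()
    collector.feed(html)
    collector.close()

    regions = {}
    for text in collector.comments:
        try:
            node = ET.fromstring(text.strip())
        except ET.ParseError:
            continue  # an ordinary comment, not XML
        if node.tag == "editable" and "name" in node.attrib:
            regions[node.get("name")] = node.text or ""
    return regions


html_document = """
<html><body>
<h1>My Html Document</h1>
<!-- This is a normal comment. -->
<!-- <editable name="news_item">Today's news is Paolo Rocks!</editable> -->
</body></html>
"""

print(extract_editables(html_document))
```

Note that `node.text` only returns the text before the first child element, so content that itself contains markup would need different handling.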
Paolo Bergantino
A: 

You can use a DIV with a custom attribute, like Dojo does a lot:

<div ParseByCMS="true">foobar foo bar foobaz</div>

After that, you just use JavaScript or XSLT to parse it and remove it.
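
If the CMS side is not JavaScript or XSLT, the same idea can be sketched server-side. The following is only an illustration in Python using the standard library; the class name MarkedDivCollector is hypothetical, and it collects the text of marked DIVs only, not any markup nested inside them. `html.parser` lowercases attribute names, which is why the code looks for "parsebycms".

```python
from html.parser import HTMLParser


class MarkedDivCollector(HTMLParser):
    """Collects the text inside every <div ParseByCMS="true"> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # div nesting depth inside a marked div (0 = outside)
        self.regions = []  # one text string per marked div

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1  # nested div inside a marked region
        elif tag == "div" and dict(attrs).get("parsebycms") == "true":
            self.depth = 1
            self.regions.append("")

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.regions[-1] += data


collector = MarkedDivCollector()
collector.feed('<p>x</p><div ParseByCMS="true">foobar foo bar foobaz</div><div>other</div>')
collector.close()
print(collector.regions)
```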

the_drow
A: 

If you're using PHP:

    $doc = new DOMDocument();
    $doc->loadHTML($html); // $html is assumed to hold the page markup
    $xpath = new DOMXPath($doc);

    // Search for comment nodes
    $comments = $xpath->query('//comment()');
SleepyCod
+2  A: 

By DOM parser, do you mean JavaScript? If so, this blog post suggests that you can indeed slice and dice HTML comments. And, because mentioning JavaScript without mentioning jQuery is a sin, here's a jQuery plugin that will find all the HTML comments for you.

Dan F
I like the idea of using jQuery
Ankur
The blog talks about exactly what I want to do. Good to know I am not the only one.
Ankur