All the tutorials and examples I've found of XSLT processing seem to assume that your destination will have a significantly different format or structure from your source, and that you know the structure of the source in advance. I'm struggling to find out how to perform simple "in-place" modifications to an HTML document without knowing anything else about its existing structure.

Could somebody show me a clear example that, given an arbitrary unknown HTML source, will:

1.) delete the classname 'foo' from all divs
2.) delete a node if it's empty (i.e. <p></p>)
3.) delete a <p> node if its first child is <br>
4.) add newattr="newvalue" to all H1
5.) replace 'heading' in text nodes with 'title'
6.) wrap all <u> tags in <b> tags (i.e. <u>foo</u> -> <b><u>foo</u></b>)
7.) output the transformed document without changing anything else

The above examples are the primary types of transform I wish to accomplish. Understanding how to do the above will go a long way towards helping me build more complex transforms.

To help clarify and test the examples, here is a sample source and output. However, I must reiterate that I want to work with arbitrary samples without rewriting the XSLT for each source:

<!doctype html>
<html>
<body>
  <h1>heading</h1>
  <p></p>
  <p><br>line</p>
  <div class="foo bar"><u>baz</u></div>
  <p>untouched</p>
</body>
</html>

output:

<!doctype html>
<html>
<body>
  <h1 newattr="newvalue">title</h1>
  <div class="bar"><b><u>baz</u></b></div>
  <p>untouched</p>
</body>
</html>
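
A common way to approach the seven points above is the identity transform: one template copies every node unchanged, and a more specific template overrides it for each change. The following is a minimal XSLT 1.0 sketch of that idea, not a definitive solution. It assumes the input has first been made well-formed XML (the sample as written is not, because of the unclosed <br>), that element names are lowercase and in no namespace, and that 'foo' appears at most once per class attribute. Rule 2 is deliberately narrowed to <p>, since matching every empty element would also remove <br>, <img>, and <script src="...">. The template name "replace-heading" is just an illustrative choice.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="no"/>

  <!-- 7.) identity template: copy anything no other template matches -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- 1.) rebuild div/@class without the token 'foo' (assumes one occurrence) -->
  <xsl:template match="div/@class[contains(concat(' ', normalize-space(.), ' '), ' foo ')]">
    <xsl:variable name="padded" select="concat(' ', normalize-space(.), ' ')"/>
    <xsl:variable name="rest" select="normalize-space(concat(
        substring-before($padded, ' foo '), ' ',
        substring-after($padded, ' foo ')))"/>
    <!-- drop the attribute entirely if 'foo' was the only class -->
    <xsl:if test="$rest != ''">
      <xsl:attribute name="class">
        <xsl:value-of select="$rest"/>
      </xsl:attribute>
    </xsl:if>
  </xsl:template>

  <!-- 2.) an empty template outputs nothing, so empty <p> elements vanish -->
  <xsl:template match="p[not(node())]"/>

  <!-- 3.) same trick for a <p> whose first child node is <br> -->
  <xsl:template match="p[node()[1][self::br]]"/>

  <!-- 4.) copy h1, keep its attributes, then add the new one -->
  <xsl:template match="h1">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="newattr">newvalue</xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- 5.) XSLT 1.0 has no replace(), so recurse over the string -->
  <xsl:template match="text()[contains(., 'heading')]">
    <xsl:call-template name="replace-heading">
      <xsl:with-param name="text" select="."/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="replace-heading">
    <xsl:param name="text"/>
    <xsl:choose>
      <xsl:when test="contains($text, 'heading')">
        <xsl:value-of select="substring-before($text, 'heading')"/>
        <xsl:text>title</xsl:text>
        <xsl:call-template name="replace-heading">
          <xsl:with-param name="text" select="substring-after($text, 'heading')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- 6.) wrap each <u> in a literal <b>, still processing its content -->
  <xsl:template match="u">
    <b>
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </b>
  </xsl:template>
</xsl:stylesheet>
```

Because every override is more specific than the identity template, nothing in the stylesheet depends on the overall shape of the document. The doctype line is not reproduced by this sketch; that would need additional xsl:output settings.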
+3  A: 
Tomalak
Thank you very much, excellent explanation. I can see now why XSLT is so uniformly unpopular. I didn't even imagine that a simple string replace would require up to 10 lines or that even simple operations require code that makes Perl seem legible.
SpliFF
@Tomalak, this solution is good (+1) and has only a few flaws. As for the ignorant, don't worry about them -- one cannot change them and they are themselves their biggest problem. There is nothing as harmful as the industrious fool. Fortunately, what they can't understand, they can't touch and destroy.
Dimitre Novatchev
@Dimitre: Appreciate your praise. I think you are a little too harsh in the rest of the comment. XSLT can be a shock to anybody who is not used to the concept; I can understand it when people dismiss it as inferior to what they are familiar with.
Tomalak
@SpliFF: Perl may be one of the worst choices to compare XSLT with. ;) (Personally, I find Perl to be one ugly, messy write-only language. It's expressive and practical and powerful all right, but I still refuse to use it because its sheer ugliness drives me away.) Perl is designed around string processing, and it's really good at that. XSLT is designed around XML structure processing, and it's *really* good at that. String processing was not one of the design goals, though it is a lot better in XSLT 2.0. I'm sure that XSLT 10-liners exist that require 10 times the amount of Perl code.
Tomalak
@SpliFF, XSLT is the best tool for what it does; please don't disparage it. [+1 for Tomalak]
infant programmer