
My Java program needs to rewrite URLs in HTML on the fly (just in time). I am looking for the right tool and wonder whether ANTLR can do the job for me.

For example:

<html><body>  <img src="foo.jpg" /> </body></html> 

should be rewritten as:

<html><body>  <img src="http://foo.com/foo.jpg" /> </body></html> 

I want to read from and write to a stream (byte by byte).

A: 

What about regular expressions?
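For the simple case in the question, a regex-based rewrite could look like the sketch below. It is only a sketch under several assumptions: attribute values are double-quoted, any `src="..."` occurrence is treated as a URL (even inside comments or scripts), and the whole document is buffered in a `String` rather than processed byte by byte. The class name `RegexUrlRewriter` and the base URL are illustrative, not from any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexUrlRewriter {
    // Matches src="..." and captures prefix, value, and closing quote.
    // Assumption: attribute values are always double-quoted.
    private static final Pattern SRC_ATTR =
            Pattern.compile("(\\bsrc\\s*=\\s*\")([^\"]*)(\")");

    // Values that already start with a scheme (http:, data:, ...) are left alone.
    private static final Pattern HAS_SCHEME =
            Pattern.compile("[a-zA-Z][a-zA-Z0-9+.-]*:.*");

    // Prefixes every relative src value with the (hypothetical) base URL.
    static String rewrite(String html, String base) {
        Matcher m = SRC_ATTR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = m.group(2);
            String rewritten = HAS_SCHEME.matcher(url).matches() ? url : base + url;
            // quoteReplacement keeps '$' and '\' in URLs from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + rewritten + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String in = "<html><body>  <img src=\"foo.jpg\" /> </body></html>";
        System.out.println(rewrite(in, "http://foo.com/"));
    }
}
```

This works for well-behaved markup like the example, but it breaks on single-quoted or unquoted attributes and cannot tell real attributes from text that merely looks like one.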

khmarbaise
A: 

As khmarbaise said, first check whether regular expressions can do it. But there are cases in which they can't [*], and then I think ANTLR might really be a legitimate choice.

[*] For the mathematical background on this, see http://en.wikipedia.org/wiki/Formal_grammar#The_Chomsky_hierarchy
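A classic example of what regular expressions (in the formal, Chomsky type 3 sense) cannot recognize is balanced parentheses of arbitrary nesting depth; the context-free grammar `S -> '(' S ')' S | ε` describes it easily. The hand-written recursive-descent parser below is a stand-in for the kind of parser ANTLR would generate from such a grammar; it is not ANTLR output, and the class name `BalancedParens` is hypothetical.

```java
public class BalancedParens {
    // Grammar (context-free, Chomsky type 2):  S -> '(' S ')' S | ε
    private final String input;
    private int pos = 0;

    private BalancedParens(String input) {
        this.input = input;
    }

    // Returns true if the whole string is derivable from S.
    static boolean matches(String s) {
        BalancedParens p = new BalancedParens(s);
        return p.parseS() && p.pos == s.length();
    }

    // One method per grammar rule, as in a recursive-descent parser.
    private boolean parseS() {
        if (pos < input.length() && input.charAt(pos) == '(') {
            pos++;                        // consume '('
            if (!parseS()) return false;  // nested S
            if (pos >= input.length() || input.charAt(pos) != ')') return false;
            pos++;                        // consume ')'
            return parseS();              // trailing S
        }
        return true;                      // S -> ε
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "(()())", "(()", ")("}) {
            System.out.println("\"" + s + "\" -> " + matches(s));
        }
    }
}
```

The recursion is what lets the parser track nesting to any depth, which a finite automaton (and therefore a formal regular expression) cannot do.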

Update

Now that you have updated your question, I see what you really want to do. To modify a complete HTML file, I'd use a parser like NekoHTML or something similar: http://www.benmccann.com/dev-blog/java-html-parsing-library-comparison/

Then you can use the parser to extract the URL, and then:

  • parse only the URL itself, e.g. with regexes, Java's URL class (or, often better, URI), or maybe ANTLR
  • modify the parsed URL
  • and write out the HTML again, using NekoHTML/...
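The "parse and modify the URL" steps can be sketched with `java.net.URI`, which resolves a relative reference against a base URI. This assumes the base (here `http://foo.com/`, taken from the question's example) ends with a slash so it is treated as a directory; `UriRewriteStep` is a hypothetical helper name. Extracting the `src` value and writing the HTML back out would still be done by the HTML parser.

```java
import java.net.URI;

public class UriRewriteStep {
    // Resolves a (possibly relative) src value against a base URI.
    // Absolute values are returned unchanged by URI.resolve.
    // URI.create throws IllegalArgumentException for syntactically invalid input.
    static String rewrite(String base, String src) {
        return URI.create(base).resolve(src).toString();
    }

    public static void main(String[] args) {
        String base = "http://foo.com/";
        System.out.println(rewrite(base, "foo.jpg"));
        System.out.println(rewrite(base, "/img/bar.png"));
        System.out.println(rewrite(base, "http://other.com/baz.gif"));
    }
}
```

Using `URI` instead of string concatenation means root-relative paths and already-absolute URLs are handled according to the standard resolution rules rather than by ad-hoc checks.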

Do not use regular expressions to parse the entire HTML file! You could use ANTLR for that in theory, but it would be very hard to make that work reliably.

Chris Lercher
What does ANTLR have to do with regular expressions?
Bart Kiers
@Bart: Regexes can recognize Chomsky type 3 (regular) languages. ANTLR can additionally parse Chomsky type 2 (context-free) languages, so it can step in where regexes are no longer powerful enough. So if you need to do something very complex to the URL (and that's how I had (mis-?)understood the original version of the question), it could be necessary. Also, even if you use ANTLR only to parse regular languages, it can be a lot cleaner than regexes, because the notation is BNF-like. Using ANTLR involves much more overhead, of course, but as a replacement for very complex regexes it's absolutely worth considering!
Chris Lercher
@Bart: Of course, after the question was updated (showing that the author just wants to prepend something to foo.jpg), ANTLR probably won't be necessary... :-)
Chris Lercher