I have varying familiarity with what I consider to be the Big Three open source proxy servers for *nix systems. Each has its own approach to the kind of functionality you are asking for, although I must say I've never done this myself.
- Squid: Very mature and performant, although single-threaded
- Apache httpd with mod_proxy: what I'm using now for reverse proxy work
- Varnish Cache: The new kid on the block. Very cool and interesting, but arguably not stable enough for mission-critical production systems
BUT, they are each very C/*nix/systems-oriented. So creating custom directives or filters, or whatever each project calls its approach, is pretty straightforward, although detailed, work. But I don't think any of them would allow for decent, straightforward, fast Java integration. Perl? A C program? Sure...
If you are interested in having your proxy server only do this HTML work, and have no interest in the caching or authentication or whatever functionality that a proper caching server would provide, and your environment allows for it, you may want to consider a simple Java servlet approach:
- Your custom Java servlet in a servlet container, like Tomcat or Jetty or whatever, listens for requests,
- Uses a client library (like Jakarta's HttpClient) to pass the request on to the destination server,
- Receives the response from the destination server, and modifies it,
- And then the servlet returns the modified response to the client.
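To make those four steps concrete, here is a rough sketch of the same flow. Note that I've swapped things around so it runs with nothing but the JDK (11 or later): the JDK's built-in `com.sun.net.httpserver.HttpServer` stands in for the servlet container, and `java.net.http.HttpClient` stands in for the Jakarta client library. The class name `RewritingProxy`, the `rewriteHtml` method, the marker comment it inserts, port 8080, and the `http://example.com` upstream are all made up for illustration. It only handles GET, assumes UTF-8 HTML, and does not copy headers other than Content-Type, so treat it as a starting point rather than a finished proxy.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class RewritingProxy {

    // Hypothetical modification step: insert a marker comment before </body>.
    // Replace this with whatever HTML work you actually need.
    public static String rewriteHtml(String html) {
        return html.replace("</body>", "<!-- proxied --></body>");
    }

    // Step 1: listen for requests on the given port.
    public static HttpServer start(int port, URI upstream) throws IOException {
        HttpClient client = HttpClient.newHttpClient();
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", exchange -> handle(exchange, client, upstream));
        server.start();
        return server;
    }

    static void handle(HttpExchange exchange, HttpClient client, URI upstream)
            throws IOException {
        try {
            // Step 2: pass the request on to the destination server (GET only here).
            URI target = upstream.resolve(exchange.getRequestURI());
            HttpRequest request = HttpRequest.newBuilder(target).GET().build();
            HttpResponse<byte[]> response =
                    client.send(request, HttpResponse.BodyHandlers.ofByteArray());

            // Step 3: receive the response and modify it, but only if it is HTML.
            String contentType = response.headers()
                    .firstValue("Content-Type")
                    .orElse("application/octet-stream");
            byte[] body = response.body();
            if (contentType.toLowerCase().startsWith("text/html")) {
                // Assumption: the upstream HTML is UTF-8 encoded.
                String html = new String(body, StandardCharsets.UTF_8);
                body = rewriteHtml(html).getBytes(StandardCharsets.UTF_8);
            }

            // Step 4: return the (possibly modified) response to the client.
            exchange.getResponseHeaders().set("Content-Type", contentType);
            // A length of -1 tells HttpServer that there is no response body.
            exchange.sendResponseHeaders(response.statusCode(),
                    body.length == 0 ? -1 : body.length);
            if (body.length > 0) {
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            exchange.sendResponseHeaders(502, -1);
        } finally {
            exchange.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder port and upstream; point these at your own setup.
        start(8080, URI.create("http://example.com"));
    }
}
```

In a real servlet the shape would be much the same: the body of `handle` would move into `doGet`, with the servlet request and response objects taking the place of `HttpExchange`.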
I sure hope you aren't doing anything evil with this system. :P
The first approach seems more 'correct' to me, even with the Java integration issues. The second seems easier, especially if the available skill sets and libraries tie you into a Java-centric approach. Anyway, that is my two cents.