I have an embedded device which runs Java applications which can among other things serve up XHTML web pages (I could write the pages as something other than XHTML, but I'm aiming for that for now).

When my application receives a request for a web page it handles, a method in my code is called with all the information on the request, including an output stream to write the page to.

On one of my pages I would like to display a (log) file, which can be up to 1 MB in size.

I can display this file unescaped using the following code:

final PrintWriter writer; // Is initialized to a PrintWriter writing to the output stream.
final FileInputStream fis = new FileInputStream(file);
final InputStreamReader inputStreamReader = new InputStreamReader(fis);
try {
    writer.println("<div id=\"log\" style=\"white-space: pre-wrap; word-wrap: break-word\">");
    writer.println("    <pre>");
    int length;
    char[] buffer = new char[1024];
    while ((length = inputStreamReader.read(buffer)) != -1) {
        writer.write(buffer, 0, length);
    }
    writer.println("    </pre>");
    writer.println("</div>");
} finally {
    if (inputStreamReader != null) {
        inputStreamReader.close();
    }
}

This works reasonably well, and displays the entire file within a second or two (an acceptable timeframe).

This file can (and in practice, does) contain characters that are not valid in XHTML text, most commonly < and >. So I need to find a way to escape these characters.

The first thing I tried was a CDATA section, but as documented here they do not display correctly in IE8.

The second thing I tried was a method like the following:

// Based on code: http://stackoverflow.com/questions/439298/best-way-to-encode-text-data-for-xml-in-java/440296#440296
// Modified to write directly to the stream to avoid creating extra objects.
private static void writeXmlEscaped(PrintWriter writer, char[] buffer, int offset, int length) {
    // Stop at offset + length: length is a count, not an end index.
    for (int i = offset; i < offset + length; i++) {
        char ch = buffer[i];

        boolean controlCharacter = ch < 32;
        boolean unicodeButNotAscii = ch > 126;
        boolean characterWithSpecialMeaningInXML = ch == '<' || ch == '&' || ch == '>';

        if (characterWithSpecialMeaningInXML || unicodeButNotAscii || controlCharacter) {
            writer.write("&#" + (int) ch + ";");
        } else {
            writer.write(ch);
        }
    }
}

This correctly escapes the characters (I was going to expand it to escape characters that are invalid in HTML if needed), but the web page then takes 15+ seconds to display, and other resources on the page (images, CSS stylesheet) intermittently fail to load. I believe the requests for them time out because the processor is pegged.

I've tried using a BufferedWriter in front of the PrintWriter as well as changing the buffer size (both for reading the file and for the BufferedWriter) in various ways, with no improvement.

Is there a way to escape all XHTML invalid characters that does not require iterating over every single character in the stream? Failing that, is there a way to speed up my code enough to display these files within a couple of seconds?

I'll consider reducing the size of the log files if I have to, but I was hoping to make them at least 250-500 KB in size (with 1 MB being ideal).

I already have a method to simply download the log files, but I would like to display them in browser as well for simple troubleshooting/perusal.

If there's a way to set the headers so that IE8/Firefox will simply display the file in browser as a text file I would consider that as an alternative (and have an entire page dedicated to the file with no XHTML of any kind).


EDIT:

After making the change suggested by Cameron Skinner and performance testing, it looks like the escaped writing takes about 1.5-2x as long as the block-written version. It's not nothing, but I'm probably not going to be able to get a huge speedup by messing with it.

I may just need to reduce the max size of the log file.

+1  A: 

You can try StringEscapeUtils from commons-lang:

StringEscapeUtils.escapeHtml(writer, string);
Bozho
Is that likely to be faster than looping through and replacing them myself? Keep in mind that it requires converting the `char[]` I have into a string first. I would prefer to avoid dependencies as they would have to be bundled directly into my jar (keep in mind this is an embedded device).
Lawrence Johnston
You should first measure *where* your program is slow. It may be either the processing of the characters, or the repeated calls to `write`. Try the following things: • write 10000 times 100 bytes. • write 100 times 10000 bytes. • write 1000000 times 1 byte.
Roland Illig
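
The measurement suggested in the comment above could be sketched roughly as follows. This is only an illustration: the class and method names are made up, and the in-memory ByteArrayOutputStream is a stand-in. On the device, the PrintWriter would need to wrap the real response stream for the numbers to mean anything.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;
import java.util.Arrays;

public class WriteGranularity {
    // Writes `count` chunks of `size` characters each, flushes,
    // and returns the elapsed time in nanoseconds.
    static long timeWrites(PrintWriter writer, int count, int size) {
        char[] chunk = new char[size];
        Arrays.fill(chunk, 'x');
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            writer.write(chunk, 0, size);
        }
        writer.flush();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Same total volume (1,000,000 characters) at three granularities.
        int[][] cases = { {100, 10000}, {10000, 100}, {1000000, 1} };
        for (int[] c : cases) {
            // Assumption: an in-memory sink; replace with the real output stream.
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            PrintWriter writer = new PrintWriter(sink);
            long nanos = timeWrites(writer, c[0], c[1]);
            System.out.println(c[0] + " writes x " + c[1] + " chars: "
                    + (nanos / 1000000) + " ms");
        }
    }
}
```

If the single-character case is much slower than the others, the per-call overhead of write is the likely bottleneck rather than the escaping logic itself.
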
Performance data added to question.
Lawrence Johnston
+1  A: 

One option is for you to serve up the log contents inside of an iframe hosted inside of your web page. The iframe's source could point to a URL that serves up the content as text.
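
A minimal sketch of the page side of this idea, written against the PrintWriter from the question. The URL "/log.txt" and the names LogFrameSketch and writeLogFrame are hypothetical. The handler behind that URL would have to send the file with a text/plain content type, and how to set that header depends on the device's API, so it is not shown. Note also that iframe is part of XHTML 1.0 Transitional but not Strict.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class LogFrameSketch {
    // Hypothetical URL at which the application would serve the raw log as text.
    static final String LOG_URL = "/log.txt";

    // Writes an iframe element whose source points at the plain-text log.
    static void writeLogFrame(PrintWriter writer, String url) {
        writer.print("<iframe src=\"" + url + "\" width=\"100%\" height=\"400\"></iframe>");
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        PrintWriter writer = new PrintWriter(out);
        writeLogFrame(writer, LOG_URL);
        writer.flush();
        System.out.println(out);
    }
}
```

With this approach the log is never embedded in the XHTML page, so no escaping is needed and the file can be streamed with the fast block-copy loop.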

Jacob
+1  A: 

One small change that will (well, might) significantly increase the speed is to change

writer.write("&#" + (int) ch + ";");

to

writer.write("&#");
writer.print((int) ch); // print, not write: write(int) would output ch itself as a single character
writer.write(";");

String concatenation is comparatively expensive here: for each character that needs replacing, Java builds the result in a temporary string buffer and then allocates a new String from it, so every escaped character generates short-lived garbage.

EDIT: One of the comments on another answer is highly relevant: find where the slow bit is first. I'd suggest testing logs that have no characters to be escaped and many characters to be escaped.

I think you should make the suggested change anyway because it costs you only a few seconds of your time.
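
Going one step further, the number of write calls can also be reduced by writing each run of unescaped characters in a single call. The following is a sketch of that idea, not a measured fix. It keeps the escaping rule from the question (everything outside 32..126, plus <, & and >), and the class name XmlEscapeSketch is made up. It uses print for the numeric code because PrintWriter.write(int) writes a single character rather than its decimal value.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class XmlEscapeSketch {
    // Writes buffer[offset .. offset + length) to the writer, replacing
    // characters that need escaping with numeric character references.
    static void writeXmlEscaped(PrintWriter writer, char[] buffer, int offset, int length) {
        int end = offset + length;
        int runStart = offset; // start of the pending run of characters written as-is
        for (int i = offset; i < end; i++) {
            char ch = buffer[i];
            boolean safe = ch >= 32 && ch <= 126 && ch != '<' && ch != '&' && ch != '>';
            if (safe) {
                continue;
            }
            // Flush the pending run in one call, then write the escape.
            if (i > runStart) {
                writer.write(buffer, runStart, i - runStart);
            }
            writer.write("&#");
            writer.print((int) ch);
            writer.write(';');
            runStart = i + 1;
        }
        // Flush whatever is left after the last escaped character.
        if (end > runStart) {
            writer.write(buffer, runStart, end - runStart);
        }
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        PrintWriter writer = new PrintWriter(out);
        char[] data = "a<b> & c".toCharArray();
        writeXmlEscaped(writer, data, 0, data.length);
        writer.flush();
        System.out.println(out); // prints a&#60;b&#62; &#38; c
    }
}
```

For a log with few special characters this degenerates to roughly one write per buffer, which is close to the unescaped version. Like the original rule, it does not filter control characters that XML 1.0 forbids even as character references.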

Cameron Skinner
Thanks for the suggestion. I'm accepting it because, based on performance testing, it appears I'm not likely to be able to increase the speed much more.
Lawrence Johnston