I have to encode the 5 XML reserved chars (& < > " and ') properly as follows:

"&", "&amp;"
"<", "&lt;"
">", "&gt;"
"\"", "&quot;"
"\'", "&apos;"

I can replace them one by one, but is it possible to do it with a single regexp, something like ("[&|<|>|\"|\']", "&|<"); ... etc., so that the replacements happen all together in one pass instead of as 5 operations one after another?

By the way, this would preferably be in Java, using String.replaceAll(regexpString, string);

+1  A: 

See code at http://www.owasp.org/index.php/How_to_perform_HTML_entity_encoding_in_Java.

Jakarta Commons Lang also has a commonly used escapeXml method which will cover it, though if you're producing HTML-compatible XHTML, those ‘&apos;’ entities are undesirable, because &apos; is not defined in HTML 4.

IMO the ‘simultaneous’ regex doesn't really get you a great deal when you are only doing single-character replaces anyway.
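For reference, a single-pass regex replacement is possible with java.util.regex, though not with String.replaceAll alone, since the replacement depends on which character matched. The following is only a minimal sketch: the class name XmlEscaper and the helper entityFor are made up for illustration, and it assumes the five mappings listed in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
final class XmlEscaper {
    // One character class covering the five reserved characters.
    // No '|' is needed inside [...]; it would be matched literally.
    private static final Pattern RESERVED = Pattern.compile("[&<>\"']");

    static String escape(String input) {
        Matcher m = RESERVED.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String entity = entityFor(m.group().charAt(0));
            // quoteReplacement guards against '$' and '\' being treated specially
            m.appendReplacement(out, Matcher.quoteReplacement(entity));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Maps a reserved character to its predefined XML entity.
    private static String entityFor(char c) {
        switch (c) {
            case '&':  return "&amp;";
            case '<':  return "&lt;";
            case '>':  return "&gt;";
            case '"':  return "&quot;";
            case '\'': return "&apos;";
            default:   throw new IllegalArgumentException("not a reserved character: " + c);
        }
    }
}
```

Because the input is scanned once, an ampersand produced by one replacement is never escaped again, which is the main pitfall of doing five replaceAll calls in the wrong order.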

bobince
+5  A: 

Use StringEscapeUtils.escapeXml in commons-lang library.

BTW, I never start a Java project without adding almost all of the commons libraries to my dependencies. They save loooooooots of time.

<dependency>
    <groupId>commons-lang</groupId>
    <artifactId>commons-lang</artifactId>
    <version>2.4</version>
</dependency>
flybywire
yli
A: 

I wonder if you'd be better off just wrapping the data containing "magic characters" in a CDATA section and calling it a day. Have the client strip it off when they receive it.
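A rough sketch of what that could look like in Java, assuming the payload is plain text. The class and method names here are invented for the example. The one thing to watch is that the sequence "]]>" cannot appear inside a CDATA section, so it has to be split across two sections.

```java
// Hypothetical helper class, not part of any library.
final class CdataWrapper {
    static String wrapInCdata(String text) {
        // "]]>" would terminate the section early, so end the section after "]]"
        // and reopen a new one before ">".
        String safe = text.replace("]]>", "]]]]><![CDATA[>");
        return "<![CDATA[" + safe + "]]>";
    }
}
```

With this, wrapInCdata("a & b < c") gives <![CDATA[a & b < c]]>, and an XML parser on the receiving side hands back the original text without any entity decoding.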

duffymo
A: 

If you're doing this in order to insert some data into an XML packet, you would be much better off using an actual XML API, which will encode these for you.
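As one possible illustration, the StAX writer that ships with the JDK (javax.xml.stream) escapes character data when it writes it. This is a sketch only: the class name StaxEscapeDemo and the method element are invented for the example, and the exact escaping of quotes in text content can vary between implementations.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class, not part of any library.
final class StaxEscapeDemo {
    // Builds <name>text</name>, letting the XML writer do the escaping.
    static String element(String name, String text) {
        try {
            StringWriter sw = new StringWriter();
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(sw);
            w.writeStartElement(name);
            w.writeCharacters(text); // reserved characters such as & and < are escaped here
            w.writeEndElement();
            w.close();
            return sw.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write XML", e);
        }
    }
}
```

Calling StaxEscapeDemo.element("msg", "a & b < c") yields an element whose text contains &amp; and &lt; in place of the raw characters, with no hand-written escaping code.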

Chase Seibert
A: 
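// Requires: import java.io.IOException; import java.io.Writer;
// Writes str to writer, replacing the 5 XML reserved characters with their entities in a single pass.
protected static void escapeXMLSpecialCharactersAndWrite(Writer writer, String str) throws IOException {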

    int len = str.length();
    for (int i = 0; i < len; i++) {
        char c = str.charAt(i);

        switch (c) {
        case '&':
            writer.write("&amp;");
            break;
        case '<':
            writer.write("&lt;");
            break;
        case '>':
            writer.write("&gt;");
            break;
        case '\"':
            writer.write("&quot;");
            break;
        case '\'':
            writer.write("&apos;");
            break;
        default:
            writer.write(c);
            break;
        }
    }
}
yli
Why would you write this instead of using the existing StringEscapeUtils.escapeXml from commons-lang?
Jeff Atwood