I am reading and writing Java Properties files in XML format. Many of the property values have HTML embedded, which developers wrap in CDATA sections like so:

<entry key="foo"><![CDATA[
    <b>bar</b>
]]></entry>

However, when I use the Java API to load these properties and later write them back to XML, it doesn't wrap these entries in CDATA sections, but rather escapes the tags, like so:

<entry key="foo">&lt;b&gt;bar&lt;/b&gt;</entry>

Are these two formats equivalent? Am I introducing any potential problems by replacing CDATA with escaped tags?
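To make the behaviour described above concrete, here is a minimal sketch of the write side. It assumes the standard `java.util.Properties.storeToXML` method; the class name `StoreToXmlDemo` and the helper `storeAsXml` are made up for illustration. The exact layout of the output may vary between JDK versions, so no expected output is shown.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

public class StoreToXmlDemo {

    // Hypothetical helper: serializes the properties to an XML string.
    static String storeAsXml(Properties props) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // No comment (null), UTF-8 encoding.
        props.storeToXML(out, null, "UTF-8");
        return out.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.setProperty("foo", "<b>bar</b>");
        // The markup in the value is written with escaped tags, not as a CDATA section.
        System.out.println(storeAsXml(props));
    }
}
```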

+2  A: 

Not equivalent, but the text value you get by calling getText() is the same.

However, I would suggest abandoning Properties in favor of real XML parsed by JAXB - it's awesome, you'll like it.

I didn't find any really nice tutorial, so here are at least these:

Object -> XML: here

Sun's verbose tutorial: http://java.sun.com/webservices/docs/2.0/tutorial/doc/JAXBUsing.html

Ondra Žižka
These are for localization files, so I don't have any motivation to change the format, but thanks for the answer!
Mike Sickler
Or abandon the XML format and stick with .properties.
Thilo
The Properties format is proprietary and only allows map semantics (it does not even keep the order). The Properties API is bad and cumbersome. XML is a widespread standard with hundreds of tools for it, and it handles tree structures natively. Why should anyone use .properties for more than just the most basic read-only things which are already tied to them, like log4j config etc.?
Ondra Žižka
Yeah, the XML format is better than .properties for translation, since you don't have to escape Unicode, and you don't have to worry about the file encoding getting messed up.
Mike Sickler
A: 

Yes, you could be introducing some problems, depending on how the data is used.

For example, if you use it in an HTML page, A<br>B will print as

A

B

But A&lt;br&gt;B will show as

A<br>B
Suraj Chandran
+1  A: 

When the files are loaded into memory in a Properties object, there is no difference between the two formats you have shown, as Ondra Žižka answered. CDATA sections are a way to escape a whole block of text instead of escaping every special character in it.
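A minimal sketch to check this, assuming the standard `Properties.loadFromXML` method; the class name `CdataEquivalence` and its constants are made up for illustration. Note that whitespace inside a CDATA section is kept as part of the value, so the indented example in the question would load with the surrounding newlines and spaces, while the escaped one would not. The sketch uses identical content in both forms to isolate the CDATA-versus-escaping question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class CdataEquivalence {

    // Prolog and DOCTYPE expected by the Properties XML format.
    static final String HEADER =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n";

    // Same value, once wrapped in CDATA and once with escaped tags.
    static final String CDATA_XML = HEADER
        + "<properties><entry key=\"foo\"><![CDATA[<b>bar</b>]]></entry></properties>";
    static final String ESCAPED_XML = HEADER
        + "<properties><entry key=\"foo\">&lt;b&gt;bar&lt;/b&gt;</entry></properties>";

    // Hypothetical helper: loads an XML properties document from a string.
    static Properties load(String xml) throws IOException {
        Properties props = new Properties();
        props.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return props;
    }

    public static void main(String[] args) throws IOException {
        String fromCdata = load(CDATA_XML).getProperty("foo");
        String fromEscaped = load(ESCAPED_XML).getProperty("foo");
        System.out.println(fromCdata);                     // prints "<b>bar</b>"
        System.out.println(fromCdata.equals(fromEscaped)); // prints "true"
    }
}
```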

I would consider the non-XML property file format myself. You would continue to see the tags in the raw files, but newline characters would need to be escaped.
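As a rough sketch of what that looks like, assuming the standard `Properties.load(Reader)` and `Properties.store(Writer, String)` methods (the class name `PlainPropertiesDemo` and its helpers are made up for illustration): the tags stay readable in the raw text, while a newline in the value is written as the two characters `\n`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

public class PlainPropertiesDemo {

    // Raw file content: the tags are unescaped, the newline is written as backslash + n.
    static final String RAW = "foo=<b>bar</b>\\n<i>baz</i>\n";

    // Hypothetical helper: parses .properties text from a string.
    static Properties load(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    // Hypothetical helper: writes the properties back out as .properties text.
    static String store(Properties props) throws IOException {
        StringWriter out = new StringWriter();
        props.store(out, null); // null means no leading comment; a date comment is still written
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        Properties props = load(RAW);
        // In memory the value contains a real newline character.
        System.out.println(props.getProperty("foo"));
        // On disk the newline is escaped again, and the tags remain readable.
        System.out.println(store(props));
    }
}
```

One caveat with this format: `store` also backslash-escapes a few other characters such as `=` and `:`, so HTML with attributes would not be written back completely untouched.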

Sarah Happy