
Hi,

I am trying to escape the HTML characters in a string and use that string to build a DOM document with the parseXml method shown below. I then insert this DOM document into a database, but when I do that I get the following error:

org.xml.sax.SAXParseException: Reference is not allowed in prolog.

I have three questions:

1) I am not sure how to escape double quotes. I tried replaceAll("\"", "&quot;") and am not sure whether this is right.

2) Suppose I want a string that starts and ends with double quotes (e.g. "sony"). How do I code it? I tried something like:

String sony = "\"sony\"";

Is this right? Will the string contain "sony" including the double quotes, or is there another way to do it?
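For reference, a backslash escapes a double quote inside a Java string literal, so a literal written this way does contain the quote characters. A minimal check (the class name is hypothetical):

```java
class QuoteLiteralDemo {
    // "\"" is an escaped double quote inside a Java string literal,
    // so this literal holds six characters: "sony" including both quotes.
    static final String SONY = "\"sony\"";

    public static void main(String[] args) {
        System.out.println(SONY);          // prints "sony" with the quotes
        System.out.println(SONY.length()); // prints 6: two quotes plus four letters
    }
}
```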

3) I am not sure what the "org.xml.sax.SAXParseException: Reference is not allowed in prolog." error means. Can someone help me fix it?

Thanks, Sony

Steps in my code:

  1. Utils.java

    public static String escapeHtmlEntities(String s) {
        return s.replaceAll("&", "&amp;").replaceAll("<", "&lt;").replaceAll(">", "&gt;")
                .replaceAll("\"", "&quot;").replaceAll(":", ":").replaceAll("/", "/");
    }

    public static Document parseXml(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        doc.setXmlStandalone(false);
        return doc;
    }
    
  2. TreeController.java

    protected void notifyNewEntryCreated(String entryType) throws Exception {
        for (Listener l : treeControlListeners)
            l.newEntryCreated();

        final DomNodeTreeModel domModel = (DomNodeTreeModel) getModel();
        Element parent_item = getSelectedEntry();
        String xml = Utils.escapeHtmlEntities("<entry xmlns=" + "\"http://www.w3.org/2005/atom\"" + "xmlns:libx=" +
                "\"http://libx.org/xml/libx2\">" + "<title>" + "New" + entryType + "</title>" +
                "<updated>2010-71-22T11:08:43z</updated>" + "<author> <name>LibX Team</name>" +
                "<uri>http://libx.org</uri>" + "<email>[email protected]</email></author>" +
                "<libx:" + entryType + "></libx:" + entryType + ">" + "</entry>");
        xmlModel.insertNewEntry(xml, getSelectedId());
    }

  3. XMLDataModel.java

    public void insertNewEntry(String xml, String parent_id) throws Exception {
        insertNewEntry(Utils.parseXml(xml).getDocumentElement(), parent_id);
    }

    public void insertNewEntry(Element elem, String parent_id) throws Exception {
        // inserting an entry with no libx: tag will create a storage leak
        if (elem.getElementsByTagName("libx:package").getLength() +
            elem.getElementsByTagName("libx:libapp").getLength() +
            elem.getElementsByTagName("libx:module").getLength() < 1) {
            // TODO: throw exception here instead of return
            return;
        }

        XQPreparedExpression xqp = Q.get("insert_new_entry.xq");
        xqp.bindNode(new QName("entry"), elem.getOwnerDocument(), null);
        xqp.bindString(new QName("parent_id"), parent_id, null);
        xqp.executeQuery();
        xqp.close();

        updateRoots();
    }
  4. insert_new_entry.xq

    declare namespace libx='http://libx.org/xml/libx2';
    declare namespace atom='http://www.w3.org/2005/atom';
    declare variable $entry as xs:anyAtomicType external;
    declare variable $parent_id as xs:string external;
    declare variable $feed as xs:anyAtomicType := doc('libx2_feed')/atom:feed;
    declare variable $metadata as xs:anyAtomicType := doc('libx2_meta')/metadata;

    let $curid := $metadata/curid
    return replace value of node $curid with data($curid) + 1,

    let $newid := data($metadata/curid) + 1
    return insert node {$newid}{ $entry// } into $feed,

    let $newid := data($metadata/curid) + 1
    return if ($parent_id = 'root') then ()
           else insert node into $feed/atom:entry[atom:id=$parent_id]//(libx:module|libx:libapp|libx:package)

A: 

To escape a double quote, use the &quot; entity, which is predefined in XML.

So your example string, used as an attribute value, would look like:

    &quot;sony&quot;

There is also &apos; for apostrophe/single quote.
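To see that the parser expands these predefined entities back into quote characters, here is a small sketch using the JDK's built-in DOM parser (the class and method names are hypothetical):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class QuotEntityDemo {
    // Parses a single-element document and returns its "name" attribute.
    static String nameAttribute(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getAttribute("name");
    }

    public static void main(String[] args) throws Exception {
        // &quot; and &apos; are predefined in XML, so the parser replaces
        // them with literal quote characters in the attribute value.
        System.out.println(nameAttribute("<item name=\"&quot;sony&quot;\"/>")); // "sony"
        System.out.println(nameAttribute("<item name='&apos;sony&apos;'/>"));   // 'sony'
    }
}
```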

I see you have lots of replaceAll calls, but the replacements seem to be the same as the originals. These are the characters XML predefines entities for, and which you should escape in text and attribute values:

  &  --> &amp;
  >  --> &gt;
  <  --> &lt;
  "  --> &quot;
  '  --> &apos;
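A minimal sketch of such an escaping helper, assuming it is applied to individual text or attribute values rather than to a whole document (the class and method names are hypothetical). It uses String.replace, which treats its arguments literally, instead of the regex-based replaceAll:

```java
class XmlEscaper {
    // Hypothetical helper: escapes a single text or attribute value
    // (NOT a whole document) using the five predefined XML entities.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")   // must run first, or the '&' in the
                                         // entities added below is escaped again
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    public static void main(String[] args) {
        String title = "Tom & Jerry <\"sony\">";
        // Only the value is escaped; the surrounding markup stays literal.
        System.out.println("<title>" + escapeXml(title) + "</title>");
        // prints <title>Tom &amp; Jerry &lt;&quot;sony&quot;&gt;</title>
    }
}
```

The order matters only because the calls are chained: escaping "&" last would turn "&lt;" into "&amp;lt;".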

(EDIT: OK, I see this is just formatting: the entities are being turned into their actual characters when SO renders the post.)

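The SAX exception is the parser complaining about invalid XML. The prolog is everything before the root element, and an entity reference such as &lt; is not allowed there. Because escapeHtmlEntities is applied to the whole document string, the text handed to parseXml begins with &lt;entry instead of <entry, so parsing fails on the very first character. Escape only the text and attribute values, never the markup itself.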
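The failure can be reproduced with the JDK's built-in parser. This sketch (class and method names are hypothetical) escapes a small document wholesale, the way escapeHtmlEntities does, and compares that with parsing the unescaped markup:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class PrologErrorDemo {
    // Returns null if the string parses, otherwise the parser's error message.
    // The default error handler may also print the error to stderr.
    static String parseError(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<entry><title>New</title></entry>";
        // Escaping the whole document turns "<entry" into "&lt;entry": an
        // entity reference before the root element, i.e. in the prolog.
        String escaped = xml.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        System.out.println(parseError(xml));     // null: well-formed
        System.out.println(parseError(escaped)); // the parser's prolog error
    }
}
```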

As well as escaping the text, you need to ensure the document follows the well-formedness rules of XML; for example, the string built in TreeController has no space between the xmlns attribute value and xmlns:libx. There is quite a bit to get right, so it is often simpler to let a third-party library write out the XML, for example the XMLWriter in dom4j.
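dom4j is one option; the same idea also works with only the JDK's DOM and javax.xml.transform APIs, swapped in here instead of dom4j. A minimal sketch, assuming only the title and the libx: element from the question's entry are needed (class and method names are hypothetical; the namespace URIs are copied from the question):

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class EntryBuilder {
    // Namespace URIs copied from the question's code.
    static final String ATOM_NS = "http://www.w3.org/2005/atom";
    static final String LIBX_NS = "http://libx.org/xml/libx2";
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    // Builds <entry><title>...</title><libx:TYPE/></entry> as a DOM tree.
    // The title is stored as character data, so no manual escaping is needed.
    static Document buildEntry(String title, String entryType) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().newDocument();

        Element entry = doc.createElementNS(ATOM_NS, "entry");
        // Declare both namespaces explicitly on the root element.
        entry.setAttributeNS(XMLNS_NS, "xmlns", ATOM_NS);
        entry.setAttributeNS(XMLNS_NS, "xmlns:libx", LIBX_NS);
        doc.appendChild(entry);

        Element titleEl = doc.createElementNS(ATOM_NS, "title");
        titleEl.setTextContent(title);
        entry.appendChild(titleEl);

        entry.appendChild(doc.createElementNS(LIBX_NS, "libx:" + entryType));
        return doc;
    }

    // Serializes the DOM; the serializer escapes '&' and '<' in text itself.
    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(buildEntry("New module & <test>", "module")));
    }
}
```

Since insertNewEntry already has an overload that takes an Element, buildEntry(...).getDocumentElement() could be passed to it directly, skipping the string round trip entirely.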

mdma
A: 

You can check out HTML Tidy, a tool that originated at the W3C. Almost all recent languages have their own implementation.

Rather than replacing only <, > and & by hand, configure the JTidy options (for Java) and let it parse the input. This takes care of the complications of XML escaping.

I have used Python, Java and MarkLogic-based Tidy implementations, and all of them served my purposes.

kadalamittai