I am writing a program in Java that takes a custom XML file and parses it. I'm using the XML file for storage. I am getting the following error in Eclipse.

[Fatal Error] :1:1: Content is not allowed in prolog.
org.xml.sax.SAXParseException: Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:239)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:283)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:208)
    at me.ericso.psusoc.RequirementSatisfier.parseXML(RequirementSatisfier.java:61)
    at me.ericso.psusoc.RequirementSatisfier.getCourses(RequirementSatisfier.java:35)
    at me.ericso.psusoc.programs.RequirementSatisfierProgram.main(RequirementSatisfierProgram.java:23)

The beginning of the XML file is included:

<?xml version="1.0" ?>
<PSU>
     <Major id="IST">
        <name>Information Science and Technology</name>
        <degree>B.S.</degree>
        <option> Information Systems: Design and Development Option</option>
        <requirements>
            <firstlevel type="General_Education" credits="45">
                <component type="Writing_Speaking">GWS</component>
                <component type="Quantification">GQ</component>

The program is able to read in the XML file but when I call DocumentBuilder.parse(XMLFile) to get a parsed org.w3c.dom.Document, I get the error above.

It doesn't seem to me that I have invalid content in the prolog of my XML file. I can't figure out what is wrong. Please help. Thanks.

+1  A: 

The document looks fine to me but I suspect that it contains invisible characters. Open it in a hex editor to check that there really isn't anything before the very first "<". Make sure the spaces in the XML header are spaces. Maybe delete the space before "?>". Check which line breaks are used.

Make sure the document is proper UTF-8. Some Windows editors save the document as UTF-16 (i.e. for plain ASCII text, every second byte is 0).
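If no hex editor is at hand, the same check can be done from Java. The following is only a minimal sketch: the class name `PrologBytes` and its helper methods are made up for illustration, and the guesses it prints are based on the leading bytes alone, so treat the output as a hint rather than a verdict.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper (not part of the asker's project): shows what the
// file really starts with, byte for byte.
class PrologBytes {

    /** Guesses, from the leading bytes only, what comes before the first '<'. */
    public static String describe(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8 byte order mark";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16 big-endian byte order mark";
        // FF FE followed by 00 00 would be UTF-32; this sketch does not check that.
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16 little-endian byte order mark";
        if (startsWith(head, 0x00, 0x3C)) return "looks like UTF-16 big-endian, no byte order mark";
        if (startsWith(head, 0x3C, 0x00)) return "looks like UTF-16 little-endian, no byte order mark";
        if (startsWith(head, 0x3C)) return "starts with '<'";
        return "something other than '<' comes first";
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    /** Formats bytes as space-separated two-digit hex values. */
    public static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the XML file to inspect.
        byte[] all = Files.readAllBytes(Paths.get(args[0]));
        byte[] head = Arrays.copyOf(all, Math.min(all.length, 16));
        System.out.println(hex(head));
        System.out.println(describe(head));
    }
}
```

A file that begins directly with the XML declaration in a single-byte-compatible encoding should print `3C 3F 78 6D 6C` as its first five bytes. Anything in front of that `3C` is worth a closer look.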

Aaron Digulla
I've been editing the XML file in Eclipse text editor. I'm on a Mac and I also use BBEdit. I'll check for invisible characters.
thechiman
I checked for invisible characters in BBEdit (View > Text Display > Show Invisibles) and I don't see any in the XML declaration. I also deleted the whitespace at the end of the declaration. I tried adding encoding="UTF-8" and encoding="UTF-16" and I'm still getting the error.
thechiman
What is the encoding of the file? i.e. not what you think but what does your editor say?
Aaron Digulla
Also make sure that you're actually looking at the file which causes the error!
Aaron Digulla
I checked the encoding type in BBEdit; it is UTF-16. I'm pretty sure I'm looking at the right file. The following is my code for reading in the file and parsing it:

    File f = new File("/Users/thechiman/Dropbox/introcs/PSU SOC Crawler/src/resources");
    // Check to see if file exists
    if (f.exists()) {
        System.out.println("file exists");
    } else {
        System.out.println("file does not exist");
    }
    // Use factory to get a new DocumentBuilder
    DocumentBuilder db = dbf.newDocumentBuilder();
    // Parse the XML file, get DOM representation
    this.dom = db.parse(f);
thechiman
Well, the parser expects UTF-8 and your file is UTF-16. This means the first byte of the file is 0 and you get the error. Save the file with the correct encoding (UTF-8) to fix the problem.
Aaron Digulla
I saved the file as UTF-8 and UTF-8, No BOM. Both times I get the same error.
thechiman
In that case, you're editing a different file than the parser reads.
Aaron Digulla
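One way to test the "different file" theory is to print exactly what the path resolves to before handing it to the parser. Note that `File.exists()` returns true for directories as well as regular files, so a "file exists" message alone does not prove that the path names an XML file. The sketch below is illustrative only; the class name `WhichFile` and the method `kind` are hypothetical.

```java
import java.io.File;
import java.util.Date;

// Hypothetical helper: reports what a path actually refers to.
class WhichFile {

    /** Classifies a path as "missing", "directory", "file", or "other". */
    public static String kind(File f) {
        if (!f.exists()) return "missing";
        if (f.isDirectory()) return "directory";
        return f.isFile() ? "file" : "other";
    }

    public static void main(String[] args) {
        // args[0] is the same path string the program passes to new File(...).
        File f = new File(args[0]);
        System.out.println("resolved path: " + f.getAbsolutePath());
        System.out.println("kind: " + kind(f));
        if (f.isFile()) {
            // Size and timestamp help confirm this is the copy being edited.
            System.out.println("size in bytes: " + f.length());
            System.out.println("last modified: " + new Date(f.lastModified()));
        }
    }
}
```

If the reported kind is anything other than "file", or the timestamp does not change after saving in the editor, the parser is not reading the document being edited.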
+1  A: 

Make sure there's no hidden whitespace at the start of your XML file. Also maybe include encoding="UTF-8" (or UTF-16? No clue) in the XML declaration.

Ben J
This is unfortunately most likely the cause.
Esko
Checked in BBEdit for hidden characters and added the encoding attribute to the XML declaration. Neither fixed it.
thechiman
A: 

If you're able to control the xml file, try adding a bit more information to the beginning of the file:

<?xml version="1.0" encoding="UTF-16" standalone="no"?>
Drew Johnson
I've added both standalone="no" and standalone="yes". Both give me the same error.
thechiman
Hmmm... the next thing I'd try is brute force: get a dummy document through the parser, then slowly add parts of your original document until you can identify the problem. I've been down that road before :-)
Drew Johnson
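A rough sketch of that brute-force approach, parsing from in-memory strings so that file location and encoding are taken out of the picture. The class name `MinimalParse` and the method `parses` are made up for this example, and the candidate documents are cut-down versions of the snippet in the question.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: feeds progressively larger documents to the parser.
class MinimalParse {

    /** Returns true if the string parses as XML, false if the parser rejects it. */
    public static boolean parses(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            db.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // SAXParseException is a subclass, so parse errors land here.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "<PSU/>",
            "<?xml version=\"1.0\" ?><PSU/>",
            "<?xml version=\"1.0\" ?><PSU><Major id=\"IST\">"
                + "<name>Information Science and Technology</name></Major></PSU>",
            // Deliberately broken: a stray character before the declaration.
            "x<?xml version=\"1.0\" ?><PSU/>"
        };
        for (String xml : candidates) {
            System.out.println((parses(xml) ? "ok     " : "FAILED ") + xml);
        }
    }
}
```

If the in-memory versions parse but the file on disk does not, the markup itself is probably fine and the problem lies in how the bytes reach the parser (path, encoding, or leading bytes).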
A: 

Check for any syntax problems in the XML file. I ran into this error when working on xsl/xsp with Cocoon after defining a variable using a non-existent node or something like that. Check the whole XML.

Alfabravo
I get the error before I can do anything with the parsed document. It's failing when I call DocumentBuilder.parse(XMLFile). I ran the XML file through an XML validator (xmlvalidation.com) and it went through just fine.
thechiman
Is the file available in the specified location? Maybe your program can't access the content of the file and the parser is just reporting that what it found is not valid XML... just guessing.
Alfabravo