Hello

I need some quick help with a tricky problem that is literally driving me crazy.

String example = "<digitalObject>" +
                 "<title>title</title>" +
                 "<creator>Name</creator>" +
                 "<location>link</location>"+
                 "<relatedAsset>related realife object</relatedAsset>" +
                 "<note><src lang =\"en\">value</src></note>" +
                 "<archivalDate>date</archivalDate>"+
                 "<mimeFormat>mime type</mimeFormat>"+
                 "<digitalObjectOwner>owner</digitalObjectOwner>"+
                 "</digitalObject>";

String example4="<digitalObject>" + 
                "<title>title</title>"+
                "<creator>name</creator>"+ 
                "<location>link</location>"+
                "<relatedAsset>related realife object</relatedAsset>" + 
                "<note><src lang=\"en\">value</src></note>" + 
                "<archivialDate>date</archivialDate>"+
                "<mimeFormat>mime type</mimeFormat>" + 
                "<digitalObjectOwner>owner</digitalObjectOwner>" + 
                "</digitalObject>";

I use the following code to get an org.w3c.dom.Document object:

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder;
    Document doc = null;
    try {
        builder = factory.newDocumentBuilder();

        // Wrap the XML string in a character stream so the parser can read it
        InputSource is = new InputSource();
        is.setCharacterStream(new StringReader(example4));
        doc = builder.parse(is);
    } catch (SAXException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } catch (ParserConfigurationException e) {
        e.printStackTrace();
    }

    return doc;
}

The first string (example) is parsed correctly, the second one (example2) returns null.

Any idea why? I can't really see any difference between the 2!!

Thanks in advance. I'm sure it's something so dumb I will feel ashamed...

EDIT: actually, even with the same content in both strings, I still get two different outcomes... I'll try to diff them...

+1  A: 

You're missing a </src> in the note tag in the second one.

Walter Mundt
Are you sure? It looks self-closed to me.
Kirk Woll
It's self-closed BUT there is some text after it for some reason: `<note><src lang=\"en\" />wer</note>`. What's that "wer" for?
Matt Kane
@Matt, that is *perfectly* legal xml. It's analogous to the xhtml: <div><br/>Hello</div>.
Kirk Woll
It was self-closed; I then added some content to check whether that was the problem, and I forgot to open and close it. The fixed string is:
Also "archivialDate"? Misspelled I assume. Is there a DTD or XSD that's causing this to fail though? The tag and its closing tag have the same spelling error.
Matt Kane
String example4="<digitalObject>" + " <title>dfgdfg</title>"+ "<creator>dfgdfg</creator>"+ "<location>dfgdfg</location>"+ "<relatedAsset>dfsd</relatedAsset>" + "<note><src lang=\"en\">wer</src></note>" + "<archivialDate>ff</archivialDate>"+ "<mimeFormat>dfgdfg</mimeFormat>" + "<digitalObjectOwner>gfhfgh</digitalObjectOwner>" + "</digitalObject>";
But it still doesn't work...
@user444540: Please provide a short but *complete* program that demonstrates the problem (and by editing your question, not in comments).
Jon Skeet
A: 

Let the computer do the work: http://commons.apache.org/lang/api-2.5/org/apache/commons/lang/StringUtils.html#difference(java.lang.String, java.lang.String)

If you expect the structure to be the same, but different contents, you should only see the content changes in the output.
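
If you would rather not pull in a library for this, the same idea can be sketched in plain Java. This is only an illustration: the class and method names (FirstDifference, indexOfDifference) are made up here, and the two fragments are shortened versions of the strings from the question.

```java
public class FirstDifference {

    /**
     * Returns the index of the first character at which the two strings
     * differ, or -1 if they are equal. (Hypothetical helper, not a library call.)
     */
    static int indexOfDifference(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        for (int i = 0; i < limit; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        // One string is a prefix of the other, or they are identical
        return a.length() == b.length() ? -1 : limit;
    }

    public static void main(String[] args) {
        String first = "<archivalDate>date</archivalDate>";
        String second = "<archivialDate>date</archivialDate>";

        int i = indexOfDifference(first, second);
        System.out.println("First difference at index " + i);
        // Print both tails so the mismatch is easy to see side by side
        System.out.println(first.substring(i));
        System.out.println(second.substring(i));
    }
}
```

Printing both tails from the first mismatch onward puts the differing characters at the start of each line, which is much easier to spot than scanning the full strings.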

Freiheit
On second read this is a little kludgy. It may *help* a manual check, but it's hardly the best option.
Freiheit
A: 

Might not help this time, but I usually paste XML into an *.xml file in Eclipse, then auto-format the file (ctrl-shift-f), which makes things like this much, much easier to eyeball.

Dean J
Notepad++ is good for this too. http://notepad-plus.sourceforge.net
Freiheit
+2  A: 

Have you checked for invisible characters? I have found in the past that there are invisible characters in an xml that are different from what I am expecting to have sent.
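
A rough way to check for them in Java is to list every character outside printable ASCII together with its position. The sketch below is an assumption about what "invisible" means (byte order marks, non-breaking spaces, control characters); the class and method names are invented for the example, and it will also flag tabs, newlines and legitimate non-ASCII text.

```java
public class InvisibleChars {

    /**
     * Lists every code point outside printable ASCII (0x20..0x7E) with its
     * index in the string. Note that tabs and line breaks are reported too.
     */
    static String describeSuspicious(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp < 0x20 || cp > 0x7E) {
                out.append(String.format("index %d: U+%04X%n", i, cp));
            }
            // Advance by one or two chars depending on the code point
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample input with a byte order mark (U+FEFF) at the start
        // and a non-breaking space (U+00A0) inside the element text
        String xml = "\uFEFF<title>a\u00A0b</title>";
        System.out.print(describeSuspicious(xml));
    }
}
```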

aperkins
A: 

I knew I would be ashamed.

Spelling error, archivalDate in one and archiv*i*alDate in the other.

Please bury me now...

Why didn't you just copy one and paste it into the other? This whole question is ridiculous.
Erick Robertson
I don't think the question is ridiculous if you consider the problem in more general terms. Consider it this way: "My xml parser is set up to consume this sample properly ::insert sample 1:: . When I feed it another sample ::sample 2:: it fails. I suspect there is some minor difference between the samples. How do I find it?" Looking at inputs is not an efficient or easy way to find these things. The chosen parser reports a problem but can't/won't report on WHERE the problem is. So crowdsourcing the review of the two samples is bad, but looking at the tools available to automate it is good.
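
For what it's worth, when the input is not well-formed, the standard JAXP parser does carry a location: the SAXParseException it throws exposes getLineNumber() and getColumnNumber(). That does not help with a misspelled but well-formed tag like the one in this question, but it covers the "where did it fail" case. A minimal sketch, with the class and method names invented for the example:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class ParseErrorLocation {

    /**
     * Returns null if the XML is well-formed, otherwise a short description
     * of where the parser gave up. (Hypothetical helper for illustration.)
     */
    static String locateError(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            // The exception records the position at which parsing failed
            return "line " + e.getLineNumber()
                    + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (Exception e) {
            // Configuration or I/O problems are not the point of this sketch
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The <b> element is never closed, so this is not well-formed
        System.out.println(locateError("<a><b></a>"));
        // Well-formed input yields null
        System.out.println(locateError("<a><b/></a>"));
    }
}
```

The default error handler may also print the fatal error to standard error; the exact column and message text depend on the parser implementation.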
Freiheit
I agree with user. Several times we have had to figure out why the XML a client sends us does not work even though our test cases do, and those test cases are based on the XML they send us (which led to me always looking for invisible characters, btw).
aperkins
A: 

Hey! You say the second one (example2) returns null. Are you sure you are parsing example4, and not something called example2 that could be null?

I tested your code, and I did not get any exception. Otherwise, I am pretty sure you are looking up some node that does not exist in the second XML (example4), such as archivalDate.
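
To make that concrete, here is a small sketch of the check I have in mind. Both strings parse fine; only the lookup by tag name differs. The strings are shortened versions of the ones in the question, and the class and method names are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class MissingElementCheck {

    /** Parses the XML and counts the elements with the given tag name. */
    static int count(String xml, String tagName) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(tagName).getLength();
        } catch (Exception e) {
            // Keep the sketch short: wrap checked parser exceptions
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<digitalObject><archivalDate>date</archivalDate></digitalObject>";
        String bad = "<digitalObject><archivialDate>date</archivialDate></digitalObject>";

        // Both documents are well-formed, so neither parse throws
        System.out.println(count(good, "archivalDate")); // one match
        System.out.println(count(bad, "archivalDate"));  // no match: the tag is misspelled
    }
}
```

Code that then takes item(0) of the empty NodeList gets null back, which would look exactly like "the second string returns null" even though the parse itself succeeded.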

ring bearer