I am working on converting an Excel spreadsheet into an XML document that needs to be validated against a schema. I am currently building the XML document using the DOM API and validating it at the end using SAX and a custom error handler. However, I would really like to validate the XML produced from each cell as I parse the Excel document, so I can indicate which cells are problematic in a friendlier way.

The problem I am currently running into is that after the XML for the simple types has been validated, all the child nodes get validated again once they are built into a complex type, which produces redundant errors.
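
For reference, here is a minimal sketch of the kind of end-of-run validation described above. It is not my actual code: the one-column schema (a `row` holding an integer `qty`), the class name `WholeTreeValidation` and the method `validateTree` are all made up for illustration, and it uses `javax.xml.validation.Validator` with a SAX `ErrorHandler` that collects messages instead of throwing.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class WholeTreeValidation {
    // Hypothetical stand-in for the real (much larger) set of XSDs.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='row'><xs:complexType><xs:sequence>"
      + "<xs:element name='qty' type='xs:int'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Builds a one-row DOM from a cell value, validates it, returns the messages. */
    static List<String> validateTree(String qtyCellText) throws Exception {
        // Build the tree with the DOM API, as the spreadsheet parser would.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element row = doc.createElementNS(null, "row");
        Element qty = doc.createElementNS(null, "qty");
        qty.setTextContent(qtyCellText);
        row.appendChild(qty);
        doc.appendChild(row);

        Validator validator = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)))
                .newValidator();

        // Custom error handler: record every problem rather than stop at the first.
        final List<String> errors = new ArrayList<String>();
        validator.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });

        validator.validate(new DOMSource(doc));
        return errors;
    }
}
```

The messages collected this way describe schema violations but say nothing about which spreadsheet cell they came from, which is what I am trying to improve.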

I found this question here at SO, but it uses C# and the Microsoft API.

Thoughts? Thanks!

A: 

Try building schemas at multiple levels of granularity. Test the simple (Cells) ones against the most granular, and the complex ones (Rows?) against a less granular schema that doesn't decompose the complex types.

Matthew Flynn
The schema actually consists of multiple XSDs and in total is quite large. I think I would only go this route as a last resort.
Casey
+1  A: 

You could try having your parsing code fire SAX events instead of directly constructing a DOM. Then you could just register a validating SAX ContentHandler to listen to it and have that build your DOM for you. That should detect validation errors as they're encountered.
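
A rough sketch of what that could look like, assuming the standard `javax.xml.validation.ValidatorHandler` as the validating `ContentHandler` and a `TransformerHandler` behind it to build the DOM. The tiny schema, the class name `SaxCellValidation` and the method `validateRow` are invented for the example; your parsing code would emit one set of events per cell instead of the hard-coded calls here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.AttributesImpl;

class SaxCellValidation {
    // Hypothetical one-column schema: a row holds a single integer "qty" cell.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='row'><xs:complexType><xs:sequence>"
      + "<xs:element name='qty' type='xs:int'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Fires SAX events for one row and returns the validation messages. */
    static List<String> validateRow(String qtyCellText) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        ValidatorHandler validator = schema.newValidatorHandler();

        // Collect errors instead of aborting, so every bad cell can be reported.
        final List<String> errors = new ArrayList<String>();
        validator.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });

        // Downstream handler that turns the validated events into a DOM tree.
        TransformerHandler domBuilder =
            ((SAXTransformerFactory) SAXTransformerFactory.newInstance()).newTransformerHandler();
        DOMResult dom = new DOMResult();
        domBuilder.setResult(dom);
        validator.setContentHandler(domBuilder);

        // The spreadsheet parser would emit these events as it visits each cell.
        AttributesImpl noAttrs = new AttributesImpl();
        char[] text = qtyCellText.toCharArray();
        validator.startDocument();
        validator.startElement("", "row", "row", noAttrs);
        validator.startElement("", "qty", "qty", noAttrs);
        validator.characters(text, 0, text.length);
        validator.endElement("", "qty", "qty");
        validator.endElement("", "row", "row");
        validator.endDocument();
        return errors;
    }
}
```

Because the error handler is called while the events are flowing, the code firing them still knows which cell it is on and can attach that cell's address to the message.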

hcayless
How can I use SAX to parse something that I haven't created yet? I'm using DOM to construct the XML as I am parsing the Excel file.
Casey
+2  A: 

Sorry, but I don't see the problem. You are producing the XML, so what's the point in validating the XML while you produce it?

Are you looking to validate the cell contents? If yes, then write validation logic into your code. This validation logic may replicate the schema, but I suspect that it will actually be much more detailed than the schema.

Are you looking to validate your program's output? If yes, then write unit tests.

kdgregory
I am looking to validate the cell contents against the XSD. The structure of the Excel file mirrors that of the XSD.
Casey
+1  A: 

The solution that I decided to go with, and have almost finished implementing, was to use XSOM to parse the XSD. Then, when parsing the Excel file, I looked up the column name in the parsed XSD to pull out the restrictions (the column headers map to simple types in the XSD) and then validated the cell manually against those restrictions. I am still building the tree so that at the end I can validate the entire XML tree against the XSD, since there are some things that I can't catch at the cell level.
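
To give an idea of the manual check, here is a simplified sketch rather than my real code. It leaves XSOM out entirely: `Restriction` is a hypothetical holder for whatever facets were copied out of the parsed XSD for a column, and `check` and `CellRestrictionCheck` are illustrative names. Only three facets are covered, and it assumes the XSD patterns also happen to be valid `java.util.regex` patterns, which is not true of every XSD regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class CellRestrictionCheck {
    /** Hypothetical holder for facets taken from the XSD; null means "facet not set". */
    static final class Restriction {
        final Integer maxLength;
        final Pattern pattern;
        final List<String> enumeration;

        Restriction(Integer maxLength, Pattern pattern, List<String> enumeration) {
            this.maxLength = maxLength;
            this.pattern = pattern;
            this.enumeration = enumeration;
        }
    }

    /** Returns readable problems for one cell, each prefixed with the cell address. */
    static List<String> check(String cellRef, String value, Restriction r) {
        List<String> problems = new ArrayList<String>();
        // Count code points so characters outside the BMP count once each.
        int length = value.codePointCount(0, value.length());
        if (r.maxLength != null && length > r.maxLength) {
            problems.add(cellRef + ": longer than " + r.maxLength + " characters");
        }
        // XSD patterns must match the whole value, hence matches() and not find().
        if (r.pattern != null && !r.pattern.matcher(value).matches()) {
            problems.add(cellRef + ": does not match pattern " + r.pattern.pattern());
        }
        if (r.enumeration != null && !r.enumeration.contains(value)) {
            problems.add(cellRef + ": not one of " + r.enumeration);
        }
        return problems;
    }
}
```

Since each message starts with the cell address (for example `B7`), the problems can be shown to the user next to the offending cell, and the whole-tree validation at the end only has to catch what is left.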

Thanks for all of your input.

Casey