Among the data my application sends to a third-party SOA server are complex XMLs. The server owner does provide the XML schemas (.xsd) and, since the server rejects invalid XMLs with a meaningless message, I need to validate them locally before sending.

I could use a stand-alone XML schema validator but they are slow, mainly because of the time required to parse the schema files. So I wrote my own schema validator (in Java, if that matters) in the form of an HTTP Server which caches the already parsed schemas.
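For illustration, the caching idea described above could be sketched roughly as follows, using the JDK's `javax.xml.validation` API. The class name `SchemaCache` and the choice to key the cache by file path are assumptions made for this example, not details of the actual server.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper: parses each .xsd once and reuses the compiled Schema.
public class SchemaCache {
    // Schema objects are immutable and thread-safe, so they can be shared.
    private final Map<String, Schema> cache = new ConcurrentHashMap<>();

    // Returns the compiled schema for the given path, parsing it on first use.
    // Throws SAXException if the file is missing or is not a valid schema.
    public Schema get(String path) throws SAXException {
        Schema cached = cache.get(path);
        if (cached != null) {
            return cached;
        }
        // SchemaFactory is not thread-safe, so a fresh one is created per parse.
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema parsed = factory.newSchema(new File(path));
        Schema previous = cache.putIfAbsent(path, parsed);
        return previous != null ? previous : parsed;
    }

    // Validates the XML text against the cached schema.
    // Throws SAXException if the XML is malformed or invalid against the schema.
    public void validate(String path, String xml) throws SAXException, IOException {
        // Validator is not thread-safe, so a new one is created per request.
        get(path).newValidator().validate(new StreamSource(new StringReader(xml)));
    }
}
```

The expensive step (compiling the `.xsd`) happens only on the first request for a given schema; later requests pay only for creating a `Validator` and validating the document.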

The problem is: many things can go wrong in the course of the validation process. Other than unexpected exceptions and successful validation:

  • the server may not find the schema file specified
  • the file specified may not be a valid schema file
  • the XML is invalid against the schema file

Since it's an HTTP server, I'd like to provide the client with meaningful status codes. Should the server answer with a 400 error (Bad Request) for all of the above cases? Or do they have nothing to do with HTTP, so that it should answer 200 with a message in the body? Any other suggestions?

Update: the main application is written in Ruby, which doesn't have a good xml schema validation library, so a separate validation server is not over-engineering.

+1  A: 

I'd go with 400 Bad request and a more specific message in the body (possibly with a secondary error code in a header, like X-Parse-Error: 10451 for easier processing)
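A minimal sketch of what that could look like with the JDK's built-in `com.sun.net.httpserver` package. The class name `ErrorHeaderDemo`, the `/validator` path, and the message text are made up for the example; `X-Parse-Error` and `10451` are just the placeholder values from above.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class ErrorHeaderDemo {
    // Sends an error status with a machine-readable code in a header
    // and a human-readable explanation in the body.
    static void reject(HttpExchange exchange, int status, String code, String message)
            throws IOException {
        byte[] body = message.getBytes(StandardCharsets.UTF_8);
        exchange.getResponseHeaders().set("X-Parse-Error", code);
        exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
        exchange.sendResponseHeaders(status, body.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(body);
        }
    }

    // Starts a server whose only endpoint always rejects the request,
    // purely to show the shape of the response. Port 0 picks a free port.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/validator",
                exchange -> reject(exchange, 400, "10451", "XML is not valid against the schema"));
        server.start();
        return server;
    }
}
```

The client can then branch on the header value without having to parse the body text.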

Piskvor
+2  A: 

From w3c: 400 = The request could not be understood by the server due to malformed syntax.

I wouldn't serve that up unless it was actually the case that the server could not understand the request. If you're just getting invalid xml, serve a 200 and explain why things are not working.

Regards Fake

A: 

That sounds like a neat idea, but the HTTP status codes don't really provide an "operation failed" case. I would return HTTP 200 with an X-Validation-Result: true/false header, using the body for any text or "reason" as necessary. Save the HTTP 4xx for HTTP-level errors, not application-level errors.

It's kind of a shame and a double standard, though. Many applications use HTTP authentication, and they're able to return HTTP 401 Unauthorized or 403 Forbidden from the application level. It would be convenient and sensible to have some sort of blanket HTTP 4xx Request Rejected that you could use.

Tom
If you return 200 and an extra header to signal success, you are tunneling a protocol over HTTP instead of using HTTP properly. Don't do that.
Julian Reschke
+2  A: 

It's perfectly valid to map error situations in the validation process to meaningful HTTP status codes.

I suppose you send the XML file to your validation server as a POST content using the URI to determine a specific schema for validation.

So here are some suggestions for error mappings:

  • 200: XML content is valid
  • 400: XML content was not well-formed, headers were inconsistent, request did not match RFC 2616 syntax
  • 401: schema was not found in cache and server needs credentials to use for authentication against the 3rd party SOA backend in order to obtain the schema file
  • 404: Schema file not found
  • 409: the XML content was invalid against the specified schema
  • 412: Specified file was not a valid XML schema
  • 500: any unexpected exception in your validation server (NullPointerExceptions et al.)
  • 502: the schema was not found in cache and the attempt to request it from the 3rd party SOA server failed.
  • 503: validation server is restarting
  • 504: see 502 with reason=timeout
mkoeller
Be careful how you use http statuses like 401, 409 and 412 - they have particular meaning in the HTTP protocol and aren't codes you can just decide to use in some generalised error scenario because you like the way the wording sounds :) '422 unprocessable entity' is probably what you're looking for as a general-purpose "while it was syntactically valid for its media type, we were unable to accommodate the semantics of the submitted request entity" http://tools.ietf.org/html/rfc4918#section-11.2
Matt
+1  A: 

Amazon could be used as a model for how to map http status codes to real application level conditions: http://docs.amazonwebservices.com/AWSImportExport/latest/API/index.html?Errors.html (see Amazon S3 Status Codes heading)

Danny Armstrong
A: 

Status code 422 ("Unprocessable Entity") sounds close enough:

"The 422 (Unprocessable Entity) status code means the server understands the content type of the request entity (hence a 415 (Unsupported Media Type) status code is inappropriate), and the syntax of the request entity is correct (thus a 400 (Bad Request) status code is inappropriate) but was unable to process the contained instructions. For example, this error condition may occur if an XML request body contains well-formed (i.e., syntactically correct), but semantically erroneous, XML instructions."

Julian Reschke
+1  A: 

Say you're posting XML files to a resource, eg like so:

POST /validator
Content-Type: application/xml

If the request entity fails to parse as the media type it was submitted as (ie as application/xml), 400 Bad Request is the right status.

If it parses syntactically as the media type it was submitted as, but it doesn't validate against some desired schema, or otherwise has semantics which make it unprocessable by the resource it's submitted to, then 422 Unprocessable Entity is the best status. You should probably accompany it with some more specific error information in the error response. Also note that 422 is technically defined in an extension to HTTP (WebDAV), although it is quite widely used in HTTP APIs and is more appropriate than any of the other HTTP error statuses when there's a semantic error with a submitted entity.

If it's being submitted as a media type which implies a particular schema on top of xml (eg as application/xhtml+xml) then you can use 400 Bad Request if it fails to validate against that schema. But if its media type is plain XML then I'd argue that the schema isn't part of the media type, although it's a bit of a grey area; if the xml file specifies its schema you could maybe interpret validation as being part of the syntactic requirements for application/xml.

If you're submitting the XML files via a multipart/form-data or application/x-www-form-urlencoded form submission, then you'd have to use 422 Unprocessable Entity for all problems with the XML file; 400 would only be appropriate if there's a syntactic problem with the basic form upload.
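For the plain application/xml case, the 400 vs 422 split could be sketched roughly like this in Java. The class and method names (`StatusChooser`, `statusFor`) are made up for the example, and it assumes the schema has already been loaded successfully; it is one possible reading of the distinction above, not a definitive implementation.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.validation.Schema;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class StatusChooser {
    // Picks a status code for a request body submitted as application/xml.
    static int statusFor(String xml, Schema schema)
            throws IOException, ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for schema validation of the DOM

        Document document;
        try {
            // Step 1: does the body parse as its declared media type at all?
            document = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (SAXException notWellFormed) {
            return 400; // not well-formed XML: Bad Request
        }

        try {
            // Step 2: well-formed, but does it satisfy the schema?
            schema.newValidator().validate(new DOMSource(document));
        } catch (SAXException invalidAgainstSchema) {
            return 422; // syntactically fine, semantically rejected
        }
        return 200;
    }
}
```

In a real handler you would also put the exception message into the response body, so the client learns which element or attribute failed.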

Matt