views:

209

answers:

6

I have to maintain code from a contractor. It has this "interesting" snippet:

String webServicesValue = 
  webResponse.substring(webResponse.indexOf("<" + fieldName + ">") + 
                        fieldName.length() + 2, 
                        webResponse.indexOf("</" + fieldName + ">"));

It took me a few minutes to understand what he's trying to do.

It seems to be a WTF code, but my colleague suggested, "If it ain't broken, don't fix it." I'm wondering if this code will ever fail. It seems to be working so far, and I honestly can't think of a test case to fail this.

Thanks,

+1  A: 

I would suggest re-writing it in that case. If it is not easy to understand, then it is more difficult to test.

Paul Lydon
+1 Agreed. Any single line code that takes minutes to understand is a problem for maintenance and should therefore be rewritten. (And in this case, the code clearly IS "broke"!)
Stephen C
How should I rewrite it? I was thinking about using a proper XML parser instead of this hack job, but, as mentioned, my colleague (who has more experience and seniority) told me not to fix it until proven broken (i.e., a proper XML parser is needed).
gineer
@gineer: That's a hard question. A proper XML parser is likely to give you a more robust solution, but if you are GUARANTEED that the XML will be consistently formatted, well-formed, and conforming to the schema, ad-hoc parsing might be good enough. But even an extra space in an unexpected place can be enough throw off ad-hoc parsing, and there are theoretical variations that you probably wouldn't even consider.
Stephen C
@gineer: After further thought, I'd say that you should use a proper XML parser UNLESS you can guarantee that you will never get malformed or differently formatted XML. If this is a web service, don't forget that someone might send you bad / weird XML just to see if they can "break" your service.
Stephen C
+6  A: 

What happens if you receive <fieldname/> ?

Igor Brejc
This is a valid test case. I'm going to discuss this with my colleague. However, from your experience, is it likely for web service to generate such tags? I've used only one XML library so far, and it always generates <fieldname></fieldname> for empty tags.
gineer
Since I don't know what kind of web service it is, I cannot tell you anything other than "it is likely, since it's valid XML" :). But in general, it is wise to code defensively: since the web service is an outside entity for you, you never know what kind of stuff it could return. Fixing a few lines of code now is much less expensive than later fixing possible bugs in the production. So I would suggest writing few unit tests around this code before you change it, and then refactor the code to be less error-prone. AND keep those unit tests running for the future.
Igor Brejc
@gineer: "However, from your experience, is it likely for web service to generate such tags?". Try fiddling with the config options and you can probably get it to generate different-looking XML. Besides, how do you know that the XML was generated by an XML library? If your code is parsing XML by hand, the other end could be generating it by hand!
Stephen C
+7  A: 

Yes. It will throw an Exception if "<fieldname>" is not present in the response. Specifically it will try to call webResponse.substring(fieldName.length() + 1, -1);

"<fieldname/>" will cause similar problems, as will any attributes on the element.

And if you get "<fieldname> ... <fieldname> ... </fieldName> ... </fieldName>", you'll get the wrong answer.

EDIT: in the light of followup discussions, I'd say that this code should be rewritten to use a proper XML parser ... unless you / your team can guarantee that the code will never have to deal with problematic XML. XML simply allows too many valid (and invalid) variations to deal with by ad-hoc string manipulation.

Stephen C
Of course, you might argue that these inputs are "impossible" ... and that's probably what the contractor thought. But "impossible" things may start happening when you change the system.
Stephen C
+1  A: 
  • Nested tags. You'll start at the first open tag, skip any others that exist and stop on the first close tag instead of the matching close tag.
  • There is a close tag somewhere before the open tag (you're searching for the close tag from the beginning of the string, not the end of the open tag)
Spyder
+1 for the out-of order example -- that's nasty. :)
David Moles
+2  A: 

Instead of manually parsing XML it's better to use a real XML parser. There are all kinds of corner cases that are hard to cover with plain string manipulation. It will be more readable with a real parser also. It's best to consider XML data to be binary data, especially when taking all the possible character encodings into consideration.

Esko Luontola
+2  A: 

In addition to Igor Brejc and Stephen C's responses above, there's CDATA:

<fieldname><![CDATA[ I am not really </fieldname> ]]></fieldname>

or even

<othertag>
  <![CDATA[ I am not really <fieldname> and there is no closing tag ]]>
</othertag>
David Moles
See my comment "... and there are theoretical variations that you probably wouldn't even consider". It applies to me too :-)
Stephen C