It seems this question has come up before, as I see in http://stackoverflow.com/questions/2938398/reading-escape-characters-with-xmlstreamreader

But the issue I am seeing here is a little different.

I am reading a fairly large XML file that contains a large snippet of malformed HTML as the value of one of its elements. The values are enclosed in CDATA sections and normally do not cause any issue. Intermittently, however, the getText method of the XMLStreamReader class reads only part of the text in the CDATA section, and the next batch starts with markup, for example "<table>", which the parser treats as a start element instead of character data, causing the parsing to fail.
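For reference, StAX readers are allowed to deliver one run of text as several events, so a single getText call is not guaranteed to return the whole value. Below is a minimal sketch of a defensive read, assuming the standard javax.xml.stream API; the class and method names (CdataTextReader, extract, readText) are made up for illustration. It turns on coalescing and also accumulates consecutive text events. It would not help if the parser itself misreads markup inside the CDATA section, which is what I seem to be seeing.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class CdataTextReader {

    // Hypothetical helper: returns the full text of the first element with the
    // given local name, or null if no such element exists.
    public static String extract(String xml, String elementName) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Ask the parser to merge adjacent character/CDATA chunks into one event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && elementName.equals(reader.getLocalName())) {
                        return readText(reader);
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // Expects the reader to be positioned on a START_ELEMENT. Appends every
    // consecutive text event and stops at the first non-text event.
    static String readText(XMLStreamReader reader) throws XMLStreamException {
        StringBuilder text = new StringBuilder();
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.CHARACTERS
                    || event == XMLStreamConstants.CDATA
                    || event == XMLStreamConstants.SPACE) {
                text.append(reader.getText());
            } else {
                break;
            }
        }
        return text.toString();
    }
}
```

Usage would be something like CdataTextReader.extract(xml, "body"), where "body" stands in for whatever element holds the HTML snippet.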

Has anyone encountered this issue with a StAX parser before? I am using the sjsxp 1.0.1 implementation on JDK 1.5.

Any help or wild ideas would be appreciated, as I am out of ideas now.

A: 

I think I made some headway with the issue. The problem seems to be in the sjsxp implementation (even their latest one). Sometimes the getText method does not read the entire text, and if you are as unlucky as me, the remaining text starts with a tag and that causes the problem. We were planning to encode the values, which might work, but we also tried the Woodstox implementation (http://woodstox.codehaus.org) and that seems to handle this case. So I wanted to ask a follow-up question:

Has anyone else used the Woodstox StAX implementation, and do you know of any issues with it compared to sjsxp?
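For anyone comparing the two, here is a small sketch for confirming which StAX implementation actually gets picked up at runtime, since XMLInputFactory.newInstance() selects whatever provider it finds. The class name StaxImplementationCheck is made up for illustration. As far as I know, the provider can also be forced through the javax.xml.stream.XMLInputFactory system property (for Woodstox the factory class should be com.ctc.wstx.stax.WstxInputFactory), but treat that as an assumption to verify against your own classpath.

```java
import javax.xml.stream.XMLInputFactory;

public class StaxImplementationCheck {

    // Returns the class name of the XMLInputFactory implementation that the
    // standard lookup resolves to on the current classpath.
    public static String factoryClassName() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("StAX implementation in use: " + factoryClassName());
    }
}
```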

Fazal