I need to remove the non-XML text from a file generated by another program.

The file looks something like this:

Executing Command - Blah.exe ...
-----Command Output-----
HTTP/1.1 200 OK
Connection: close
Content-Type: text/xml

<?xml version="1.0"?>
<testResults>
  <finalCounts>
    <right>7</right>
    <wrong>4</wrong>
    <ignores>0</ignores>
    <exceptions>0</exceptions>
  </finalCounts>
</testResults>

Exit-Code: 15

How can I easily remove the non-XML text in Java?

+7  A: 
// getContent() returns the complete text to strip.
//
String s = getContent();

// Find the start of the XML content using the <?xml prefix.
//
int xmlIndex = s.indexOf( "<?xml" );

// Strip the non-XML header (only if the prefix was actually found).
//
if( xmlIndex >= 0 ) {
  s = s.substring( xmlIndex );
}

// Find the last closing angle-bracket; should indicate end of the XML.
//
xmlIndex = s.lastIndexOf( ">" );

// Strip everything after the closing angle-bracket, keeping the ">" itself
// (substring's end index is exclusive, hence the + 1).
//
s = s.substring( 0, xmlIndex + 1 );
Dave Jarvis
You might need to add or subtract 1 from `xmlIndex`.
Dave Jarvis
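A self-contained sketch of the same `indexOf("<?xml")` / `lastIndexOf('>')` approach from the first answer. The class and method names (`XmlExtractor`, `extractXml`) are hypothetical, and returning `null` when no `<?xml` prefix is found is an assumed convention, not something from the answer. It keeps the final `>` by passing `end + 1` to `substring`, and it assumes the trailing text (here `Exit-Code: 15`) contains no `>` character.

```java
public class XmlExtractor {

    // Hypothetical helper: returns the text from the first "<?xml" up to and
    // including the last '>', or null if no "<?xml" prefix is present.
    public static String extractXml(String content) {
        int start = content.indexOf("<?xml");
        if (start < 0) {
            return null; // no XML declaration found
        }
        int end = content.lastIndexOf('>');
        if (end < start) {
            return null; // no '>' after the declaration start (truncated input)
        }
        // substring's end index is exclusive, so add 1 to keep the final '>'.
        return content.substring(start, end + 1);
    }

    public static void main(String[] args) {
        String content = "Executing Command - Blah.exe ...\n"
                + "-----Command Output-----\n"
                + "HTTP/1.1 200 OK\n"
                + "Content-Type: text/xml\n"
                + "\n"
                + "<?xml version=\"1.0\"?>\n"
                + "<testResults>\n"
                + "</testResults>\n"
                + "\n"
                + "Exit-Code: 15\n";
        // Prints only the XML declaration and the <testResults> element.
        System.out.println(extractXml(content));
    }
}
```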
+4  A: 

This looks like raw HTTP response output, so scanning for the first two consecutive line feeds (probably each preceded by a carriage return, i.e. `\r\n\r\n`) will give you the end of the prefix you want to filter out.

ndim
A pity there isn't a `Content-Length` header to provide more hints.
McDowell
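The blank-line approach from the second answer can be sketched with `String.split` and a limit of 2, so the text is split only at the first blank line. The names `HttpBodyExtractor` and `stripHeaders` are hypothetical, and the regex `\r?\n\r?\n` is an assumption chosen to accept either CRLF or bare LF line endings. Because the `Executing Command ...` lines in the question contain no blank line, they are dropped along with the HTTP headers. The trailing `Exit-Code: 15` line is not removed; it still needs separate handling, for example the `lastIndexOf('>')` trim from the first answer.

```java
public class HttpBodyExtractor {

    // Hypothetical helper: returns everything after the first blank line, i.e.
    // after the HTTP headers, or null if there is no blank line. The regex
    // accepts CRLF ("\r\n") or bare LF ("\n") line endings.
    public static String stripHeaders(String content) {
        // limit = 2: split only at the first blank line, keep the rest intact.
        String[] parts = content.split("\\r?\\n\\r?\\n", 2);
        return parts.length == 2 ? parts[1] : null;
    }

    public static void main(String[] args) {
        String content = "HTTP/1.1 200 OK\r\n"
                + "Connection: close\r\n"
                + "Content-Type: text/xml\r\n"
                + "\r\n"
                + "<?xml version=\"1.0\"?>\r\n"
                + "<testResults/>\r\n";
        // Prints the body starting at "<?xml".
        System.out.print(stripHeaders(content));
    }
}
```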