I am trying to use Scanner to break up a string that I read in from a file. The file data is:

RFH ^@^@^@^B^@^@^@°^@^@^A^Q^@^@^D¸    
^@^@^@^@^@^@^D¸^@^@^@
<mcd><Msd>jms_bytes</Msd></mcd>
 ^@^@^@d<jms><Dst>queue:///panddArchiveVerifyStep1.V001_I</Dst><Tms>1280138410102</Tms><Dlv>2</Dlv>< /jms>  571:8:*SYD01_P,31:*panddArchiveVerifyStep1.V001_I,520:454:28:panddArchiveVerifyStep1.V001,417:<?xml version="1.0" encoding="UTF-8"?> <n0:message xmlns:n0="uri:ebusiness.com"><n0:messageHeader><n0:messageType>panddArchiveVerify</n0:messageType><n0:messageVersion>001</n0:messageVersion></n0:messageHeader><n0:archiveDoc><n0:docImageID>14256448</n0:docImageID><n0:initialDispatchDatetime>2010-06-16T20:40:48.495</n0:initialDispatchDatetime><n0:processCount>0</n0:processCount></n0:archiveDoc></n0:message>,,4:cert,16:dummycertificate,4:algo,3:DES,3:sig,9:[B@7b3082,0:,,,

The steps I need to do are: get the length of the text from :28 to </n0:message>, prefix that length before :28, and get rid of the rest.

Is there a regular expression I can use to get the string token from :28 to </n0:message>?

So far I have a delimiter that gets the string token starting at :28, but I don't know how to stop at </n0:message>.

Scanner s = new Scanner(rawMsg.toString()).useDelimiter("(?=:28)");    
Example data  
:28:panddArchiveVerifyStep1.V001,417:<?xml version="1.0" encoding="UTF-8"?><n0:message xmlns:n0="uri:ebusiness.com......  

All I want from my raw data is three tokens:

Token One: 
RFH ^@^@^@^B^@^@^@°^@^@^A^Q^@^@^D¸       
 ^@^@^@^@^@^@^D¸^@^@^@
 <mcd><Msd>jms_bytes</Msd></mcd>
 ^@^@^@d<jms><Dst>queue:///panddArchiveVerifyStep1.V001_I</Dst><Tms>1280138410102</Tms><Dlv>2</Dlv>< /jms>  
 571:8:*SYD01_P,31:*panddArchiveVerifyStep1.V001_I,520:454

Token two:
:28:panddArchiveVerifyStep1.V001,417:<?xml
 version="1.0" encoding="UTF-8"?>
 <n0:message
 xmlns:n0="uri:ebusiness.asic.gov.au"><n0:messageHeader><n0:messageType>panddArchiveVerify</n0:messageType><n0:messageVersion>001</n0:messageVersion></n0:messageHeader><n0:archiveDoc><
n0:docImageID>14256448</n0:docImageID><n0:initialDispatchDatetime>2010-06-16T20:40:48.495</n0:initialDispatchDatetime><n0:processCount>0</n0:processCount></n0:archiveDoc></n0:message>

Token three:
,,4:cert,16:
dummycertificate,4:algo,3:DES,3:sig,9:[B@7b3082,0:,,,
+1  A: 

You're on the right track; just use a lookbehind for the next delimiter:

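Scanner sc = new Scanner(rawMsg.toString());
// Token one: everything up to (but not including) ":28".
if ( sc.useDelimiter("(?=:28)").hasNext() )
{
  System.out.printf("%n%s%n",  sc.next() );
}
// Token two: from ":28" through the end of "</n0:message>".
if ( sc.useDelimiter("(?<=</n0:message>)").hasNext() )
{
  System.out.printf("%n%s%n",  sc.next() );
}
// Token three: whatever remains, up to the end of the input.
if ( sc.useDelimiter("\\z").hasNext() )
{
  System.out.printf("%n%s%n",  sc.next() );
}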

But if you've already read the text into a String, it would probably be easier to use Matcher.find() or String.split(), or even indexOf() and substring(). I'll elaborate if you're interested.
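
For what it's worth, here is a rough sketch of the Matcher.find() route, assuming the whole message is already in a String. The class name SplitWithMatcher, the method splitThree, and the shortened sample input are made up for illustration; only the ":28" and "</n0:message>" markers come from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitWithMatcher {
    // Hypothetical helper: returns { text before ":28",
    //                                ":28" through "</n0:message>",
    //                                everything after }.
    static String[] splitThree(String raw) {
        // DOTALL lets '.' match line breaks, since the raw message spans lines.
        // The reluctant ".*?" groups stop at the first ":28" and the first
        // "</n0:message>" that follows it.
        Pattern p = Pattern.compile("^(.*?)(:28.*?</n0:message>)(.*)$", Pattern.DOTALL);
        Matcher m = p.matcher(raw);
        if (!m.find()) {
            throw new IllegalArgumentException("markers not found in input");
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // Shortened stand-in for the real message (an assumption, not the actual data).
        String raw = "header,520:454:28:name,417:<n0:message>body</n0:message>,,4:cert";
        String[] tokens = splitThree(raw);
        for (String token : tokens) {
            System.out.printf("%n%s%n", token);
        }
        // The length asked about in the question would then be tokens[1].length().
    }
}
```

Like the Scanner version, this splits at the first ":28" it finds, so it relies on that sequence not appearing earlier in the header.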

Alan Moore
Thanks Alan, that worked perfectly. Can you explain the lookbehind and the regular expressions a bit further? I understand the first delimiter: it does not include :28, so that gives me the first token. But I don't understand the regex for the second delimiter, or the last one.
shane lee
Actually, I understand the lookbehind, and \\z, which means the end of the string. But why do I use ?= for the first delimiter and ?<= for the second? Thanks, Shane.
shane lee
If you were to use `(?=</n0:message>)` as the second delimiter, it would match the position *before* the `</n0:message>`, which would become part of the third token. You said you wanted it to stay in the second token, so I used a lookbehind to match the position right *after* it instead.
Alan Moore
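
A small sketch of that difference, using String.split() on a made-up input so both delimiters can be compared side by side. The class name LookaroundDemo and its two helper methods are only illustrative.

```java
import java.util.Arrays;

public class LookaroundDemo {
    // Lookahead: splits at the position just BEFORE the closing tag,
    // so the tag ends up at the start of the following piece.
    static String[] splitBefore(String s) {
        return s.split("(?=</n0:message>)");
    }

    // Lookbehind: splits at the position just AFTER the closing tag,
    // so the tag stays at the end of the preceding piece.
    static String[] splitAfter(String s) {
        return s.split("(?<=</n0:message>)");
    }

    public static void main(String[] args) {
        String s = "A</n0:message>B";
        System.out.println(Arrays.toString(splitBefore(s))); // [A, </n0:message>B]
        System.out.println(Arrays.toString(splitAfter(s)));  // [A</n0:message>, B]
    }
}
```

Both patterns are zero-width, so neither consumes any characters; they differ only in which side of the tag the split lands on.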
Yes, you're correct Alan, I just did not understand the ?<= expression. So ?<= matches the position after the given text, whereas ?= matches the position before it?
shane lee
Yep. This might interest you: http://www.regular-expressions.info/lookaround.html
Alan Moore
Thanks Alan for your help mate.
shane lee