I've been given a .txt file that I need to parse to pull out certain information, and I'd rather not write a scanner to do it. It looks like ANSI to me, maybe with a few extras; I'm not sure. It's automatic output from some hardware that's years and years old. Here's a bit more, to give a good idea of what I'm dealing with and what the output needs to look like.

<ESC>[00p<ESC>(1*259*01/26/10*11.05*<CR>
<ESC>[05pEJ LOG COPIED OK 247C0200       <CR>
<FF><ESC>[05p*3094*1*R*09<CR>
<ESC>[00p<ESC>(1*260*01/26/10*11.07*<CR>
<ESC>[05pSUPERVISOR MODE EXIT            <CR>

Expected output:

*259*01/26/10*11.05*
EJ LOG COPIED OK 247C0200       
*3094*1*R*09
*260*01/26/10*11.07*
SUPERVISOR MODE EXIT    

Like I said, this is just a small piece of pages and pages of it. It could be ANSI; I'm not certain. If I've left out some critical info, let me know. I'm coding in C#, by the way. I would include the name/model of the device, but I don't know it. Thanks!

A:

That looks to me like the Electronic Journal (EJ) of some cash register, i.e. the log of sales transactions downloaded from it. I'm not sure which machine, though. Some of them can communicate over a serial line, using escape codes to trigger reading the log out of the Electronic Journal. I'm reasoning from having seen EJs used before; it could have been a Samsung cash register.

Hope this helps, Best regards, Tom.

tommieb75
A: 

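Try something like this (the placeholders from your sample are written here as the real control characters):

using System;
using System.Text.RegularExpressions;

string input =
    "\u001B[00p\u001B(1*259*01/26/10*11.05*\r\n" +
    "\u001B[05pEJ LOG COPIED OK 247C0200       \r\n" +
    "\u000C\u001B[05p*3094*1*R*09\r\n" +
    "\u001B[00p\u001B(1*260*01/26/10*11.07*\r\n" +
    "\u001B[05pSUPERVISOR MODE EXIT            \r\n";

// Skip an optional FF, then one or more ESC[nnp / ESC(n prefixes,
// and capture the rest of the record up to the CR (or LF).
foreach (Match m in Regex.Matches(input,
    @"\x0C?(?:\x1B(?:\[\d{2}p|\(\d))+(?<output>[^\r\n]*)"))
{
    Console.WriteLine(m.Groups["output"].Value);
}

In the real file, the placeholders are these control characters:

  • <ESC> is \x1B (escape)
  • <FF> is \x0C (form feed)
  • <CR> is \x0D (carriage return)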
Rubens Farias
OK, so I've never seen an approach like this before. Where can I read more about it?
zjazzmanz
Those are called _regular expressions_; you can find more info at http://www.regular-expressions.info/
Rubens Farias
A: 

It looks as though most of the 'tags' are the same. If it's a one-time job, you could just do a search/replace in a text editor to remove <ESC>, <CR>, [00p, <FF> and [05p rather than writing code to do it. Of course, you only showed a snippet, so perhaps there are a ton of different tags to remove...

KP
Well, there appear to be about 8 different tags recurring throughout the file. I'd love to just do a search and replace, but I'm afraid I have to do it programmatically. Thanks for the ideas, though.
zjazzmanz
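KP's search-and-replace idea can also be done programmatically, which is what the follow-up comment asks for. Below is a minimal C# sketch, assuming the tags are fixed strings. The array holds the tags named in that answer, written as real control characters (ESC as \u001B, FF as \u000C, CR as \r) with each ESC joined to the sequence that follows it, plus "ESC(1", which appears in the sample but is not in KP's list. StripTags is a hypothetical helper name, and the remaining recurring tags would have to be added to the array.

```csharp
using System;
using System.Text;

// Tags from the search/replace answer, written with the real control
// characters (ESC = \u001B, FF = \u000C, CR = \r). "\u001B(1" is an
// addition: it appears in the sample but not in that answer's list.
// Extend this array with the file's other recurring tags.
string[] tags =
{
    "\u001B[00p",
    "\u001B[05p",
    "\u001B(1",
    "\u000C",
    "\r",
};

// Hypothetical helper: deletes every occurrence of every tag.
string StripTags(string text)
{
    var sb = new StringBuilder(text);
    foreach (string tag in tags)
        sb.Replace(tag, "");
    return sb.ToString();
}

Console.WriteLine(StripTags("\u001B[00p\u001B(1*259*01/26/10*11.05*\r"));
// prints *259*01/26/10*11.05*
```

If the file is read with File.ReadLines, records are already split on CR, LF or CR LF, so the "\r" entry mainly matters when the whole text is processed at once.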
A:

This is a TELOCATOR ALPHANUMERIC PROTOCOL (TAP) message.

You can read its description in this document or in the following article.

Li0liQ
A: 

This looks very similar to ANSI escape sequences to me. Searching for that term will give you plenty of results. This paper might give you further insight into the ANSI standards.

What you are looking for is a parser that can read those code sequences. Here is a parser written in C that claims to remove the control sequences from ANSI input. Maybe you want to give it a try.

MicSim
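If the file really is ANSI-style, a pattern based on the ECMA-48 control-sequence grammar avoids listing every tag by hand. In ECMA-48, a control sequence (CSI) is ESC [ followed by parameter bytes (0x30 to 0x3F), intermediate bytes (0x20 to 0x2F) and one final byte (0x40 to 0x7E); final bytes 0x70 to 0x7E are reserved for private use, so ESC[00p and ESC[05p would be device-specific functions rather than standard ones. ESC ( followed by one character is a character-set designation. The C# sketch below assumes those two forms plus bare FF and CR are the only controls in the log. It is not a port of the C parser mentioned above, and StripAnsi is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

// A sketch, not a full terminal emulator. It assumes the log only uses:
//   CSI sequences:  ESC [ params(0x30-0x3F)* intermediates(0x20-0x2F)* final(0x40-0x7E)
//                   e.g. ESC[00p, ESC[05p (final byte 'p' is private-use)
//   charset select: ESC ( <one char>, e.g. ESC(1
//   bare controls:  FF (0x0C) and CR (0x0D)
var controlSequence = new Regex(
    @"\x1B\[[\x30-\x3F]*[\x20-\x2F]*[\x40-\x7E]" + // CSI sequence
    @"|\x1B\(." +                                  // ESC ( designator
    @"|[\x0C\x0D]");                               // FF and CR

// Hypothetical helper: removes every matched control sequence.
string StripAnsi(string record) => controlSequence.Replace(record, "");

string[] sample =
{
    "\u001B[00p\u001B(1*259*01/26/10*11.05*\r",
    "\u001B[05pEJ LOG COPIED OK 247C0200       \r",
    "\u000C\u001B[05p*3094*1*R*09\r",
};

foreach (string record in sample)
    Console.WriteLine(StripAnsi(record));
```

Unlike a fixed tag list, this also removes ESC[nnp sequences with parameters that do not appear in the sample; the trade-off is that any other kind of escape sequence in the file would pass through untouched.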