I'm working with some EBCDIC data that I need to parse to find some hex values. The problem is that I appear to be reading the file in with the wrong encoding. I can see that my record begins with "!" (which is x5A in EBCDIC), but when I convert it to hex it comes back as x21, which is the ASCII value for "!".

I was hoping that there was a built-in method in the framework, but I'm afraid that I'm going to have to create a custom class to correctly map the EBCDIC character set.

Using fileInStream As New FileStream(inputFile, FileMode.Open, FileAccess.Read)
   Using bufferedInStream As New BufferedStream(fileInStream)
      Using reader As New StreamReader(bufferedInStream, Encoding.GetEncoding(37))
         While Not reader.EndOfStream
            Do While reader.Peek() >= 0
               Dim charArray(52) As Char
               reader.Read(charArray, 0, charArray.Length)

               For Each letter As Char In charArray
                  Dim value As Integer = Convert.ToInt16(letter)

                  Dim hexOut As String = [String].Format("{0:x}", value)
                  Debug.WriteLine(hexOut)
               Next
            Loop
         End While
      End Using
   End Using
End Using

Thanks!

+2  A: 

Yes: when you read the data in as text, the reader decodes it and stores it internally as Unicode, which is why "!" comes back as x21 rather than the original x5A. If you care about the binary values (i.e. the raw bytes), then don't decode it in the first place.
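
As a rough illustration of "don't decode it", here is a minimal sketch that reads the file with a plain FileStream and formats each raw byte as hex. The module name, the ReadHexValues helper, and the three-byte sample file are all made up for the example; only the FileStream usage mirrors the question's code.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module RawBytesSketch
    ' Hypothetical helper: returns every byte of the file as a two-digit
    ' hex string. No StreamReader and no Encoding are involved, so 0x5A stays 0x5A.
    Function ReadHexValues(inputFile As String) As List(Of String)
        Dim hexValues As New List(Of String)
        Using fileInStream As New FileStream(inputFile, FileMode.Open, FileAccess.Read)
            Dim buffer(4095) As Byte
            Dim bytesRead As Integer = fileInStream.Read(buffer, 0, buffer.Length)
            While bytesRead > 0
                ' Only the first bytesRead entries of the buffer are valid.
                For i As Integer = 0 To bytesRead - 1
                    hexValues.Add(buffer(i).ToString("x2"))
                Next
                bytesRead = fileInStream.Read(buffer, 0, buffer.Length)
            End While
        End Using
        Return hexValues
    End Function

    Sub Main()
        ' Made-up sample file so the sketch runs on its own.
        Dim inputFile As String = Path.GetTempFileName()
        File.WriteAllBytes(inputFile, New Byte() {&H5A, &H0, &H10})
        Console.WriteLine(String.Join(" ", ReadHexValues(inputFile))) ' 5a 00 10
        File.Delete(inputFile)
    End Sub
End Module
```

Once you know which byte range actually holds text, you can decode just that range, for example with Encoding.GetEncoding(37).GetString(bytes, offset, count), and leave everything else as bytes.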

If you really need to do anything with a custom EBCDIC encoding, you can use my open source EBCDIC implementation - but I think you really just need to make up your mind as to whether you're treating this as binary data or text.

Jon Skeet
+2  A: 

Be careful reading AFP data that way. It is big-endian in both byte and bit order. You will need to account for that if you are treating it as binary data, such as parsing through the Structured Fields in a document.
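
To make the byte-order point concrete, here is a small sketch of reading a two-byte big-endian value by hand. ReadUInt16BigEndian and the sample bytes are hypothetical names and values chosen for the example. BitConverter is shown only for contrast, since it uses the byte order of the machine it runs on.

```vbnet
Imports System

Module BigEndianSketch
    ' Hypothetical helper: combines two bytes, most significant byte first.
    Function ReadUInt16BigEndian(data As Byte(), offset As Integer) As Integer
        Return (CInt(data(offset)) << 8) Or CInt(data(offset + 1))
    End Function

    Sub Main()
        ' Made-up bytes: a marker byte followed by the two-byte value 0x0010.
        Dim header As Byte() = New Byte() {&H5A, &H0, &H10}

        ' Big-endian read: 0x00 * 256 + 0x10 = 16.
        Console.WriteLine(ReadUInt16BigEndian(header, 1))

        ' BitConverter follows the machine's byte order, so on a
        ' little-endian machine the same two bytes come back swapped.
        Console.WriteLine(BitConverter.IsLittleEndian)
        Console.WriteLine(BitConverter.ToUInt16(header, 1))
    End Sub
End Module
```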

R Ubben
The structured field data is what I'm trying to get. Thanks for the input.
Tom Alderman
+1  A: 

You can do it like this:

  1. Open the AFP file. Read the first 9 bytes.
  2. Byte 0 should be 0xD3 or 0x5A. Bytes 1 and 2 hold the length of the structured field, which includes 8 of the 9 bytes you just read. It is big-endian, so length = byte1 * 256 + byte2.
  3. Bytes 3, 4, and 5 are the Structured Field Identifier. If you're looking for printable text, look for PTX (Presentation Text Data), 0xD3 0xEE 0x9B. If you didn't find it, skip ahead length-8 bytes and read the next 9 bytes.
  4. If you did find a PTX, read length-8 bytes. Parsing through the control sequences to get to the text is a little tricky. The first will start with 0x2B 0xD3, followed by a byte for the length and a byte for the kind of control sequence. If this kind byte is an odd number, the next control sequence will omit the 0x2B 0xD3 header and start with the length byte instead. This is called "chaining" and was apparently introduced to drive programmers trying to parse this stuff insane.
  5. Skip ahead length-1 bytes past the length byte and press on, or just look for the next 0x2B 0xD3; the last control sequence will not be chained, and everything following to the end of the PTX will be EBCDIC. Use Jon Skeet's library (thanks, Jon) to decode it, then look for the next PTX.
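
The steps above could be sketched roughly like this. It is only an outline under the assumptions stated in the list (a 9-byte header, a length that counts 8 of those bytes, and control-sequence lengths that count the length byte itself). The names ReadFully, ReadPtxFields, and FindTextStart are made up, and the sample bytes are invented test data, not a real AFP file.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module AfpSketch
    ' Fills buffer(0..count-1) from the stream; returns False if the stream ends first.
    Function ReadFully(input As Stream, buffer As Byte(), count As Integer) As Boolean
        Dim total As Integer = 0
        While total < count
            Dim n As Integer = input.Read(buffer, total, count - total)
            If n = 0 Then Return False
            total += n
        End While
        Return True
    End Function

    ' Steps 1-3: walk the structured fields and collect the data of every PTX field.
    Function ReadPtxFields(input As Stream) As List(Of Byte())
        Dim result As New List(Of Byte())
        Dim header(8) As Byte ' the 9 bytes from step 1
        While ReadFully(input, header, header.Length)
            Dim length As Integer = CInt(header(1)) * 256 + header(2) ' big-endian
            If length < 8 Then Throw New InvalidDataException("Structured field length below 8")
            Dim data(length - 9) As Byte ' length - 8 bytes follow the header
            If Not ReadFully(input, data, data.Length) Then Throw New EndOfStreamException()
            If header(3) = &HD3 AndAlso header(4) = &HEE AndAlso header(5) = &H9B Then
                result.Add(data)
            End If
        End While
        Return result
    End Function

    ' Steps 4-5: skip the control sequences and return the index where the text starts.
    Function FindTextStart(ptx As Byte()) As Integer
        Dim pos As Integer = 0
        Dim chained As Boolean = False
        While pos + 1 < ptx.Length
            If Not chained Then
                ' An unchained sequence must start with 0x2B 0xD3; otherwise treat the rest as text.
                If ptx(pos) <> &H2B OrElse ptx(pos + 1) <> &HD3 Then Return pos
                pos += 2
                If pos + 1 >= ptx.Length Then Exit While
            End If
            Dim length As Integer = ptx(pos)   ' assumed to count the length byte itself
            Dim kind As Integer = ptx(pos + 1) ' odd means the next sequence is chained
            If length < 2 Then Throw New InvalidDataException("Control sequence length below 2")
            pos += length
            chained = (kind And 1) = 1
        End While
        Return Math.Min(pos, ptx.Length)
    End Function

    Sub Main()
        ' Invented sample: one non-PTX field, then one PTX field holding two
        ' control sequences (the first chained) followed by two text bytes.
        Dim sample As Byte() = New Byte() {
            &H5A, &H0, &H8, &HD3, &HA8, &HA8, &H0, &H0, &H0,
            &H5A, &H0, &H13, &HD3, &HEE, &H9B, &H0, &H0, &H0,
            &H2B, &HD3, &H4, &HC7, &H0, &H0, &H3, &HF0, &H0, &HC8, &HC9}

        For Each ptx As Byte() In ReadPtxFields(New MemoryStream(sample))
            Dim start As Integer = FindTextStart(ptx)
            If start < ptx.Length Then
                ' Prints the remaining EBCDIC text bytes in hex: C8-C9
                Console.WriteLine(BitConverter.ToString(ptx, start, ptx.Length - start))
            End If
        Next
    End Sub
End Module
```

The bytes from the returned index onward are what you would hand to an EBCDIC decoder. The sketch stops at hex output so that it does not depend on any particular encoding library.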

Sorry I was long-winded. It is doable, but not simple.

R Ubben