ansaurus

Question

Answer 1

A:

If the null characters are used as right padding for the text (i.e. terminating it), which would be the normal case, this is fairly easy:

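' Decode the bytes, then cut the string at the first null character.
Dim strText As String = encASCII.GetString(byText)
Dim strlen As Integer = strText.IndexOf(Chr(0))
If strlen <> -1 Then
    ' strlen is the index of the first null, so it is also the number of characters to keep.
    strText = strText.Substring(0, strlen)
End If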

If not, you can still do a normal Replace on the string. It would be slightly “cleaner” if you did the pruning in the byte array, before converting it to a string. The principle remains the same, though.

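' CByte(0) makes sure the search value is a Byte rather than an Integer.
Dim strlen As Integer = Array.IndexOf(byText, CByte(0))
If strlen = -1 Then
    ' No null byte found: keep the whole array.
    strlen = byText.Length
End If
Dim strText As String = encASCII.GetString(byText, 0, strlen)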
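
For null characters that sit in the middle of the text rather than only at the end, the `Replace` approach mentioned above could look roughly like this. It is a minimal sketch that assumes the bytes really are ASCII; the names `NullStripDemo` and `StripNulls` are made up for illustration, and the sample bytes are the ones from the hex dump quoted in this thread.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module NullStripDemo
    ' Hypothetical helper: decode as ASCII (an assumption), then drop every null character.
    Function StripNulls(byText As Byte()) As String
        Dim strText As String = Encoding.ASCII.GetString(byText)
        ' vbNullChar is the one-character string containing Chr(0).
        Return strText.Replace(vbNullChar, String.Empty)
    End Function

    Sub Main()
        Dim sample As Byte() = New Byte() {&H20, &H43, &H68, &H61, &H72, &H67, &H65,
                                           0, 0, 0, 0, 0, 0, &H67, &H65, 0, 0, 0}
        Console.WriteLine("[" & StripNulls(sample) & "]")   ' prints "[ Chargege]"
    End Sub
End Module
```

Replacing with an empty string removes the padding entirely; replacing with a space instead would keep the fixed-width layout visible.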

Konrad Rudolph 2009-08-30 07:32:31

Thanks for the effort, but this does not seem to work for me (I tried the second code listing). The 00 characters are not just at the end of the file. When looking in a hex editor, I see 00 in place of the bad characters. They are interspersed in several spots throughout the string: "20 43 68 61 72 67 65 00 00 00 00 00 00 67 65 00 00 00". I used your code and the characters remained.

Paul 2009-08-30 08:29:05

unknown, are you sure it was written as ASCII?

Henk Holterman 2009-08-30 09:21:49

Well, in that case you can simply use `String.Replace`. However, Henk is right: your data most probably isn’t ASCII-encoded in the first place. You should definitely try to get more information on the input data.

Konrad Rudolph 2009-08-30 09:41:29

Konrad, I was able to do a strText.Replace(Chr(0), " ") to get rid of most of the offending characters. However, I am now stuck with a single bad character: "error: illegal character 0x19". It does not go away with Chr(19). Any other suggestions?

Paul 2009-08-30 16:37:31
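
One likely reason `Chr(19)` has no effect: the error message gives the character code in hexadecimal, and 0x19 is 25 in decimal, whereas `Chr(19)` is the character with decimal code 19 (0x13). A minimal sketch of removing the right character, assuming the text has already been decoded into a string; `HexCharDemo` and `StripHex19` are illustrative names only.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module HexCharDemo
    ' Hypothetical helper: remove the control character with code 0x19 (decimal 25).
    Function StripHex19(source As String) As String
        ' &H19 is VB's hexadecimal literal syntax, so ChrW(&H19) is the same as ChrW(25).
        Return source.Replace(ChrW(&H19).ToString(), String.Empty)
    End Function

    Sub Main()
        Dim dirty As String = "Charge" & ChrW(25) & "d"
        Console.WriteLine(StripHex19(dirty))   ' prints "Charged"
    End Sub
End Module
```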

Henk, the file is a binary file that I am trying to load into a database. I want to strip out any binary characters and load just the plain ASCII text (at least for the text fields). The fields are fixed width, but the text fields seem to contain some binary garbage as well as the aforementioned null padding, and I want to get rid of both. I had been using Char.IsLetterOrDigit() to weed out bad characters, but that is too general and takes out symbols that I need to keep in the text, so now I am trying to replace only the bad characters individually.

Paul 2009-08-30 16:40:57

unknown, if the text (part) was written as UTF-8, then those binaries aren't garbage but escape codes.

Henk Holterman 2009-08-30 19:02:02

Henk, I believe you are correct that the input is UTF. If that is the case, how can I get rid of those escape characters? I need to write to XML, those escape codes are not valid in XML, and they are of no use to me for my application of the data. This code gets rid of all of them except for 0x19:

Dim ascii As Encoding = Encoding.ASCII
Dim [unicode] As Encoding = Encoding.Unicode
Dim asciiBytes As Byte() = Encoding.Convert([unicode], ascii, unicodeBytes)

Paul 2009-08-30 19:54:00

Konrad, the input data was created by user input in a legacy application where anyone could have posted data to these text fields, so they could contain essentially any characters.

Paul 2009-08-30 19:55:20

Answer 2

+3 A:

First of all, you should find out what the format of the text is, so that you are not just blindly removing something without knowing what you hit.

Depending on the format, you use different methods to remove the characters.

To remove only the zero characters:

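' Compact the array in place: copy every non-zero byte towards the front.
Dim len As Integer = 0
For pos As Integer = 0 To byText.Length - 1
   If byText(pos) <> 0 Then
      byText(len) = byText(pos)
      len += 1
   End If
Next
' Decode only the first len bytes, which now hold the kept data.
strText = Encoding.ASCII.GetString(byText, 0, len)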

To remove everything from the first zero character to the end of the array:

' Count the bytes that precede the first zero byte.
Dim len As Integer = 0
While len < byText.Length AndAlso byText(len) <> 0
   len += 1
End While
strText = Encoding.ASCII.GetString(byText, 0, len)

Edit:
If you just want to keep any junk that happens to be ASCII characters:

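' Keep only bytes 32 to 127. Note that this also drops tabs and line breaks,
' and that 127 (DEL) is itself a control character.
Dim len As Integer = 0
For pos As Integer = 0 To byText.Length - 1
   If byText(pos) >= 32 AndAlso byText(pos) <= 127 Then
      byText(len) = byText(pos)
      len += 1
   End If
Next
strText = Encoding.ASCII.GetString(byText, 0, len)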
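
If the end goal is well-formed XML rather than strictly ASCII, another option is to filter on what XML 1.0 allows instead of on a byte range. The following is only a sketch under assumptions: it needs .NET Framework 4.0 or later (where `XmlConvert.IsXmlChar` is available), it works on an already decoded string, and `XmlFilterDemo` and `KeepXmlChars` are illustrative names.

```vbnet
Imports System
Imports System.Text
Imports System.Xml
Imports Microsoft.VisualBasic

Module XmlFilterDemo
    ' Hypothetical helper: keep only characters that XML 1.0 permits.
    ' Surrogate halves are dropped too, since they are not checked as pairs here.
    Function KeepXmlChars(source As String) As String
        Dim sb As New StringBuilder(source.Length)
        For Each c As Char In source
            If XmlConvert.IsXmlChar(c) Then
                sb.Append(c)
            End If
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Dim dirty As String = "Charge" & ChrW(0) & ChrW(&H19) & "d"
        Console.WriteLine(KeepXmlChars(dirty))   ' prints "Charged"
    End Sub
End Module
```

Unlike the 32 to 127 filter, this keeps tabs, line breaks and non-ASCII letters, so whether it fits depends on what the target database and XML consumer expect.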

Guffa 2009-08-30 09:32:18

Guffa, I am looking to keep only valid ASCII characters. There is no rhyme or reason to what characters are in there, because the legacy app allowed users to cut and paste into that field, and some were copying in Word docs, etc. I need to serialize to XML, so I believe the text needs to be valid ASCII.

Paul 2009-08-30 20:46:40

I see. I added another option above that might be useful.

Guffa 2009-08-30 23:11:40

Guffa, this last bit did the trick. Thank you, and thanks to all who helped.

Paul 2009-08-31 01:40:14

Answer 3

A:

You can use a struct to load the data:

// Sequential layout with Pack = 1 places the fields back to back: offsets 0, 1 and 2.
[System.Runtime.InteropServices.StructLayout(System.Runtime.InteropServices.LayoutKind.Sequential, Pack = 1, CharSet = System.Runtime.InteropServices.CharSet.Ansi)]
internal struct TextFileRecord
{
    public byte Category;
    public byte Code;
    // ByValTStr describes an inline fixed-length character field; SizeConst is its length in characters.
    [System.Runtime.InteropServices.MarshalAs(System.Runtime.InteropServices.UnmanagedType.ByValTStr, SizeConst = 60)]
    public string Text;
}

You have to adjust the CharSet value (and, if necessary, the UnmanagedType argument) to fit your string encoding.

PVitt 2009-08-30 09:59:14

0x00 in a binary file VB.NET
