I'm trying to store a gzipped, binary-serialized object in one of Active Directory's "Extension Attribute" fields. This field is a Unicode string according to its oMSyntax of 64.

I'm saving the binary object into AD's Unicode format like this:

byte[] bytes = ... // This is my blob
string value = System.Text.Encoding.Unicode.GetString(bytes); // reinterpret the raw bytes as UTF-16 text

I then save it to extension attribute #14. The issue is that when I read the value I don't get my entire string back.

Here is a screenshot of what is actually saved to the server: [image]

Here is a screenshot of what comes back: [image]

I am guessing that the \0 is causing the problem, and that probably means null. How should I handle this? Are there other chars I should also be escaping besides null?

+1  A: 

I assume you're trying to put binary data into a string field.

Simply reinterpreting the binary data as Unicode is a bad idea. The null character (0) you've run into is one reason, but it isn't the only point in Unicode string encoding that may cause issues for you. There are other control characters, and some of your byte pairs may map to code points that are reserved or invalid in Unicode (an unpaired surrogate, for example).
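To illustrate the point about invalid byte pairs, here is a minimal sketch that checks whether a byte array survives a trip through `Encoding.Unicode` and back. The class and method names are made up for this example. It only shows what .NET itself does to the data, before Active Directory is involved at all.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper, for illustration only.
public static class UnicodeRoundTripDemo
{
    // Returns true when bytes -> UTF-16 string -> bytes gives back the same bytes.
    public static bool SurvivesUnicodeRoundTrip(byte[] bytes)
    {
        // Invalid sequences (such as an unpaired surrogate) are replaced
        // with U+FFFD by the default decoder, so the original bytes are lost.
        string asText = Encoding.Unicode.GetString(bytes);
        byte[] back = Encoding.Unicode.GetBytes(asText);
        return bytes.SequenceEqual(back);
    }
}
```

The bytes `41 00 42 00` (the text "AB") survive, but `00 D8 41 00` starts with a lone high surrogate (U+D800) and does not come back unchanged.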

I would recommend considering Base64 instead. It was designed for this exact purpose. Base64 makes the encoded data about a third larger, which offsets some of what you gain from gzip, but it should solve your problem.

Your code will instead be something like:

byte[] bytes = ... // This is my blob
string encoded = System.Convert.ToBase64String(bytes); // store this string in the attribute

You then use:

byte[] decoded = System.Convert.FromBase64String(encoded);

to get your data back as bytes.

This is definitely a safer approach than what you are doing.
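Putting the pieces together, the sketch below compresses a blob with gzip and then Base64-encodes it, and reverses both steps on the way back. The `BlobCodec` class and its method names are assumptions made for this example. Writing the resulting string to the AD attribute is left out.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, for illustration only.
public static class BlobCodec
{
    // Compress with gzip first, then Base64-encode the compressed bytes.
    public static string Encode(byte[] blob)
    {
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
            {
                gzip.Write(blob, 0, blob.Length);
            } // disposing the GZipStream flushes the final gzip block
            return Convert.ToBase64String(buffer.ToArray());
        }
    }

    // Reverse the steps: Base64-decode, then decompress.
    public static byte[] Decode(string text)
    {
        using (var input = new MemoryStream(Convert.FromBase64String(text)))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

The encoded string contains only Base64 characters (letters, digits, `+`, `/` and `=`), so it has no null or other control characters.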

userx
What is the best order to keep it small? GZIP then Unicode then Base64?
MakerOfThings7
@MakerOfThings7 - Presuming you are trying to get binary data into a string field, there is no reason to convert the data to Unicode. Doing so is largely unsafe (there are reserved code points in Unicode, for example, which two of your bytes might accidentally represent). I would GZIP (or Flate) compress your data, then use Base64. I would not use Unicode at all.
userx
As a side note, Base64 is aimed at putting binary data into ASCII / UTF-8 data fields. If you know for a fact that AD stores the data internally as UTF-16 and not UTF-8, it MAY be possible to design a new binary converter that makes better use of the fact that UTF-16 uses 2 bytes and not 1 byte per character. Base64, though, is very widely used, and in most applications I would just accept the storage hit for the sake of using a standard everyone knows.
userx
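The storage hit mentioned above can be estimated up front. Base64 emits 4 characters for every 3 input bytes, rounded up to a whole group. The helper below is a hypothetical sketch of that arithmetic, not part of any library.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class Base64Size
{
    // Number of characters Convert.ToBase64String produces for byteCount input bytes.
    public static int EncodedLength(int byteCount)
    {
        return 4 * ((byteCount + 2) / 3); // integer division rounds the group count up
    }
}
```

For example, 300 bytes become 400 characters. If each of those characters is then stored as 2 bytes of UTF-16, the stored value takes 800 bytes.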
@Userx, seems like the GZIP + Base64 doesn't shrink my data enough... I might as well leave it as a serialized XML string (my source)
MakerOfThings7
@MakerOfThings7 - Well, to compress and store binary data as a string, I'm pretty sure GZIP (or Flate, or another compression scheme) + Base64 is the right answer :) If your source is XML, then depending on which features of XML you're using (no namespaces, etc.), you may wish to consider converting it to JSON, which should be safe to store in a string as well.
userx
@Userx, great point about JSON. I rarely use that format from .NET unless I'm using WCF. Do you know the right class to use in this case?
MakerOfThings7
[`DataContractJsonSerializer`](http://msdn.microsoft.com/en-us/library/system.runtime.serialization.json.datacontractjsonserializer.aspx)
Timwi
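A minimal sketch of how `DataContractJsonSerializer` might be used for this, assuming a made-up `SampleSettings` type standing in for the real object. The `JsonCodec` helper and its method names are also hypothetical.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Text;

// Hypothetical data type, for illustration only.
[DataContract]
public class SampleSettings
{
    [DataMember] public string Name { get; set; }
    [DataMember] public int Level { get; set; }
}

// Hypothetical helper, for illustration only.
public static class JsonCodec
{
    // Serialize an object to a JSON string.
    public static string ToJson<T>(T value)
    {
        var serializer = new DataContractJsonSerializer(typeof(T));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, value);
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }

    // Rebuild the object from its JSON string.
    public static T FromJson<T>(string json)
    {
        var serializer = new DataContractJsonSerializer(typeof(T));
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(json)))
        {
            return (T)serializer.ReadObject(stream);
        }
    }
}
```

The JSON text can be stored as a string directly, or passed through gzip and Base64 first if size matters.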