Hello.

Some programming languages support strings that are stored as follows:

For example, the AnsiString type in Delphi. These strings are conveniently managed, and one might think it is a good idea to use them as a container for binary data, since they offer efficient operations for concatenation, substring extraction, etc.
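As an illustration of the kind of operations meant here, the sketch below uses Python 3's `bytes` type rather than Delphi. Python is chosen only for brevity, and the payload values are made up. Concatenation, slicing and searching all treat a zero byte as ordinary data:

```python
# Hypothetical binary payload containing a zero byte and a 0xFF byte.
payload = bytes([0x01, 0x00, 0xFF, 0x7F])

framed = b"\x02" + payload        # concatenation: prepend a 1-byte tag
body = framed[1:]                 # substring extraction (slicing)
zero_at = framed.find(b"\x00")    # searching works on raw bytes too
```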

Somehow I have a strong feeling that using a string type to store binary data, even if the type is binary-safe, is ideologically wrong, but I can't find any strong arguments to defend this position.

Certainly, in languages such as PHP, where arrays add too much overhead (each array element in PHP occupies about 50 bytes of memory because PHP arrays are hash tables), you have no option other than using strings as binary data containers. But in Delphi or C++ (with its std::string), I think that storing binary data in strings (for example, cipher encryption keys or a binary protocol buffer) is wrong even if it is technically possible.
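A rough analogue of the PHP overhead argument can be sketched in CPython. This is an assumption: PHP itself is not shown. A `list` of small integers stores one pointer per element, while `bytes` stores one byte per element. Exact sizes depend on the interpreter version and platform, so only the comparison is meaningful:

```python
import sys

data = bytes(range(256)) * 4       # 1024 bytes of sample binary data
as_list = list(data)               # same values, one pointer per element

# getsizeof reports only the container itself: for the list that is the
# pointer array, not the int objects it points to.
packed_size = sys.getsizeof(data)
list_size = sys.getsizeof(as_list)
```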

What do you think? Are there any arguments against storing binary data in strings?

+1  A: 

It depends on the language. If it lets you do everything with, say, an integer that you can do with a string, why not store it in an int? Otherwise, for the sake of concatenation and every other utility, it may seem and feel wrong, but strings might be the only option.

OddCore
+1  A: 

Strings are designed to handle text, not binary data. As such, some string implementations may take certain liberties and not store the data exactly as you entered it (Unicode conversions, for example).

EDIT: To clarify the above, I wasn't talking about any specific language, but about the fact that certain string implementations (in languages where strings are not simply char arrays) store the data differently internally, so even if you create the string from a byte array, it could internally be saved as a double-byte array. Also, in many languages strings are immutable, which is generally not what you want when dealing with raw data.
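A small sketch of the conversion problem, again in Python 3. The key bytes are hypothetical, and UTF-8 and Latin-1 are just example encodings. Decoding arbitrary bytes as UTF-8 replaces invalid sequences, and even a lossless decoding changes the stored size once the text is re-encoded:

```python
key = bytes([0x00, 0x9F, 0xFF, 0x41])   # hypothetical 4-byte binary key

# Lossy: bytes that are not valid UTF-8 are replaced with U+FFFD on decode.
as_text = key.decode("utf-8", errors="replace")
round_trip = as_text.encode("utf-8")    # no longer equal to key

# Lossless but resized: Latin-1 maps each byte to one code point,
# yet re-encoding that text as UTF-8 turns each byte >= 0x80 into two bytes.
latin1_text = key.decode("latin-1")
stored_utf8 = latin1_text.encode("utf-8")
```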

In any case, I can't think of any language that has a decent string implementation but no vector implementation. Why not use that as your container instead?
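A minimal sketch of a dedicated byte container, using Python 3's `bytearray` as the "vector". This is an illustrative choice; the answer does not name a specific type. Unlike the immutable `bytes` type, it can be modified in place:

```python
buf = bytearray(b"\x01\x02")   # mutable byte container (the "vector")
buf += b"\x00\xff"             # append in place
buf[0] = 0x7F                  # overwrite a single byte in place
header = bytes(buf[:2])        # slice out a range and freeze it as immutable bytes

frozen = b"\x01\x02"           # bytes, like many string types, is immutable
try:
    frozen[0] = 0x7F
    is_mutable = True
except TypeError:
    is_mutable = False
```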

EDIT: True, most languages won't let you overload operators for arrays/vectors, and for good reason (but that's a whole other discussion). Other than that, though, you should have everything you need, even if with a little less syntactic sugar.

Tal Pressman
Python 2.x has a decent string implementation, but it has no separate type for arbitrary-length bytestrings, and so the `str` type is often used for both (even though text strings should probably be the `unicode` type instead). As long as you don't try to convert your data to Unicode, you're safe. I'm pretty sure Perl has similar semantics, but I don't recall off the top of my head.
Daniel Pryden
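For comparison, Python 3 split these roles into `bytes` for binary data and `str` for text, and it refuses to mix them implicitly. A minimal sketch (the values are arbitrary):

```python
data = b"\x00\xffABC"      # binary data: a bytes object
text = "ABC"               # text: a str object (Unicode code points)

try:
    data + text            # Python 3 refuses to mix the two types implicitly
    mixing_rejected = False
except TypeError:
    mixing_rejected = True

combined = data + text.encode("ascii")   # explicit conversion is required
first = data[0]                          # indexing bytes gives an int, not a 1-char string
```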
@Tal Pressman: For example, Delphi has arrays (both vector and multidimensional), but it doesn't have operator overloading, so you simply cannot write ArrBig = Arr1 + Arr2. Many programmers I know prefer binary strings over arrays, though I don't know the complete list of reasons.
FractalizeR
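In a language that does allow operator overloading, a dedicated container can get the `ArrBig = Arr1 + Arr2` syntax without being a string. Below is a sketch in Python. `Buffer` is a hypothetical class written only for this example; Python's built-in `bytes` already supports `+`, and the class just shows how little it takes to add that syntactic sugar to a non-string type:

```python
class Buffer:
    """Hypothetical immutable byte container with string-like operators."""

    def __init__(self, data=b""):
        self._data = bytes(data)

    def __add__(self, other):
        # Concatenation: Buffer + Buffer -> new Buffer
        if not isinstance(other, Buffer):
            return NotImplemented
        return Buffer(self._data + other._data)

    def __getitem__(self, index):
        # Slicing returns a Buffer; plain indexing returns an int (0..255)
        result = self._data[index]
        return Buffer(result) if isinstance(index, slice) else result

    def __len__(self):
        return len(self._data)

    def __eq__(self, other):
        return isinstance(other, Buffer) and self._data == other._data

    def __repr__(self):
        return f"Buffer({self._data!r})"


arr1 = Buffer(b"\x01\x02")
arr2 = Buffer(b"\x00\xff")
arr_big = arr1 + arr2      # the "ArrBig = Arr1 + Arr2" syntax from the comment
middle = arr_big[1:3]
```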