
Hi,

A few days ago, I asked why it's not possible to store binary data, such as a JPG file, in a string variable.

Most of the answers I got said that string is used for textual information such as what I'm writing now.

What is considered textual data, though? Bytes of a certain nature represent a JPG file, and those bytes could be represented by character byte values... I think. So when we say strings are for textual information, is there some sort of range or list of characters that can't be stored?

Sorry if the question sounds silly. I'm just trying to 'get it'.

A: 

Depends on the language. For example, in Python 2 the string type (str) is really a byte array, so it can indeed be used for binary data (Python 3 separates text, str, from binary data, bytes).

In C, the null byte is used for string termination, so a string cannot hold arbitrary binary data, since binary data may contain null bytes.
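To make the contrast concrete, here is a small C# sketch (assuming C# 9 top-level statements; CStyleLength is a made-up helper that only mimics C's strlen). A C# string keeps its length separately, so an embedded '\0' is just another character, while a C-style scan stops at the first one:

```csharp
using System;

// A C# string stores its length, so an embedded '\0' is an ordinary char.
string withNull = "abc\0def";
Console.WriteLine(withNull.Length);          // 7

// Hypothetical helper mimicking C's strlen: count chars up to the first '\0'.
static int CStyleLength(string s)
{
    int n = 0;
    while (n < s.Length && s[n] != '\0')
        n++;
    return n;
}

Console.WriteLine(CStyleLength(withNull));   // 3
```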

In C#, a string is an array of chars, and since a char is essentially a 16-bit unsigned integer (a UTF-16 code unit), you can probably get away with storing arbitrary binary data in a string. You might get errors when you try to display the string (because some values might not correspond to a legal Unicode character), and some operations, like case conversions, will probably fail in strange ways.
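A rough C# illustration of both points (assuming .NET Core 3.0 or later for string.Contains(char), and C# 9 top-level statements; the four bytes are just the start of a typical JPEG file). Copying each byte into its own char round-trips fine, but treating the same bytes as UTF-8 text does not, because invalid sequences are replaced with U+FFFD:

```csharp
using System;
using System.Linq;
using System.Text;

// First four bytes of a typical JPEG/JFIF file: SOI marker (FF D8), then APP0 (FF E0).
byte[] jpegHeader = { 0xFF, 0xD8, 0xFF, 0xE0 };

// Route 1: one char per byte. A char is a 16-bit code unit, so every byte value
// fits and the conversion is reversible, even though the chars are not real text.
string perByte = new string(jpegHeader.Select(b => (char)b).ToArray());
byte[] back1 = perByte.Select(c => (byte)c).ToArray();
Console.WriteLine(back1.SequenceEqual(jpegHeader)); // True

// Route 2: decode the bytes as UTF-8. These bytes are not valid UTF-8, so the
// decoder substitutes U+FFFD (the replacement character) and the data is lost.
string decoded = Encoding.UTF8.GetString(jpegHeader);
byte[] back2 = Encoding.UTF8.GetBytes(decoded);
Console.WriteLine(decoded.Contains('\uFFFD'));      // True
Console.WriteLine(back2.SequenceEqual(jpegHeader)); // False
```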

In short, it might be possible in some languages to store arbitrary binary data in strings, but they are not designed for this use, and you may run into all kinds of unforeseen trouble. Most languages have a byte-array type for storing arbitrary binary data.

JacquesB
I'm referring to C#.
Sir Psycho
A: 

Deep down, everything is just bytes. Things like strings and pictures are defined by rules about how to order and interpret bytes. C strings, for example, end in a byte with value 0; JPGs don't.

Jacobbus
How a string is terminated depends on the language and compiler. In Delphi, for example, I think a ShortString stores its length in its first byte instead of using a terminator, which is why it is limited to 255 characters; the longer string types store a larger length field.
Treb
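For comparison, .NET itself uses a length prefix rather than a terminator when it serializes strings: BinaryWriter.Write(string) writes the byte count (as a 7-bit encoded integer) followed by the encoded characters. A small sketch, assuming C# 9 top-level statements:

```csharp
using System;
using System.IO;
using System.Text;

var stream = new MemoryStream();

// BinaryWriter.Write(string) emits a length prefix, then the UTF-8 bytes;
// leaveOpen: true keeps the MemoryStream usable after the writer is disposed.
using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
{
    writer.Write("abc\0def");
}

byte[] bytes = stream.ToArray();
Console.WriteLine(BitConverter.ToString(bytes)); // 07-61-62-63-00-64-65-66

// Reading it back relies on the prefix, so the embedded '\0' survives.
stream.Position = 0;
using var reader = new BinaryReader(stream, Encoding.UTF8);
string back = reader.ReadString();
Console.WriteLine(back.Length);                  // 7
```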
+1  A: 

I think you are referring to the binary-to-text encoding issue (translating a JPG into a string would require that sort of pre-processing).

Indeed, in that article, some characters are mentioned as not always supported, and others can be confusing:

Some systems have a more limited character set they can handle; not only are they not 8-bit clean, some can't even handle every printable ASCII character.
Others have limits on the number of characters that may appear between line breaks.
Still others add headers or trailers to the text.

And a few poorly-regarded but still-used protocols use in-band signaling, causing confusion if specific patterns appear in the message. The best-known is the string "From " (including trailing space) at the beginning of a line used to separate mail messages in the mbox file format.
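Base64 is the usual way around those restrictions. A short C# sketch (assuming C# 9 top-level statements; the 100 sample bytes are arbitrary) showing that the encoded text uses only letters, digits, '+', '/' and '=', and that Base64FormattingOptions.InsertLineBreaks wraps it at 76 characters per line:

```csharp
using System;
using System.Linq;

// 100 arbitrary bytes, including 0x00 and values above 0x7F.
byte[] data = Enumerable.Range(0, 100).Select(i => (byte)(i * 37)).ToArray();

// Base64 output is plain ASCII; InsertLineBreaks adds a line break every 76 chars.
string encoded = Convert.ToBase64String(data, Base64FormattingOptions.InsertLineBreaks);
string[] lines = encoded.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

foreach (string line in lines)
    Console.WriteLine($"{line.Length,2}: {line}");
```

100 bytes encode to 136 characters, so this prints one 76-character line and one 60-character line.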

VonC
+2  A: 

I see four major problems with storing binary data in strings:

  1. Most systems assume a certain encoding within string variables, e.g. UTF-8, UTF-16 or ASCII. Newline characters may also be translated depending on your system.
  2. You should watch out for restrictions on the size of strings.
  3. If you use C style strings, every null character in your data will terminate the string and any string operations performed will only work on the bytes up to the first null.
  4. Perhaps the most important: it's confusing. Other developers don't expect to find random binary data in string variables, and a lot of code that works on strings may also get really confused when it encounters binary data :)
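Point 1 is easy to trip over when writing a string back out. A C# sketch (assuming C# 9 top-level statements; the four sample bytes are arbitrary): bytes turned into a string one char per byte via ISO-8859-1 come out different once the string is re-encoded as UTF-8, which is the encoding File.WriteAllText(path, text) uses by default:

```csharp
using System;
using System.Text;

// Bytes turned into a string one char per byte via ISO-8859-1 (Latin-1).
byte[] original = { 0xFF, 0xD8, 0x00, 0x41 };
string s = Encoding.GetEncoding("ISO-8859-1").GetString(original);

// Writing text means choosing an encoding. In UTF-8, every char above U+007F
// takes two or more bytes, so the output no longer matches the input.
byte[] asUtf8 = Encoding.UTF8.GetBytes(s);
Console.WriteLine(BitConverter.ToString(original)); // FF-D8-00-41
Console.WriteLine(BitConverter.ToString(asUtf8));   // C3-BF-C3-98-00-41
```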
VolkA
+1  A: 

Whoever told you that you can't put 'binary' data into a string was wrong. A string is simply a sequence of characters (in C#, 16-bit chars) that you most likely plan on using for textual data... but there is nothing stopping you from putting any data you want in there.

I do have to be careful, though, because I don't know what language you are using... and in some languages \0 ends the string.

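In C#, you can put any data into a string, as long as the conversion maps every byte to exactly one char. ISO-8859-1 (Latin-1) does that; ASCII does not, because it decodes every byte above 0x7F as '?'. Example:

using System.Text;

byte[] myJpegByteArray = GetBytesFromSomeImage();

// Latin-1 maps bytes 0x00-0xFF one-to-one onto U+0000-U+00FF, so
// Encoding.GetEncoding("ISO-8859-1").GetBytes(myString) restores the original bytes.
string myString = Encoding.GetEncoding("ISO-8859-1").GetString(myJpegByteArray);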
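The encoding used for the conversion matters, because only some encodings map every byte value to a character. A C# sketch (assuming C# 9 top-level statements; the sample bytes are arbitrary) comparing a round trip through ASCII with one through ISO-8859-1:

```csharp
using System;
using System.Linq;
using System.Text;

byte[] bytes = { 0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10 };

// ASCII only covers 0x00-0x7F; every other byte is decoded as '?', so data is lost.
Encoding ascii = Encoding.ASCII;
bool asciiOk = ascii.GetBytes(ascii.GetString(bytes)).SequenceEqual(bytes);

// ISO-8859-1 maps 0x00-0xFF one-to-one to U+0000-U+00FF, so the round trip is exact.
Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
bool latin1Ok = latin1.GetBytes(latin1.GetString(bytes)).SequenceEqual(bytes);

Console.WriteLine($"ASCII round trip: {asciiOk}");    // False
Console.WriteLine($"Latin-1 round trip: {latin1Ok}"); // True
```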
Timothy Khouri
Maybe he can. But he shouldn't. Correction: He definitely can. And he definitely shouldn't.
Treb
I would tend to agree... I'm not questioning 'why' he wants to do this :) Maybe there is a function that someone else wrote that writes to an online file storage system... and it only takes strings!!! (I'm making that up) :)
Timothy Khouri
A: 

I agree with Jacobus' answer: in the end, all data structures are made up of bytes (or, if you go even deeper, bits). With some abstraction, you could say that strings and byte arrays are conventions for programmers on how to access them.

In this regard, the string is an abstraction for data interpreted as text. Text was invented for communication among humans; computers and programs do not communicate very well using text. SQL is textual, but it is an interface for humans to tell a database what to do.

So, in general, textual data, and therefore strings, are primarily for human-to-human or human-to-machine interaction (say, the content of a message box). Using them for something else (e.g. reading or writing binary image data) is possible, but carries lots of risk because you are using the data type for something it was not designed to handle. This makes it much more error-prone. You may be able to store binary data in strings, but just because you are able to shoot yourself in the foot doesn't mean you should.

Summary: You can do it. But you'd better not.

Treb
A: 

Your original question (c# - What is string really good for?) made very little sense. So the answers didn't make sense, either.

Your original question said "For some reason though, when I write this string out to a file, it doesn't open." Which doesn't really mean much.

Your original question was incomplete, and the answers were misleading and confusing. You CAN store anything in a String. Period. The "strings are for text" answers were there because you didn't provide enough information in your question to determine what's going wrong with your particular bit of C# code.

You didn't provide a code snippet or an error message. That's why it's hard to 'get it' -- you're not providing enough details for us to know what you don't get.
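For what it's worth, if the goal was simply to save image bytes to a file, strings can be skipped entirely. A C# sketch, assuming C# 9 top-level statements; the ten sample bytes stand in for real image data and the temporary file name is random:

```csharp
using System;
using System.IO;
using System.Linq;

// Sample bytes standing in for real image data (the start of a JPEG/JFIF header).
byte[] image = { 0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10, 0x4A, 0x46, 0x49, 0x46 };

// Random file name in the temp directory, deleted afterwards.
string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
bool identical;
try
{
    // WriteAllBytes/ReadAllBytes copy bytes as-is: no text encoding and
    // no newline translation is involved.
    File.WriteAllBytes(path, image);
    identical = File.ReadAllBytes(path).SequenceEqual(image);
}
finally
{
    File.Delete(path);
}

Console.WriteLine(identical); // True
```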

S.Lott
+2  A: 

I would prefer to store binary data as binary. You should only think of converting it to text when there's no other choice, since a textual representation takes more space (with Base64, about a third more); that's how attachments are put into email.

Base64 is a good textual representation of binary files.
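A minimal C# round trip (assuming C# 9 top-level statements; the 1000 random bytes are just sample data), which also shows the size cost: Base64 turns every 3 bytes into 4 characters, so n bytes become 4 * ceil(n / 3) characters.

```csharp
using System;
using System.Linq;

byte[] data = new byte[1000];
new Random(42).NextBytes(data);  // arbitrary binary content

string text = Convert.ToBase64String(data);
byte[] decoded = Convert.FromBase64String(text);

// 1000 bytes -> ceil(1000 / 3) = 334 groups of 4 chars.
Console.WriteLine(text.Length);                 // 1336
Console.WriteLine(decoded.SequenceEqual(data)); // True
```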

Osama ALASSIRY
+1  A: 

Before internationalization, it didn't make much difference. Every ASCII character fits in a single byte, so strings, character arrays and byte arrays ended up having the same implementation.

These days, though, strings are a lot more complicated, in order to deal with thousands of foreign language characters and the linguistic rules that go with them.
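One small example of that complexity, sketched in C# (assuming C# 9 top-level statements): the same visible character "é" can be stored as one precomposed char or as 'e' plus a combining accent, and the two differ in length, in UTF-8 byte count, and under ordinal comparison.

```csharp
using System;
using System.Text;

string composed = "\u00E9";     // é as a single precomposed char
string decomposed = "e\u0301";  // 'e' followed by COMBINING ACUTE ACCENT

Console.WriteLine(composed.Length);                        // 1
Console.WriteLine(decomposed.Length);                      // 2
Console.WriteLine(Encoding.UTF8.GetByteCount(composed));   // 2
Console.WriteLine(Encoding.UTF8.GetByteCount(decomposed)); // 3
Console.WriteLine(string.Equals(composed, decomposed, StringComparison.Ordinal)); // False
```

Text APIs such as Unicode normalization (string.Normalize in .NET) exist to bring such strings to a common form, which is one more way a string's contents can legitimately be rewritten.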

Sure, if you look deep enough, everything is just bits and bytes, but there's a world of difference in how the computer interprets them. The rules for "text" make things look right when it's displayed to a human, but the computer is free to monkey with the internal representation. For example,

Harry Tsai