ansaurus

Question

How are strings embedded in binary files?

Answer 1

+1 A:

Apparently you're defining your very own bytecode. This has nothing to do with the syntax/grammar of .NET CIL, right?

If so, and if your concern is how to encode strings (as opposed to other instructions such as jumps, loops, etc.), you can just invent your own "instruction" for it.

For example, hex code "01xx" could stand for a string containing xx bytes (0-255). Your language interpreter would then be taught to store this string on the stack (or wherever) and move on to decode the next instruction, located xx bytes further down the bytecode stream.

If your concern is how to mix character data and numeric data in whatever storage you have for the bytecode, please provide specifics and maybe someone can help...
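A minimal sketch of that length-prefixed string instruction, in C. Only the 0x01 opcode and the one-byte length come from the text above; the function name decode_string, the OP_HALT opcode, and the error convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Only OP_STRING (0x01) comes from the answer; OP_HALT is a made-up
   one-byte instruction so the stream has something after the string. */
enum { OP_HALT = 0x00, OP_STRING = 0x01 };

/* Decodes the string instruction at code[*pc]: opcode, length byte, payload.
   Copies the payload into out (NUL-terminated) and advances *pc to the
   instruction that follows the payload.
   Returns the payload length, or -1 if the input is malformed. */
int decode_string(const unsigned char *code, size_t code_len, size_t *pc,
                  char *out, size_t out_size)
{
    assert(code != NULL && pc != NULL && out != NULL);

    if (*pc + 2 > code_len || code[*pc] != OP_STRING)
        return -1;

    size_t len = code[*pc + 1];            /* the "xx" byte: 0..255 */
    if (*pc + 2 + len > code_len || len + 1 > out_size)
        return -1;

    memcpy(out, code + *pc + 2, len);
    out[len] = '\0';
    *pc += 2 + len;                         /* skip opcode, length, payload */
    return (int)len;
}
```

For the stream { 0x01, 0x05, 'h', 'e', 'l', 'l', 'o', 0x00 }, a call starting at pc = 0 would copy "hello" into the output buffer and leave pc at 7, pointing at the next opcode. Storing the length up front means the interpreter never has to scan the payload to find where the next instruction begins.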

mjv 2009-09-19 05:31:44

Correct, I'm making my own. I kind of get what you're saying, but each instruction in my bytecode consists of 4 separate bytes (1 for the opcode and 3 others, whose purpose varies with the instruction), and I'd like to avoid having variable-length instructions. It could be safely achieved by encoding the length of the data in the instruction itself, but that would make it much more complex...

RCIX 2009-09-19 05:39:18

I see the advantages of a bytecode with a fixed length and format. In that case strings could simply be implemented as a variable-declaration instruction (which you may already have designed), whose operand is the index (be it an address, offset, subscript...) of the place where the actual string is stored. The difference from a regular variable is that the storage where the string resides is initialized with the string value. Indeed, with 3-byte operands you may find yourself limited for types other than strings (say, how do you encode a numeric value bigger than 8 million?).
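One way this could look in C, keeping every instruction at 4 bytes and putting the string data in a separate table. This is a sketch under assumptions: the Instruction layout, the OP_LOAD_STRING opcode value, the big-endian operand packing, and all function names are hypothetical and not taken from the asker's format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode: "load the string whose table index is the operand". */
enum { OP_LOAD_STRING = 0x10 };

/* Fixed-width instruction: 1 opcode byte + 3 operand bytes. */
typedef struct {
    uint8_t opcode;
    uint8_t operand[3];
} Instruction;

/* Packs a 24-bit value into the three operand bytes, big-endian. */
Instruction make_instruction(uint8_t opcode, uint32_t value)
{
    assert(value <= 0xFFFFFFu);             /* only 24 bits fit */
    Instruction ins = {
        opcode,
        { (uint8_t)(value >> 16), (uint8_t)(value >> 8), (uint8_t)value }
    };
    return ins;
}

/* Reassembles the 24-bit operand from the three operand bytes. */
uint32_t operand24(Instruction ins)
{
    return ((uint32_t)ins.operand[0] << 16)
         | ((uint32_t)ins.operand[1] << 8)
         |  (uint32_t)ins.operand[2];
}

/* Looks up the string an OP_LOAD_STRING instruction refers to.
   Returns NULL for other opcodes or an out-of-range index. */
const char *resolve_string(Instruction ins,
                           const char *const *table, size_t table_len)
{
    if (ins.opcode != OP_LOAD_STRING)
        return NULL;
    uint32_t index = operand24(ins);
    return index < table_len ? table[index] : NULL;
}
```

With a table of { "hello", "world" }, resolve_string(make_instruction(OP_LOAD_STRING, 1), table, 2) would return the "world" entry. The same indirection could plausibly carry numeric constants too large for a 3-byte operand, by storing them in a constant table and referring to them by index.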

mjv 2009-09-19 05:51:35

That's another thing I'm a bit puzzled about as well... But I may just go ahead and do that. Thanks!

RCIX 2009-09-21 05:47:15

Answer 2

A:

If you can store numbers in an array, then you can store ASCII data in the same array. Ignoring the idea of a string as a class, a simple string is just a character array anyway -- and in C, a byte with a value of 0 indicates the end of the string.

As a simple proof-of-concept in C:

#include <stdio.h>

/* Prints "hello" by writing each character's ASCII code directly. */
int main(void)
{
    putchar(104); // h
    putchar(101); // e
    putchar(108); // l
    putchar(108); // l
    putchar(111); // o
    putchar(10);  // \n
    return 0;
}

Output:

$ ./a.out
hello
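The same idea with the bytes kept in an array rather than passed to putchar one at a time: a rough sketch in which plain numbers and zero-terminated ASCII text share one buffer. The array contents, the offset of 2, and the helper names text_length and print_text are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Numbers and ASCII text in one byte array. The first two bytes are
   arbitrary numeric values; the text starts at offset 2 and ends at the
   0 byte; one more numeric byte follows it. */
static const unsigned char data[] = { 42, 7, 104, 101, 108, 108, 111, 0, 99 };

/* Counts the bytes from offset start up to (not including) the first 0.
   Returns (size_t)-1 if the array ends before a 0 byte is found. */
size_t text_length(const unsigned char *bytes, size_t size, size_t start)
{
    for (size_t i = start; i < size; i++)
        if (bytes[i] == 0)
            return i - start;
    return (size_t)-1;
}

/* Prints the zero-terminated text at the given offset, plus a newline.
   The caller must make sure a 0 byte follows (see text_length). */
void print_text(const unsigned char *bytes, size_t start)
{
    assert(bytes != NULL);
    puts((const char *)(bytes + start));
}
```

Here text_length(data, sizeof data, 2) gives 5, and print_text(data, 2) writes hello followed by a newline. Whoever reads the array still has to know where the text starts, which is the part a custom instruction format has to define.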

Maybe a reference on character arrays as strings would help?

Mark Rushakoff 2009-09-19 05:37:28

It's not quite that simple. I'm trying to embed strings alongside other bytes (which happen to be instructions in my own custom format) and I'm not sure how to do that.

RCIX 2009-09-19 05:52:58
