I'm writing my own bytecode and virtual machine (on .NET), and one thing I can't figure out is how to embed strings into my bytecode. Any ideas on how I should do it?

+1  A: 

Apparently you're defining your very own bytecode. This has nothing to do with the syntax/grammar of .NET CIL, right?

If so, and if your concern is how to encode strings (as opposed to other instructions such as jumps, loops, etc.), you can just invent your own "instruction" for it.

For example, the hex code "01xx" could stand for a string of xx bytes (0-255): opcode 01, then a length byte xx, with the string's bytes following immediately after. Your language interpreter would then be taught to store this string on the stack (or wherever) and resume decoding at the bytecode located xx bytes further down the stream.
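A minimal sketch of that length-prefixed string instruction, written in C for illustration only. Everything named here (`Vm`, `vm_run`, `OP_PUSH_STR`, `OP_HALT`, the stack size) is a hypothetical choice for this sketch, not part of any existing bytecode format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcodes, invented for this sketch. */
enum { OP_HALT = 0x00, OP_PUSH_STR = 0x01 };

#define MAX_STACK 16
#define MAX_STR   255   /* one length byte allows at most 255 bytes */

typedef struct {
    char   stack[MAX_STACK][MAX_STR + 1]; /* each slot holds a NUL-terminated copy */
    size_t depth;                         /* number of strings currently pushed */
} Vm;

/* Executes code[0..len). Returns 0 on success, -1 on malformed bytecode. */
int vm_run(Vm *vm, const uint8_t *code, size_t len)
{
    assert(vm != NULL && code != NULL);
    size_t pc = 0;
    vm->depth = 0;
    while (pc < len) {
        uint8_t op = code[pc++];
        switch (op) {
        case OP_HALT:
            return 0;
        case OP_PUSH_STR: {
            if (pc >= len) return -1;              /* length byte is missing */
            size_t n = code[pc++];                 /* xx: the string length */
            if (n > len - pc) return -1;           /* string runs past the end */
            if (vm->depth == MAX_STACK) return -1; /* no room on the stack */
            memcpy(vm->stack[vm->depth], code + pc, n);
            vm->stack[vm->depth][n] = '\0';
            vm->depth++;
            pc += n;                               /* jump to the next instruction */
            break;
        }
        default:
            return -1;                             /* unknown opcode */
        }
    }
    return 0;
}
```

With this layout, the byte sequence `01 02 'h' 'i' 00` would push the string "hi" and then halt. The cost is that instructions are no longer all the same width, since the string bytes sit inline in the stream.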

If your concern is how to mix character data and numeric data in whatever storage you have for the bytecode, please provide specifics and maybe someone can help...

mjv
Correct, I'm making my own. I kind of get what you're saying, but each instruction in my bytecode consists of 4 separate bytes (1 for the opcode and 3 others, whose purpose varies with the instruction), and I'd like to avoid having variable-length instructions. It could be safely achieved by encoding the length of the data in the instruction itself, but it would make it much more complex...
RCIX
I see the advantages of having the bytecode with a fixed length and format. In that case, strings may just be implemented as an instruction for variable declaration (which you may already have designed), whereby the operand is the index (be it address, offset, subscript...) where the actual string is stored. The difference from a regular variable is that the storage where the string resides is initialized with the string value. Indeed, with 3-byte operands you may find yourself limited for types other than just strings (say, how do you encode a numeric value bigger than 8 million?).
mjv
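The index-based idea from the comment above could be sketched like this in C: every instruction stays 4 bytes wide, and the 3 operand bytes hold an index into a separate table of string constants. All names here (`Instr`, `Program`, `OP_LOAD_STR`, `make_load_str`, `operand24`, `resolve_string`) and the big-endian operand layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode, invented for this sketch. */
enum { OP_LOAD_STR = 0x10 };

/* One fixed-width instruction: 1 opcode byte plus 3 operand bytes. */
typedef struct {
    uint8_t op;
    uint8_t arg[3];
} Instr;

/* A program is fixed-width code plus a separate table of string constants. */
typedef struct {
    const Instr       *code;
    size_t             code_len;
    const char *const *strings;
    size_t             string_count;
} Program;

/* Packs a 24-bit table index into the three operand bytes (big-endian here). */
Instr make_load_str(uint32_t index)
{
    assert(index <= 0xFFFFFFu); /* only 24 bits fit in the operand */
    Instr in = { OP_LOAD_STR,
                 { (uint8_t)(index >> 16), (uint8_t)(index >> 8), (uint8_t)index } };
    return in;
}

/* Reassembles the 24-bit operand from the three operand bytes. */
uint32_t operand24(const Instr *in)
{
    return ((uint32_t)in->arg[0] << 16) | ((uint32_t)in->arg[1] << 8) | in->arg[2];
}

/* Returns the string that the OP_LOAD_STR at code[pc] refers to, or NULL
 * if pc is out of range, the opcode differs, or the index is invalid. */
const char *resolve_string(const Program *p, size_t pc)
{
    if (pc >= p->code_len || p->code[pc].op != OP_LOAD_STR) return NULL;
    uint32_t idx = operand24(&p->code[pc]);
    return idx < p->string_count ? p->strings[idx] : NULL;
}
```

The string bytes never appear in the instruction stream, so decoding stays uniform. An unsigned 24-bit operand can address up to 16,777,216 table entries, and the same indirection might also serve for numeric constants too large to fit in the operand itself.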
That's another thing I'm a bit puzzled about as well... But I may just go ahead and do that. Thanks!
RCIX
A: 

If you can store numbers in an array, then you can store ASCII data in the same array. Ignoring the idea of a string as a class, a simple string is just a character array anyway -- and in C, a byte with a value of 0 indicates the end of the string.

As a simple proof-of-concept in C:

#include <stdio.h>

int main(void)
{
    putchar(104); // h
    putchar(101); // e
    putchar(108); // l
    putchar(108); // l
    putchar(111); // o
    putchar(10);  // \n
    return 0;
}

Output:

$ ./a.out
hello
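To connect this to bytes that sit next to other data, here is a small sketch of a zero-terminated string stored in the same array as plain numeric bytes. The array `mixed` and the helper `skip_cstring` are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two numeric bytes, then "hello" as ASCII codes with a 0 terminator,
 * then one more numeric byte. */
const uint8_t mixed[] = { 42, 7, 104, 101, 108, 108, 111, 0, 99 };

/*
 * Finds the end of a zero-terminated string that starts at buf[start].
 * Returns the offset just past the terminating 0, or 0 if start is out
 * of range or no terminator is found before the end of the buffer.
 */
size_t skip_cstring(const uint8_t *buf, size_t len, size_t start)
{
    assert(buf != NULL);
    if (start >= len) return 0;
    const uint8_t *end = memchr(buf + start, 0, len - start);
    return end ? (size_t)(end - buf) + 1 : 0;
}
```

Here the string begins at offset 2, and `skip_cstring(mixed, sizeof mixed, 2)` gives the offset of the byte that follows the terminator, which is where a reader of the array would continue. A string stored this way cannot itself contain a 0 byte.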

Maybe a reference on character arrays as strings would help?

Mark Rushakoff
It's not quite that simple. I'm trying to embed strings with other bytes (which happen to be instructions in my own custom format) and I'm not sure how to do that.
RCIX