So I'm wondering if there is an answer in pure .NET for representing a collection of arbitrary data types. I know there's the old, late-bound, VB6 Collections, but I was looking for something like Generics, but either without having to specify the type at compile time, OR finding a clever way to allow the code to determine the type on its own and then call some generic class.

Why? I'm bored, and I thought it'd be fun to try and implement my own library for NBT, or NamedBinaryTag. It's the storage format used in the popular Minecraft game. Specification document is here: http://www.minecraft.net/docs/NBT.txt

I know there are existing implementations out there, but there's no point in copying those if I'm doing this solely as a learning experience to get a better grasp on file streams, byte arrays, endian conversion, and general .NET stuff (I used to fiddle with VB6/VBA a lot, so .NET is a huge change).

What's hanging me up is TAG_Compound. Per that specification, it's essentially a collection of objects of any other Tag type, including additional, nested TAG_Compounds. You can do some freaky nesting/recursion with this kind of a format.

I've got a rough outline in my head of how to do the other classes, but the storage of arbitrary types is making me draw a blank on how to store that in a stub class (clsTagCompound) so that a generic class (clsNBT(Of T)) can use generic functions to access the payload.

List(Of T) looks like it could work if I could feed it a common interface. But since a generic class will be the main component used, its interface is also generic, and that just leads to a nasty generics chain (List(Of clsNBT(Of XXX))).

Thoughts, tips, criticisms about my thinking?

Since this spec works with byte streams, here's hex output of what an uncompressed NBT file looks like (created using one of the Minecraft editors). It's a TAG_String wrapped in a TAG_Compound, which while not specifically stated, is usually the first TAG found in an NBT file and it encapsulates all other tags.

0A 00 04 72 6F 6F 74 08 00 06 66 6F 6F 62 61 72 00 07 50 49 52 41 54 45 21 00

From left-to-right:
Byte 1: TagType - specifies TAG_Compound.
Bytes 2-3: Length of string for the name of TAG_Compound.
Bytes 4-7: "root", name of TAG_Compound.
Byte 8: TagType - specifies TAG_String (embedded in TAG_Compound).
Bytes 9-10: Length of string for the name of TAG_String.
Bytes 11-16: "foobar", name of TAG_String.
Bytes 17-18: Length of Payload (TAG_String, so string length).
Bytes 19-25: "PIRATE!", payload of TAG_String.
Byte 26: TagType - specifies TAG_End, marks the end of a TAG_Compound (a TAG_List is length-prefixed instead).
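
As a rough illustration of the byte layout listed above, here is a minimal sketch that walks the sample by hand. The module and helper names (NbtSampleWalk, ReadLength, ReadString) are made up for this example, and it assumes the names and payload are plain ASCII/UTF-8, which holds for this sample but is not a full treatment of the spec's string encoding.

```vbnet
Imports System
Imports System.Text

Module NbtSampleWalk
    ' Hypothetical helper: reads a big-endian 16-bit length at the given offset.
    Function ReadLength(data As Byte(), offset As Integer) As Integer
        Return (CInt(data(offset)) << 8) Or CInt(data(offset + 1))
    End Function

    ' Hypothetical helper: reads a length-prefixed string and advances the offset past it.
    Function ReadString(data As Byte(), ByRef offset As Integer) As String
        Dim length As Integer = ReadLength(data, offset)
        Dim text As String = Encoding.UTF8.GetString(data, offset + 2, length)
        offset += 2 + length
        Return text
    End Function

    Sub Main()
        ' The 26 sample bytes from the hex dump.
        Dim data As Byte() = {&HA, &H0, &H4, &H72, &H6F, &H6F, &H74,
                              &H8, &H0, &H6, &H66, &H6F, &H6F, &H62, &H61, &H72,
                              &H0, &H7, &H50, &H49, &H52, &H41, &H54, &H45, &H21,
                              &H0}
        Dim pos As Integer = 0

        Dim outerType As Byte = data(pos) : pos += 1      ' 10 = TAG_Compound
        Dim outerName As String = ReadString(data, pos)   ' name of the compound
        Dim innerType As Byte = data(pos) : pos += 1      ' 8 = TAG_String
        Dim innerName As String = ReadString(data, pos)   ' name of the string tag
        Dim payload As String = ReadString(data, pos)     ' string payload
        Dim endType As Byte = data(pos)                   ' 0 = TAG_End

        ' Prints: 10 root / 8 foobar = PIRATE! / 0
        Console.WriteLine("{0} {1} / {2} {3} = {4} / {5}",
                          outerType, outerName, innerType, innerName, payload, endType)
    End Sub
End Module
```

The offsets are advanced by the helper rather than hard-coded, so the same approach should carry over to tags with names of other lengths.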

Same basic principle applies to the other tag types. Very simple design, yet seems really powerful. Probably one of the reasons why a game at alpha-level code runs quite well, especially in Java.

EDIT: Here's a link to the level specification. It gives a more understandable way of seeing how these tags work together:
hxxp://www.minecraftwiki.net/wiki/Alpha_Level_Format#level.dat_Format

NOTE: Swap "hxxp" with "http" above. I lack enough reputation here to post multiple links (pft).

NOTE: I'm not too interested in doing any mods for the game -- I just find the NBT format neat and simple enough to be potentially useful. Already pondering on how to extend the format to handle Unsigned types in the tags (i.e., TAG_UInteger), and maybe prefixing the uncompressed stream with a magic number (like Linux/Unix executables have "ELF" in the first four bytes). That would prevent any issues from some of these tools being used to open arbitrary/unexpected data formats (and I will probably pass such ideas back to the game's developer, too).

EDIT2: So I changed things up. clsNamedBinaryTag is now an abstract class that implements a generic method defined in a generic interface:

Friend Interface INbt(Of T)
    ...
    Function GetPayload() As T
    Function SetPayload(ByRef data As BinaryReader) As Boolean
End Interface


Friend MustInherit Class clsNamedBinaryTag(Of T)
    Implements INbt(Of T)

    ...

    Protected Friend MustOverride _
    Function GetPayload() As T _
        Implements INbt(Of T).GetPayload

    Protected Friend MustOverride _
    Function SetPayload(ByRef data As BinaryReader) As Boolean _
        Implements INbt(Of T).SetPayload
End Class

GetPayload is the generic method, since it will fetch and return payloads of arbitrary types. Great for the simple things like Strings and such. Not so great when we run into TAG_Compound.

What I'm thinking of doing is making all derived classes implement INbt(Of T). For clsTagCompound, its SetPayload method will start walking a bytestream after the compound's name field is parsed. For each new TagType that it encounters, it would theoretically call DirectCast on a temp variable Dim'ed to INbt(Of T) to convert it to the class defining that particular TagType.

But this doesn't seem to work as planned. I believe my Catch-22 is that to even use clsTagCompound, I still have to define T, and that's where I get stuck again. I somehow need to create an interface that is NOT generic, yet can be applied to all the classes for the various Tag types and still call their GetPayload function to return the payload specific to a particular tag.
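
One possible shape for such a non-generic interface is sketched below. This is only an illustration under assumptions: the names INbtTag, NbtTag(Of T), TagString, TagCompound and PayloadAsObject are invented here and are not taken from the classes described above. The idea is that every tag exposes a small non-generic view (so mixed tags fit in one List), while a generic base class still offers the strongly typed payload.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical non-generic view shared by every tag.
Public Interface INbtTag
    ReadOnly Property Name As String
    ReadOnly Property PayloadAsObject As Object
End Interface

' Hypothetical generic base that adds the strongly typed payload.
Public MustInherit Class NbtTag(Of T)
    Implements INbtTag

    Private ReadOnly _name As String

    Protected Sub New(name As String)
        _name = name
    End Sub

    Public ReadOnly Property Name As String Implements INbtTag.Name
        Get
            Return _name
        End Get
    End Property

    Public MustOverride ReadOnly Property Payload As T

    ' Untyped access for callers that only hold an INbtTag.
    Public ReadOnly Property PayloadAsObject As Object Implements INbtTag.PayloadAsObject
        Get
            Return Payload
        End Get
    End Property
End Class

Public Class TagString
    Inherits NbtTag(Of String)

    Private ReadOnly _value As String

    Public Sub New(name As String, value As String)
        MyBase.New(name)
        _value = value
    End Sub

    Public Overrides ReadOnly Property Payload As String
        Get
            Return _value
        End Get
    End Property
End Class

' The compound's payload is a list of the non-generic interface,
' so it can hold any tag, including other compounds.
Public Class TagCompound
    Inherits NbtTag(Of List(Of INbtTag))

    Private ReadOnly _children As New List(Of INbtTag)

    Public Sub New(name As String)
        MyBase.New(name)
    End Sub

    Public Overrides ReadOnly Property Payload As List(Of INbtTag)
        Get
            Return _children
        End Get
    End Property
End Class

Module CompoundDemo
    Sub Main()
        Dim root As New TagCompound("root")
        root.Payload.Add(New TagString("foobar", "PIRATE!"))
        root.Payload.Add(New TagCompound("nested"))

        For Each child As INbtTag In root.Payload
            Console.WriteLine("{0}: compound={1}", child.Name, TypeOf child Is TagCompound)
        Next
    End Sub
End Module
```

The trade-off in this sketch is that value-type payloads read through PayloadAsObject get boxed, while code that knows the concrete tag class can still use the typed Payload property.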

A: 

You can use the Object type (which replaces VB6's Variant in VB.NET) for a list that stores an arbitrary set of objects.

To find out an object's type at runtime you can use reflection. I've never used it in VB.NET, but it is supported.

Statler
Yeah, I want to avoid the use of old-style VB6 objects/collections. That incurs the late-binding penalty. Reflection is something I've done some reading on, and while it is faster than old-school objects, it still incurs more of a penalty than using Generics or finding a sane way to strongly type the code.
Kumba
A: 

I don't know the details of the structures you are talking about (your link to NBT.txt is not valid...), but as I understand it, you might want to have a clsTagCompound class that is your base class. If you have different variants of TAG_Compound with different meanings, you can declare those as classes that inherit from clsTagCompound.

Then you can declare clsNBT as a collection class of clsTagCompound objects (or whatever - I didn't quite get your purpose with clsNBT).

To handle sub levels, it's best to have a children property on clsTagCompound

Class clsTagCompound

    Public Name As String
    Private mChildren As New List(Of clsTagCompound)

    Public ReadOnly Property Children As List(Of clsTagCompound)
        Get
            Return mChildren
        End Get
    End Property
End Class

Class clsTag1
 Inherits clsTagCompound

End Class

Class clsTag2
 Inherits clsTagCompound

End Class

Objects in a List(Of clsTagCompound) can be a mix of objects from any class that inherits clsTagCompound.

awe
The Minecraft website has apparently been under a SYN Flood attack the past few days. I guess someone can't accept that it's just a game... Try reloading the page every few hours.
Kumba
+2  A: 

If you want to use a collection type that can hold anything, you can use the non-generic collections (in the System.Collections namespace), or a generic collection (Of Object).

If you want to determine the type of an object at runtime, you can use the GetType method (to match on the exact type) or the TypeOf [something] Is [sometype] construction to check whether an object implements, inherits from, or is of a given interface/class. More explanations on MSDN.
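
A small sketch of the difference between the two checks, using an arbitrary List(Of Object) with made-up contents (the module name and sample values are only for illustration):

```vbnet
Imports System
Imports System.Collections.Generic

Module TypeCheckDemo
    Sub Main()
        ' Arbitrary sample items: a String, a boxed Integer, and a nested list.
        Dim items As New List(Of Object) From {"text", 42, New List(Of Object)()}

        For Each item As Object In items
            ' GetType matches the exact runtime type only.
            Dim isExactlyString As Boolean = (item.GetType() Is GetType(String))

            ' TypeOf ... Is also succeeds for base classes and implemented interfaces.
            Dim isEnumerable As Boolean = TypeOf item Is System.Collections.IEnumerable

            Console.WriteLine("{0}: String={1}, IEnumerable={2}",
                              item.GetType().Name, isExactlyString, isEnumerable)
        Next
    End Sub
End Module
```

For a tag hierarchy, TypeOf ... Is is usually the more convenient of the two, since it keeps working if a tag class is later subclassed.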

For your problem, I would write a common interface, ITagElement, that would be implemented by classes such as TagCompound, TagString, TagList, etc. As for the TagList class, I would make it inherit one of the classes in the System.Collections.ObjectModel namespace.

jhominal
GetType I've played with before. TypeOf...Is is a fairly new one to me. Seems aimed, per the MSDN link, at figuring out what the type of data contained within a generic Object is. But isn't that expensive? If I understand the terminology correctly (and after discovering/playing around with ILDASM), this involves boxing and unboxing the contained datatype, which takes a few cycles.
Kumba
I am working on a per-tag Class, btw. So right now, I have clsTagString and clsTagCompound. The NBT spec gives a URL to a small testcase NBT file that is comprised of one TAG_Compound "root" and one TAG_String "hello world" with a value of "Bananrama". My goal is to be able to parse that first, and then start working from there. clsTagString is complete, and theoretically should work. It's clsTagCompound that's proving to be a doozy because it holds as its payload ANY other tag, including additional TAG_Compounds. Generics might be the answer, but those things are hard to think about.
Kumba
Further (stupid comment limits), I have a base class, clsNamedBinaryTag, that contains common accessor properties/functions for the TagType, TagName, and TagNameLength fields common to all tags. Tags can also NOT have a name -- determinable by seeing if the TagNameLength is 0. The problem with that is Payload. Each tag implements a different payload, so TAG_String stores a string prefixed with its string length. TAG_Integer stores a 32bit signed integer value. TAG_List stores (essentially) an array of a single type.
Kumba
...Continuing.... I am able to map many of these back to primitive .NET data types (TAG_Integer --> Int32; TAG_List --> Array; TAG_Float --> Decimal (or Single), etc.). So that's what the Payload property of clsNamedBinaryTag will return, I'm thinking, via Generics. The catch is, what primitive (or non-primitive) data type in .NET would handle a list of objects of any data type, including itself, yet can be represented by a Generic type (Of T) to be returnable by a Payload property? I could also revert to a top-down structure, making clsNamedBinaryTag an abstract class and inherit on down...
Kumba
@Kumba: Never really paid attention to MSIL myself so I can't talk about it, but unless I am missing something critical, when you are doing OO programming, the program will always need, at some time, to check the data type represented by an object - that will always have a cost (and if the runtime/compiler is worth its weight, that operation will be well-optimized). Also read that other question: http://stackoverflow.com/questions/211414/is-premature-optimization-really-the-root-of-all-evil
jhominal
@jhominal: I believe my understanding stems from when .net "wraps" a datatype (say a custom type implemented by a Class) into a generic Object -- this is "boxing", and "unboxing" is the opposite. The goal for efficient .NET programming is to avoid those as much as possible.
Kumba
Gah, stupid comment box disallows newlines. Anyways, I do like to optimize as I go, but only so I minimize having to backtrack to already written code to apply newly-discovered optimizations. And I'm not talking about complicated types, either. I am fairly new to .NET myself, so I might employ, say, an older VB6/VBA method of achieving some action, only to then later discover that .NET has a newer, better, faster way of doing it.
Kumba
Boxing and unboxing is something that happens only when value types are identified as a subclass of `Object`. But I don't think that it means anything for reference types.
jhominal
A: 

When dealing with packets of fixed format I use something like this. This also illustrates an easy way to fix endianness. I decoded the first few bytes (using the sample you provided) so you could get an idea.

Enum NBT 'define the offsets into the packet, change names / add others as needed
    byte1 = 0
    byte23 = 1
    byte47 = 3
    byte8 = 7
    'etc
End Enum

Dim swEndian As Boolean = True

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim testData() As Byte = New Byte() {&HA, &H0, &H4, &H72, &H6F, &H6F, &H74, &H8, &H0, &H6, &H66, &H6F, &H6F, &H62, &H61, &H72, &H0, &H7, &H50, &H49, &H52, &H41, &H54, &H45, &H21, &H0}

    Dim byte1 As Byte = testData(NBT.byte1)

    Dim byte2 As Int16 = BitConverter.ToInt16(testData, NBT.byte23)
    If swEndian Then byte2 = System.Net.IPAddress.NetworkToHostOrder(byte2)

    Dim byte4 As Int32 = BitConverter.ToInt32(testData, NBT.byte47)
    If swEndian Then byte4 = System.Net.IPAddress.NetworkToHostOrder(byte4)

End Sub
dbasnett
I actually found that if the bytes are loaded into a byte array, calling Array.Reverse() will flip them into proper BE format. And that's easier to document. I can change it later if MS ever decides to fix up BinaryReader/BinaryWriter to allow specifying the endianness.
Kumba
What if there are strings and numbers in the packet, as in your example? Reversing the array also reverses the strings.
dbasnett
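
One way to reconcile the two comments above is to reverse only the bytes of each numeric field and read string bytes in stream order. The sketch below is an assumption-laden illustration rather than anyone's actual implementation: the helper name ReadInt16BigEndian is invented, and the sample bytes are just the name-length and "root" portion of the hex dump in the question.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module BigEndianReadDemo
    ' Hypothetical helper: reads a big-endian Int16 by reversing only the
    ' two bytes of that field, and only when the host is little-endian.
    Function ReadInt16BigEndian(reader As BinaryReader) As Short
        Dim raw As Byte() = reader.ReadBytes(2)
        If raw.Length < 2 Then Throw New EndOfStreamException()
        If BitConverter.IsLittleEndian Then Array.Reverse(raw)
        Return BitConverter.ToInt16(raw, 0)
    End Function

    Sub Main()
        ' Bytes 2-7 of the sample: length 0x0004 followed by "root".
        Dim sample As Byte() = {&H0, &H4, &H72, &H6F, &H6F, &H74}

        Using reader As New BinaryReader(New MemoryStream(sample))
            Dim length As Short = ReadInt16BigEndian(reader)
            ' String bytes are read in stream order and never reversed.
            Dim name As String = Encoding.UTF8.GetString(reader.ReadBytes(length))
            ' Prints: 4 root
            Console.WriteLine("{0} {1}", length, name)
        End Using
    End Sub
End Module
```

Checking BitConverter.IsLittleEndian keeps the helper correct on a big-endian host as well, where no reversal is needed.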