views:

360

answers:

10

I have heard a lot about "type systems", "strongly typed languages", and so on. Currently I am working on a .NET COM interop problem that involves a lot of "marshaling". And AFAIK, marshaling is largely about conversion between .NET types and COM types.

In many scenarios, such as programming languages, when we talk about types we are concerned with their logical meaning.

Now I am wondering: what does "type" mean physically? Something we can see and touch.

My current understanding is that a "type" is nothing but the in-memory representation of a computational entity.

Many thanks for your replies.

Update 1

Some quotation from MSDN:

Marshaling simple, blittable structures across the managed/unmanaged boundary first requires that managed versions of each native structure be defined. These structures can have any legal name; there is no relationship between the native and managed version of the two structures other than their data layout. Therefore, it is vital that the managed version contains fields that are the same size and in the same order as the native version. (There is no mechanism for ensuring that the managed and native versions of the structure are equivalent, so incompatibilities will not become apparent until run time. It is the programmer's responsibility to ensure that the two structures have the same data layout.)

So as far as marshaling is concerned, it is the layout that matters.

A: 

This depends on the programming paradigm you're working with. In OO, types can represent real-world objects: in other words, all the data of a real-world object that a computer can represent (or the parts you're interested in, anyway).

marr75
+2  A: 

I would say just the opposite. It is the language representation of the bits and bytes in memory.

Didier Trosset
Or possibly, the language representation of a set of (hopefully related) information that can be held in memory. Some languages are free to reorder and pack information along with many other combinations, such as types stored in certain object databases.
280Z28
+3  A: 

In many languages, types physically exist only at compile time. This is especially true of older languages. I would guess that in C, types may never exist in memory at all, in any form, while the program is running.

In other languages - specifically those which allow run-time type information access (for example C++ with RTTI, or C#, or any dynamic language like Python) - the types are just metadata. A binary description of the type. You know, the kind of stuff you would get if you tried to serialize data into a binary stream.

romkyns
Integer? Where's that?
Steven Sudit
@(Steven Sudit) Of course this is implementation-dependent and I'm over-simplifying, but I was alluding to the virtual method table pointer. I suppose I ended up writing something that's strictly speaking untrue :/ The "in-between" section is now deleted.
romkyns
C++ without RTTI but with virtual functions still provides a <i>dynamic</i> type, doesn't it? What I mean is: you may know an object only through its base class, but knowing that doesn't give you the exact layout of the object, does it? So is it accurate to say that the type doesn't really exist at runtime?
Gangadhar
+1 'cause true.
Dario
+1  A: 

I would say type can have several meanings.

I tend to prefer its meaning as an interface constraint. (Well-written object code defines all in-memory data as private.)

And in such case, type is absolutely NOT related to in-memory representation. On the contrary, it's only a contract on its member methods.

Stephane Rolland
Agree. More generally, it is the set of defined operations upon an entity...
AraK
"Object code" is the type your native compiler spews out before invoking the linker ;-) And apart from that, your definition seems to focus solely on OOP.
delnan
You're right :-) I should have said "well-written object-oriented programming code..." ;-) And yes, I tend to focus only on interfaces.
Stephane Rolland
A: 

IIRC, strongly typed languages enforce object types at compile time, e.g. a number must be an int, float, or similar type. In weakly typed languages you can say giraffe = 1 + frog * $100 / 'May 1' and the types are resolved at run time. And you usually get lots of runtime errors.

In data interchange situations (like COM, CORBA, RPC, etc.) it is very hard to enforce types because of binary compatibility (big endian, little endian) and formats (how do you represent strings and dates when passing from one language to another, each with different compilers?). Hence the marshaling to try to resolve the types of each parameter. ASN.1 was one of many attempts to build a 'universal types' framework for interchanging data between machines.

james
-1 (delayed): strongly typed languages only enforce type checks; it doesn't have to be at compile time.
romkyns
A: 

A type is a human-readable logical blueprint for how data should be represented and organized in memory. It is a way of allowing humans to specify, in a standard manner, how a concept can be rationalized into a digital sequence. The machine and the compiler really don't care about the difference between a string, an integer, or a fooClass. These "types" are simply agreed-upon organizational units that allow human programmers to translate logical concepts into rational data structures in memory.

Joel Etherton
A: 

Type is a bundle word. When you know something's type, you know how much memory it takes up and how its pieces are stored, but more importantly you also know what you can do with it. For example, there are several integer types that take up the same amount of memory as a pointer. However, you can multiply one integer by another (e.g. 3 times 4), but you cannot multiply two pointers together. You can call the Foo() method on a user-defined type (struct or class) that has a Foo method, writing x.Foo() for example, but you can't do that for a different user-defined type that doesn't have a Foo method. You can cast between some pairs of types but not others, or you can cast an A to a B but not a B to an A. And so on. In some languages there are also distinctions like whether a value is const or not.

Compilers and runtimes carry around a large amount of information all of which adds up to the item's type. The physicality of how many bytes it takes up (or anything else you could plausibly claim to be tangible) is really not the point.

Kate Gregory
+2  A: 

A type is metadata about bits and bytes that defines how to manipulate them in a meaningful and safe fashion.

DeadMG
+15  A: 

I think there are three aspects to “types” in programming (and they probably overlap, so don’t take this as a hard-and-fast separation):

  • A type is an element of a set of types, and every program/assembly/unit defines such a set. This is the most theoretical idea I can think of and is probably most useful to logicians and mathematicians. It is very general, and it allows you to define the idea of a type system on top of it. For example, a programming environment might define a relation on those types, e.g. the is-assignable-to relation.

  • A type is a semantic category. This is a linguistic or cognitive idea; in other words, it is most useful to humans who are thinking about how to program the computer. The type encapsulates what we think of as “things that belong in a category”. A type might be defined by a common purpose of entities. This categorisation according to purpose is, of course, arbitrary, but that’s okay, since the declaration of types in programming is arbitrary too.

  • A type is a specification of how data is laid out in memory. This is the most low-level idea I can think of. Under this point of view, a type says nothing about the purpose or semantics of the data, but only about how the computer is going to construct it, process it, etc. In this sense a type is somewhat more like a data encoding or a communications protocol.

Which meaning of type you go by depends on your domain. As already hinted, if you’re a logician doing research on how to prove properties of a program, the first definition is going to be more useful than the third because the data layout is (usually) irrelevant to the proof. If you’re a hardware designer or the programmer of a low-level system such as the CLR or the JavaVM, then you need the third idea and you don’t really care about the first. But to the common programmer who just wants to get on with their task, it is probably the middle one that applies.

Timwi
Yes, the 1st and 2nd one are what I mean by "logic". The 3rd is just what I am thinking of. I appreciate your answer much. But allow me to keep this question open for a while. Thanks.
smwikipedia
+10 if I could. Especially for the last paragraph.
delnan
"A type is a specification of how data is laid out in memory." - that's just wrong. The same type can have different memory layouts depending on the compiler in use and the hardware platform. An "int" is an "int", but on some platforms it is a 16-bit big-endian machine word, on others a 32-bit little-endian word. Furthermore, there are alignment/padding considerations, so a struct or class definition may have a different memory layout if another compiler version is used.
IanH
@IanH: If it’s 16-bit BE here and 32-bit LE there, what makes you say it’s the “same type”?
Timwi
@IanH, thanks for pointing that out. I think Timwi's wording takes the differences between compiler implementations and platform details into consideration.
smwikipedia
+1 for the third answer. It's the closest to a *touchable* representation of type. Although I would add that besides the *data being laid out in memory*, there can also be small functional programs (i.e. methods) that have access to this data in memory and manipulate it in whatever way is needed.
Robert Koritnik
@IanH: a type is still how data is laid out in memory, taking the platform and compiler into consideration, of course. But with the same platform and the same compiler being used, this memory map will be the same.
Robert Koritnik
A: 

A "type" is a set whose members ("objects") have a discrete finite representation and a useful set of shared attributes.

The actual in-memory representation of an object is not necessarily part of the definition of a type. That is to say that a single object may have multiple in-memory representations. The important thing is that an object may not be infinite or analog.

The shared attributes of a type can be anything. In object-oriented systems, the attributes would include (at a low level) data and behavior. Event notifications are also common. Some attributes may be conditional without violating the type definition (if boolean attribute X is true, then attribute Y also exists), so long as the rules are consistent across all objects of the type.

A "subtype" is a subset of a type whose members have a wider set of shared attributes.

This way of thinking about types is very different from what you pose in the question, and I believe this distinction is important.

If one sees types as an in-memory representation, then that representation will be viewed as the salient feature of the type, and it will be taken for granted. Interop will be achieved through low-level conversions and reinterpretations of existing byte sequences. This could lead to problems in some instances when that representation changes.

If, however, one sees types in terms of their attributes, then conversions from one type system to another will involve high-level conversions of data fields between corresponding objects. A determination of whether objects are compatible will be based on their salient attributes, and problems become less likely.

Even in the world of interop, knowledge of the internal details of types should not be relied upon. That is to say, features of an implementation of a type that are not part of the definition of that type should not be used as though they were a part of that type.

Jeffrey L Whitledge
Hi, thanks for your answer. It would take some time to digest. BTW, what's the meaning of "salent"? I don't recognize this word.
smwikipedia
@smwikipedia - I meant "salient". I forgot to spellcheck before I posted. It's fixed now! :-)
Jeffrey L Whitledge