I was initially surprised that Java decides to specify that byte is signed, with a range from -128..127 (inclusive). I'm under the impression that most 8-bit number representations are unsigned, with a range of 0..255 instead (e.g. IPv4 in dot-decimal notation).

So has James Gosling ever been asked to explain why he decided that byte is signed? Have there been any notable discussions/debates about this issue in the past between authoritative programming language designers and/or critics?

+9  A: 

It appears that simplicity was the main reason. From this interview:

Gosling: For me as a language designer, which I don't really count myself as these days, what "simple" really ended up meaning was could I expect J. Random Developer to hold the spec in his head. That definition says that, for instance, Java isn't -- and in fact a lot of these languages end up with a lot of corner cases, things that nobody really understands. Quiz any C developer about unsigned, and pretty soon you discover that almost no C developers actually understand what goes on with unsigned, what unsigned arithmetic is. Things like that made C complex. The language part of Java is, I think, pretty simple. The libraries you have to look up.

My initial assumption was that it's because Java doesn't have unsigned numeric types at all. Why should byte be an exception? char is a special case because it has to represent UTF-16 characters (thanks to Jon Skeet for the quote).

Bozho
Except for `char`.
Greg Hewgill
char isn't even numeric, is it? so char is neither signed, nor unsigned.
unbeli
+1 for the quote. Not sure if I agree, but that's what I asked for =)
polygenelubricants
@unbeli: It's a numeric type in the language specification.
Jon Skeet
@unbeli: char is an integral type, just like byte, short, int and long: http://java.sun.com/docs/books/jls/third_edition/html/typesValues.html#4.2.1
Pete Kirkham
From section 4.2: The numeric types are the integral types and the floating-point types. The integral types are byte, short, int, and long, whose values are 8-bit, 16-bit, 32-bit and 64-bit signed two's-complement integers, respectively, and char, whose values are 16-bit unsigned integers representing UTF-16 code units (§3.1).
Jon Skeet
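To make the JLS wording above concrete, here is a minimal sketch of how the same all-ones bit pattern widens differently for `byte` (signed) and `char` (unsigned). The class and method names are made up for illustration; only the language behaviour is taken from the spec.

```java
public class IntegralWidening {
    // Hypothetical helper: byte -> int widening sign-extends.
    public static int widenByte(byte b) {
        return b;
    }

    // Hypothetical helper: char -> int widening zero-extends.
    public static int widenChar(char c) {
        return c;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xFF; // bit pattern 1111 1111
        char c = (char) -1;   // bit pattern 1111 1111 1111 1111

        System.out.println(widenByte(b)); // -1
        System.out.println(widenChar(c)); // 65535

        // Ranges of the two types as the platform reports them.
        System.out.println(Byte.MIN_VALUE + ".." + Byte.MAX_VALUE);
        System.out.println((int) Character.MIN_VALUE + ".." + (int) Character.MAX_VALUE);
    }
}
```

So `char` is the one integral type whose values never go negative, which is why it keeps coming up as the exception in this thread.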
I understand why they might do that for ints, but aren't unsigned bytes (0 to 255) simpler?
Roman A. Taycher
yep, it is numeric indeed
unbeli
@Roman: I've observed many questions on stackoverflow regarding `byte`-level manipulation (something Bloch recommends AGAINST, perhaps precisely because...); they're really tricky to get right because of sign extension. Fortunately most of these can be hidden away in libraries, but it would be nice if the language elements themselves weren't so tricky to begin with.
polygenelubricants
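For anyone wondering what the sign-extension trap mentioned in the comment above looks like in practice, here is a small sketch that combines two bytes into a 16-bit value. The class and method names are invented for the example. The usual workaround is masking with `0xFF`; on Java 8 and later, `Byte.toUnsignedInt(b)` should give the same result as `b & 0xFF`.

```java
public class SignExtensionDemo {
    // Buggy: each byte is promoted to int with sign extension before the OR,
    // so a low byte >= 0x80 fills all the higher bits with ones.
    public static int bigEndianNaive(byte hi, byte lo) {
        return (hi << 8) | lo;
    }

    // Masking with 0xFF keeps only the low 8 bits of each promoted value.
    public static int bigEndianMasked(byte hi, byte lo) {
        return ((hi & 0xFF) << 8) | (lo & 0xFF);
    }

    public static void main(String[] args) {
        byte hi = 0x12;
        byte lo = (byte) 0xF0; // stored as -16

        System.out.println(Integer.toHexString(bigEndianNaive(hi, lo)));  // fffffff0
        System.out.println(Integer.toHexString(bigEndianMasked(hi, lo))); // 12f0
    }
}
```

The naive version compiles without any warning, which is probably why this kind of bug shows up so often.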
+1  A: 

I'm not aware of any direct quotes from James Gosling, but there's an official RFE for unsigned byte:

Bug ID: 4186775: request unsigned integer types, esp. unsigned byte

State: 11-Closed, Will Not Fix, request for enhancement

Please extend the Java design to allow unsigned types, particularly unsigned byte.

I have been wondering why there are no unsigned integer types in Java. It seems to me that for byte-length values it is extremely awkward not to have them [...]

I recognize that this was a design decision made by the Java developers. What I don't understand is why. Did they consider unsigned integer types evil or harmful, and chose to protect me from myself?

polygenelubricants
Whoa there's a lot of heated discussion in that RFE. Recommended read.
polygenelubricants
Really surprised that the RFE has 0 votes since 1998. Perhaps this is a non-issue for most people after all.
polygenelubricants
I think when they close bugs, the vote count goes to zero.
Rulmeq
+4  A: 

As per the 'Oak Language Specification 0.2' (Oak being the original name of the Java language):

"The Oak byte type is what C programmers are used to thinking of as the char type. But in the Oak language, characters are 16 bits wide. Having a separate byte type removes the confusion in C between the interpretation of char as an 8 bit integer and as a character."

You can grab a PostScript copy here:

https://duke.dev.java.net/green/OakSpec0.2.ps

There is also part of an interview posted on this site, in which he defends the absence of unsigned byte in Java:

http://www.darksleep.com/player/JavaAndUnsignedTypes.html

Adding the interview taken from the above mentioned page...

*" http://www.gotw.ca/publications/c_family_interview.htm

Q: Programmers often talk about the advantages and disadvantages of programming in a "simple language." What does that phrase mean to you, and is [C/C++/Java] a simple language in your view?

Ritchie: [deleted for brevity]

Stroustrup: [deleted for brevity]

Gosling: For me as a language designer, which I don't really count myself as these days, what "simple" really ended up meaning was could I expect J. Random Developer to hold the spec in his head. That definition says that, for instance, Java isn't -- and in fact a lot of these languages end up with a lot of corner cases, things that nobody really understands. Quiz any C developer about unsigned, and pretty soon you discover that almost no C developers actually understand what goes on with unsigned, what unsigned arithmetic is. Things like that made C complex. The language part of Java is, I think, pretty simple. The libraries you have to look up.

On the other hand.... According to http://www.artima.com/weblogs/viewpost.jsp?thread=7555

Once Upon an Oak ... by Heinz Kabutz July 15, 2003

... Trying to fill my gaps of Java's history, I started digging around on Sun's website, and eventually stumbled across the Oak Language Specification for Oak version 0.2. Oak was the original name of what is now commonly known as Java, and this manual is the oldest manual available for Oak (i.e. Java). ... Unsigned integer values (Section 3.1)

The specification says: "The four integer types of widths of 8, 16, 32 and 64 bits, and are signed unless prefixed by the unsigned modifier."

In the sidebar it says: "unsigned isn't implemented yet; it might never be." How right you were. "*

Favonius
Can you quote the interview here? I'm having trouble loading the page.
polygenelubricants
Couldn't add it in comments so updated the answer ...
Favonius
+2  A: 

There's no reason for a byte to be unsigned. When you have a char type to represent characters, byte would not normally do the job of a char.

this. __curious_geek
I believe Java chars are UCS-2 and stored in 16 bits/2 bytes. Even if that weren't the case, I have always felt it to be an ugly type wart that C has no native byte type (yes, I know a char is a byte, but even for C it feels like playing too loose with types).
Roman A. Taycher