Here's an excerpt from Sun's Java tutorials:

A switch works with the byte, short, char, and int primitive data types. It also works with enumerated types (discussed in Classes and Inheritance) and a few special classes that "wrap" certain primitive types: Character, Byte, Short, and Integer (discussed in Simple Data Objects).

There must be a good reason why the long primitive data type is not allowed. Anyone know what it is?

+8  A: 

Because they didn't implement the necessary instructions in the bytecode and you really don't want to write that many cases, no matter how "production ready" your code is...

[EDIT: Extracted from comments on this answer, with some additions on background]

To be exact, 2³² is a lot of cases and any program with a method long enough to hold more than that is going to be utterly horrendous! In any language. (The longest function I know of in any code in any language is a little over 6k SLOC – yes, it's a big switch – and it's really unmanageable.) If you're really stuck with having a long where you should have only an int or less, then you've got two real alternatives.

  1. Use some variant on the theme of hash functions to compress the long into an int. The simplest one, only for use when you've got the type wrong, is to just cast! More useful would be to do this:

    (int) ((x & 0xFFFFFFFFL) ^ ((x >>> 32) & 0xFFFFFFFFL))
    

    before switching on the result. You'll have to work out how to transform the cases that you're testing against too. But really, that's still horrible since it doesn't address the real problem of lots of cases.

  2. A much better solution if you're working with very large numbers of cases is to change your design to using a Map<Long,Runnable> or something similar so that you're looking up how to dispatch a particular value. This allows you to separate the cases into multiple files, which is much easier to manage when the case-count gets large, though it does get more complex to organize the registration of the host of implementation classes involved (annotations might help by allowing you to build the registration code automatically).

    FWIW, I did this many years ago (we switched to the newly-released J2SE 1.2 part way through the project) when building a custom bytecode engine for simulating massively parallel hardware (no, reusing the JVM would not have been suitable due to the radically different value and execution models involved) and it enormously simplified the code relative to the big switch that the C version of the code was using.
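As a rough illustration of the `Map<Long,Runnable>` idea, here is a minimal sketch, assuming Java 8 or later. The class name `LongDispatcher` and its `register` and `dispatch` methods are made up for this example; only the map type comes from the answer itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; only Map<Long, Runnable> comes from the answer.
final class LongDispatcher {
    private final Map<Long, Runnable> handlers = new HashMap<>();
    private final Runnable fallback;

    LongDispatcher(Runnable fallback) {
        this.fallback = fallback;
    }

    // Each "case" can be registered from a different class or file.
    void register(long key, Runnable handler) {
        handlers.put(key, handler);
    }

    // Plays the role of the switch: run the matching handler, else the default.
    void dispatch(long key) {
        handlers.getOrDefault(key, fallback).run();
    }
}
```

Registering a handler takes the place of writing a `case`, and the fallback passed to the constructor takes the place of `default`.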

To reiterate the take-home message, wanting to switch on a long is an indication that either you've got the types wrong in your program or that you're building a system with that much variation involved that you should be using classes. Time for a rethink in either case.
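For the situation where you only have a handful of long values and cannot change the type, the folding approach from option 1 could look roughly like this. It is a sketch only: the event constants and the `FoldedSwitch` class are invented for illustration. Note the extra equality check inside each case, since two different longs can fold to the same int.

```java
final class FoldedSwitch {
    // Hypothetical event identifiers, made up for illustration.
    static final long EVENT_OPEN  = 0x1A2B3C4D5E6F7081L;
    static final long EVENT_CLOSE = 0x0123456789ABCDEFL;

    // XOR the high and low 32-bit halves (the same fold Long.hashCode uses).
    static int fold(long x) {
        return (int) (x ^ (x >>> 32));
    }

    static String describe(long event) {
        switch (fold(event)) {
            // Case labels must be compile-time constants, so the fold is written inline.
            case (int) (EVENT_OPEN ^ (EVENT_OPEN >>> 32)):
                // Different longs can fold to the same int, so confirm the full value.
                if (event == EVENT_OPEN) return "open";
                break;
            case (int) (EVENT_CLOSE ^ (EVENT_CLOSE >>> 32)):
                if (event == EVENT_CLOSE) return "close";
                break;
            default:
                break;
        }
        return "unknown";
    }
}
```

If two of your constants happen to fold to the same int, the compiler rejects the duplicate case label, so at least that collision is caught early.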

Donal Fellows
A switch doesn't actually have to be implemented with a TABLESWITCH though, and it's as much about the *range* of the labels as the number.
Neil Coffey
But is the range being switched over really more than 32 bits wide? I've never heard of code that needed that many switch arms. Without that, you can compact the range (e.g., by using some variation on the theme of hash functions) to make something that will work. Or use a `Map<Long,Runnable>` to solve the problem in a wholly different way. :-)
Donal Fellows
Isn't this answer just kind of passing the buck? The question just becomes "why didn't they implement the necessary instructions?"
Lord Torgamus
@Lord Torgamus: Presumably because it's silly. Think about it for a moment: why would anyone, *anyone*, have code with more than 2**32 arms in a `switch`? Wanting to choose over a finite set of that many elements simply points to a mistake in the fundamental design of the program. That people are asking for it indicates merely that **they've got their design wrong**.
Donal Fellows
BTW, if anyone wants to argue with me further on this, please start by giving a use-case for switching on a long. Otherwise we'll be stuck arguing with hypotheticals forever...
Donal Fellows
@Donal, I'm not saying you're wrong; in fact, I'll go ahead and state right now that you are right. My point was that your comment there would have made a better answer than the text you actually submitted as an answer.
Lord Torgamus
@Lord Torgamus: Fair point, and I'll edit it in next. :-)
Donal Fellows
Here's the use case that brought this to my attention. BlackBerry global events are indexed with GUIDs, which are of long type in the BlackBerry platform. I had a handful of events I wanted to create a switch for. I never planned on having that many cases, I just prefer a switch format over "if...else if" statements.
Fostah
@Fostah: I wonder why the BB platform uses longs there. Do they really have that many events that they need GUIDs? (Probably not. More likely some developer at BB was a moron.) Anyway, doing a quick-and-dirty hash or a cast to `int` will probably work. Evil hack, but cheap. You'll have to check that the events going around don't hit your code accidentally, but with GUIDs/UUIDs that's fairly unlikely. (Or use a `Map` populated with anonymous inner class instances.)
Donal Fellows
+5  A: 

Because the lookup table index must be 32 bits.

JRL
But then again a `switch` need not be implemented with a lookup table necessarily.
Joachim Sauer
If that was the case, they could never implement switch for Strings (as they currently plan).
Dimitris Andreou
+7  A: 

I think to some extent it was probably an arbitrary decision based on typical use of switch.

A switch can essentially be implemented in two ways (or in principle, a combination): for a small number of cases, or ones whose values are widely dispersed, a switch essentially becomes the equivalent of a series of ifs on a temporary variable (the value being switched on must only be evaluated once). For a moderate number of cases that are more or less consecutive in value, a switch table is used (the TABLESWITCH instruction in Java), whereby the location to jump to is effectively looked up in a table.
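To make the first strategy concrete, here is a small sketch of the "series of ifs on a temporary variable" shape, written directly over a long. The class name, the keys, and the use of `LongSupplier` are all assumptions made for this example; the point is only that the operand is evaluated once and then compared repeatedly.

```java
import java.util.function.LongSupplier;

final class IfChain {
    // Hypothetical, widely dispersed keys.
    static final long KEY_A = 17L;
    static final long KEY_B = 9_000_000_000L;

    static String describe(LongSupplier source) {
        // Evaluated exactly once, like the operand of a switch.
        final long key = source.getAsLong();
        if (key == KEY_A) {
            return "a";
        } else if (key == KEY_B) {
            return "b";
        } else {
            return "default";
        }
    }
}
```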

Either of these methods could in principle use a long value rather than an integer. But I think it was probably just a practical decision to balance up the complexity of the instruction set and compiler with actual need: the cases where you really need to switch over a long are rare enough that it's acceptable to have to re-write as a series of IF statements, or work round in some other way (if the long values in question are close together, you can in your Java code switch over the int result of subtracting the lowest value).
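The subtraction workaround mentioned at the end could be sketched as follows. The base value and the case names are hypothetical. The range check before the cast matters: without it, a value that is a multiple of 2^32 away from a real case would wrap onto that case.

```java
final class OffsetSwitch {
    // Hypothetical base: long codes far outside int range but adjacent to each other.
    static final long BASE = 10_000_000_000L;

    static String describe(long code) {
        long offset = code - BASE;
        // Reject anything outside the known range before narrowing, otherwise
        // the cast could wrap an unrelated value onto a valid case.
        if (offset < 0 || offset > 2) {
            return "unknown";
        }
        switch ((int) offset) {
            case 0:  return "start";
            case 1:  return "stop";
            case 2:  return "reset";
            default: return "unknown";
        }
    }
}
```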

Neil Coffey
I have to agree with the "rarity" argument as I've been developing in Java for a while and I never came across a situation where I needed/tried to switch on a long until now.
Fostah
"Either of these methods could in principle use a long value rather than an integer". So this answer claims that in principle switches on longs are possible (so they can be added in the future too?). Weak. In my answer below, I show why a switch on longs would *fundamentally* be a broken concept on 32bit platforms, so no, not even in principle could a switch on longs be supported. I also can't help but point out the fact that Neil came and (probably) downvoted my answer, with a reasoning based on a misunderstanding of thread-safety. Nice.
Dimitris Andreou
Dimitris, my reasoning wasn't based on a misunderstanding of thread-safety, just that I don't think the thread-safety argument you put forward holds.
Neil Coffey
I might have misread that. Still though, the "whatever the width" you mention is not too precise. If the width is a word, then you know that at least someone actually wrote the value that was read, so it's not going to be an address out of thin air, but an address someone intended. Both are problems, but with longs the problem is certainly bigger, with more ways to go wrong. That's all I'm trying to say.
Dimitris Andreou
A: 

A long, in 32bit architectures, is represented by two words. Now, imagine what could happen if, due to insufficient synchronization, the execution of the switch statement observes a long with its high 32 bits from one write and its low 32 bits from another! It could try to go to... who knows where! Basically somewhere at random. Even if both writes represented valid cases for the switch statement, their funny combination would probably lead neither to the first nor to the second, or, far worse, it could lead to another valid but unrelated case!

At least with an int (or lesser types), no matter how badly you mess up, the switch statement will at least read a value that someone actually wrote, instead of a value "out of thin air".
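To show what such a mixed value looks like, here is a deterministic sketch that builds one by hand rather than relying on an actual data race. The `TornLong` class and its `tear` method are invented for this illustration. For background, the Java Language Specification does allow a write to a non-volatile long or double to be treated as two separate 32-bit writes, and declaring the field `volatile` rules that out.

```java
final class TornLong {
    // Combine the high word of one value with the low word of another,
    // which is what a torn (non-atomic) 64-bit read could observe.
    static long tear(long highFrom, long lowFrom) {
        return (highFrom & 0xFFFFFFFF00000000L) | (lowFrom & 0x00000000FFFFFFFFL);
    }

    public static void main(String[] args) {
        long first  = 0x0000000100000002L; // one written value
        long second = 0x0000000300000004L; // another written value
        // The mix is 0x0000000100000004, which nobody ever wrote.
        System.out.println(Long.toHexString(tear(first, second)));
    }
}
```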

Of course, I don't know the actual reason (it's been more than 15 years, I haven't been paying attention that long!), but if you realize how unsafe and unpredictable such a construct could be, you'll agree that this is definitely a very good reason not to ever have a switch on longs (and as long, pun intended, as there are 32bit machines, this reason will remain valid).

Dimitris Andreou
I don't think this follows. The value being switched on needs to be calculated and stored in a register or on the stack. If that value is calculated based on data accessed by multiple threads, this *calculation* needs to be made thread-safe, whatever the width of the result. But then, once that result is in a register or on the stack, it's only accessed by the switching thread and is safe.
Neil Coffey
Neil, your argument is quite confused: "But then, once that result is in a register or on the stack, it's only accessed by the switching thread and is safe". Sure, using that value is thread-safe! But my point is that *that value can _already_ be wrong due to synchronization bugs in user code*. Using a wrong value thread-safely is less than useful :) This issue can never be eliminated: buggy concurrent code could already have produced the "out of thin air"/wrong long value, which can subsequently be used in the switch, making the switch go to a case address _nobody ever specified_.
Dimitris Andreou
Dimitris, maybe there's something in your argument I'm not understanding. The value switched on could indeed be wrong due to synchronization bugs in user code. But I don't believe there's anything inherent about the switch statement that makes this more likely than in other cases. And thinking it through as best I can, I don't believe that the non-atomicity of hi/low words of long reads/writes to memory is in fact an issue. (Thinking about things another way: you could decide that an if comparison on a long was not allowed based on the same argument.)
Neil Coffey
While the potential problems with a long being represented as two words with no guaranteed atomic writes are a general issue, agreed, in the case of switch it would be an even more pronounced danger. It's like sending an envelope with a message where half the address is from one person, half is from another - the final address could be valid and correspond to a totally random chap who would then receive the envelope and act accordingly. It's one thing reading garbage and producing garbage (like a wrong boolean), but reading garbage and doing random jumps *does* sound a tad more dangerous to me.
Dimitris Andreou