views:

1537

answers:

11

I have a piece of code with a) which I replaced with b) purely for legibility ...

a)

if ( WORD[ INDEX ] == 'A' ) branch = BRANCH.A;
/* B through to Y */
if ( WORD[ INDEX ] == 'Z' ) branch = BRANCH.Z;

b)

switch ( WORD[ INDEX ] ) {
    case 'A' : branch = BRANCH.A; break;
    /* B through to Y */
    case 'Z' : branch = BRANCH.Z; break;
}


... will the switch version cascade through all the permutations or jump to a case ?



EDIT:

Some of the answers below regard alternative approaches to the approach above.
I have included the following to provide context for its use.

The reason I asked, the Question above, was because the speed of adding words empirically improved.

This isn't production code by any means, and was hacked together quickly as a PoC.

The following seems to be a confirmation of failure for a thought experiment.
I may need a much bigger corpus of words than the one I am currently using though.
The failure arises from the fact I did not account for the null references still requiring memory. ( doh ! )

public class Dictionary {
    private static Dictionary ROOT;
    private boolean terminus;
    private Dictionary A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z;
    private static Dictionary instantiate( final Dictionary DICTIONARY ) {
     return ( DICTIONARY == null ) ? new Dictionary() : DICTIONARY;
    }
    private Dictionary() {
     this.terminus = false;
     this.A = this.B = this.C = this.D = this.E = this.F = this.G = this.H = this.I = this.J = this.K = this.L = this.M = this.N = this.O = this.P = this.Q = this.R = this.S = this.T = this.U = this.V = this.W = this.X = this.Y = this.Z = null;
    }
    public static void add( final String...STRINGS ) {
     Dictionary.ROOT = Dictionary.instantiate( Dictionary.ROOT );
     for ( final String STRING : STRINGS ) Dictionary.add( STRING.toUpperCase().toCharArray(), Dictionary.ROOT , 0, STRING.length() - 1 );
    }
    private static void add( final char[] WORD, final Dictionary BRANCH, final int INDEX, final int INDEX_LIMIT ) {
     Dictionary branch = null;
     switch ( WORD[ INDEX ] ) {
     case 'A' : branch = BRANCH.A = Dictionary.instantiate( BRANCH.A ); break;
     case 'B' : branch = BRANCH.B = Dictionary.instantiate( BRANCH.B ); break;
     case 'C' : branch = BRANCH.C = Dictionary.instantiate( BRANCH.C ); break;
     case 'D' : branch = BRANCH.D = Dictionary.instantiate( BRANCH.D ); break;
     case 'E' : branch = BRANCH.E = Dictionary.instantiate( BRANCH.E ); break;
     case 'F' : branch = BRANCH.F = Dictionary.instantiate( BRANCH.F ); break;
     case 'G' : branch = BRANCH.G = Dictionary.instantiate( BRANCH.G ); break;
     case 'H' : branch = BRANCH.H = Dictionary.instantiate( BRANCH.H ); break;
     case 'I' : branch = BRANCH.I = Dictionary.instantiate( BRANCH.I ); break;
     case 'J' : branch = BRANCH.J = Dictionary.instantiate( BRANCH.J ); break;
     case 'K' : branch = BRANCH.K = Dictionary.instantiate( BRANCH.K ); break;
     case 'L' : branch = BRANCH.L = Dictionary.instantiate( BRANCH.L ); break;
     case 'M' : branch = BRANCH.M = Dictionary.instantiate( BRANCH.M ); break;
     case 'N' : branch = BRANCH.N = Dictionary.instantiate( BRANCH.N ); break;
     case 'O' : branch = BRANCH.O = Dictionary.instantiate( BRANCH.O ); break;
     case 'P' : branch = BRANCH.P = Dictionary.instantiate( BRANCH.P ); break;
     case 'Q' : branch = BRANCH.Q = Dictionary.instantiate( BRANCH.Q ); break;
     case 'R' : branch = BRANCH.R = Dictionary.instantiate( BRANCH.R ); break;
     case 'S' : branch = BRANCH.S = Dictionary.instantiate( BRANCH.S ); break;
     case 'T' : branch = BRANCH.T = Dictionary.instantiate( BRANCH.T ); break;
     case 'U' : branch = BRANCH.U = Dictionary.instantiate( BRANCH.U ); break;
     case 'V' : branch = BRANCH.V = Dictionary.instantiate( BRANCH.V ); break;
     case 'W' : branch = BRANCH.W = Dictionary.instantiate( BRANCH.W ); break;
     case 'X' : branch = BRANCH.X = Dictionary.instantiate( BRANCH.X ); break;
     case 'Y' : branch = BRANCH.Y = Dictionary.instantiate( BRANCH.Y ); break;
     case 'Z' : branch = BRANCH.Z = Dictionary.instantiate( BRANCH.Z ); break;
     } 
     if ( INDEX == INDEX_LIMIT ) branch.terminus = true;
     else Dictionary.add( WORD, branch, INDEX + 1, INDEX_LIMIT );
    }
    public static boolean is( final String STRING ) {
     Dictionary.ROOT = Dictionary.instantiate( Dictionary.ROOT );
     return Dictionary.is( STRING.toUpperCase().toCharArray(), Dictionary.ROOT, 0, STRING.length() - 1 );
    }
    private static boolean is( final char[] WORD, final Dictionary BRANCH, final int INDEX, final int INDEX_LIMIT ) {
     Dictionary branch = null;
     switch ( WORD[ INDEX ] ) {
     case 'A' : branch = BRANCH.A; break;
     case 'B' : branch = BRANCH.B; break;
     case 'C' : branch = BRANCH.C; break;
     case 'D' : branch = BRANCH.D; break;
     case 'E' : branch = BRANCH.E; break;
     case 'F' : branch = BRANCH.F; break;
     case 'G' : branch = BRANCH.G; break;
     case 'H' : branch = BRANCH.H; break;
     case 'I' : branch = BRANCH.I; break;
     case 'J' : branch = BRANCH.J; break;
     case 'K' : branch = BRANCH.K; break;
     case 'L' : branch = BRANCH.L; break;
     case 'M' : branch = BRANCH.M; break;
     case 'N' : branch = BRANCH.N; break;
     case 'O' : branch = BRANCH.O; break;
     case 'P' : branch = BRANCH.P; break;
     case 'Q' : branch = BRANCH.Q; break;
     case 'R' : branch = BRANCH.R; break;
     case 'S' : branch = BRANCH.S; break;
     case 'T' : branch = BRANCH.T; break;
     case 'U' : branch = BRANCH.U; break;
     case 'V' : branch = BRANCH.V; break;
     case 'W' : branch = BRANCH.W; break;
     case 'X' : branch = BRANCH.X; break;
     case 'Y' : branch = BRANCH.Y; break;
     case 'Z' : branch = BRANCH.Z; break;
     }
     if ( branch == null ) return false;
     if ( INDEX == INDEX_LIMIT ) return branch.terminus;
     else return Dictionary.is( WORD, branch, INDEX + 1, INDEX_LIMIT );
    }
}
+16  A: 

Don't worry about performance; use the syntax that best expresses what you're doing. Only after you have (a) demonstrated a performance deficiency; and (b) localized it to the routine in question, only then should you worry about performance. For my money, the case syntax is more appropriate here.

Carl Manaster
+1  A: 

The switch statement should use a hash to select which case to go to. From there, every subsequent case will also be run if there are no break statements. For example, with your code, if you switch on X, it will immediately go to X, then Y, then Z. See the Java Tutorial.

David Johnstone
+2  A: 

Honestly, I don't think the performance matters in this case. It's really up to the compiler and its optimization.

Cambium
+3  A: 

If you have a switch statement with consecutive, integral values, depending on the language, it may be optimized to a branch table, which is very quick. It's not slower, anyway!

Additionally, using if/else if would be an improvement over if, for cases such as this one in which your cases are mutually exclusive. No sense making 25 more checks after matching A.

But basically, any performance difference is negligible, and you ought to use the most correct syntax, which in this case is the switch statement. Make sure to separate your cases with breaks though.

Nick Lewis
+10  A: 

In bytecode there are two forms of switch: tableswitch and lookupswitch. One assumes a dense set of keys, the other sparse. See the description of compiling switch in the JVM spec. For enums, the ordinal is found and then the code continues as the int case. I am not entirely sure how the proposed switch on String little feature in JDK7 will be implemented.

However, heavily used code is typically compiled in any sensible JVM. The optimiser is not entirely stupid. Don't worry about it, and follow the usual heuristics for optimisation.

Tom Hawtin - tackline
@TomHawtin: Cheers Tom. Your comments, as always, are worth their weight in salt. I was searching for some reason for the jump in speed when timing my methods. re:switching on a string http://blog.bpsite.net/item/71/Switch%20on%20String%20in%20Java.html I found that to be useful reading, I once wrote a piece of code which metacoded a enum version from a string version written in a java idiomatic way. Maybe that is how it will be implemented ?
_ande_turner_
+3  A: 

Not quite an answer to your question, but there's actually a bug in your code, you should have a break after each case:

switch ( WORD[ INDEX ] ) {
    case 'A' : branch = BRANCH.A; break;
    /* B through to Y */
    case 'Z' : branch = BRANCH.Z; break;
}

I don't think performance differences are going to be too significant here, but if you really care about performance, and if this code is executed very frequently, here's another option:

// Warning, untested code.
BRANCH[] branchLookUp = {BRANCH.A, BRANCH.B, ..., BRANCH.Z};

branch = branchLookUp[WORD[INDEX] - 'A'];

Be sure to encapsulate this and document it well though.

Jack Leow
@JackLeow: Fixed the omitted breaks, was a transcription omission.
_ande_turner_
+1 for suggesting using an array. You should consider storing the BRANCHES in array instead of having 26 static fields.
notnoop
A: 

I think this is more of a question about style rather than performance. I think in this case switch statement is more appropriate than if. I'm not sure there is much difference in performance.

Quincy
+5  A: 

It looks like you have enumerated the values so perhaps an enumeration is in order?

enum BRANCH {
  A,B, ... Y,Z;
}

Then in your code :

BRANCH branch = BRANCH.valueOf( WORD[ INDEX ] );

Also, there is a possible bug in your code where "A" == "A" may be false depending on the object identity of the "A"'s.

Clint
+1  A: 

The switch should be logarithmic and the if linear, assuming the compiler can't find anything clever. But long switches are tricky to read and also bug-prone -- as noted, the switch you've got above doesn't have any breaks, and it's going to fall through all the cases.

Why not prepopulate a Map instead, and just use Map.get()?

private static final Map<Char, Whatever> BRANCHES = Collections.unmodifiableMap(new HashMap<Char, Whatever>() {{
    put('A', BRANCH.A);
    ...
    put('Z', BRANCH.Z);
}}

public void getBranch(char[] WORD, int INDEX) {
    return BRANCHES.get(WORD[INDEX]);
}

As noted above, if BRANCH is an Enum, this behavior should properly be in the Enum.

(What are WORD, INDEX and BRANCH here, anyway? From the names, they should be constants, but you can't really have constant arrays -- the contents are alwyas modifiable; there wouldn't be much point in creating a constant "struct"; and there certainly wouldn't be much point in iffing or switching based on constants....)

David Moles
+2  A: 

Here's a way to avoid all of the case statements.

import java.util.HashMap;

public class Dictionary {
    private static Dictionary      ROOT;
    private boolean         terminus;
    private final HashMap<Character, Dictionary> dictionaries = new HashMap<Character, Dictionary>();

    private void ensureBranch(char c) {
     if (getBranch(c) != null)
      return;
     dictionaries.put(c, new Dictionary());
    }

    private Dictionary getBranch(char c) {
     return dictionaries.get(c);
    }

    public static boolean is(final String string) {
     ensureRoot();
     return is(chars(string), ROOT, 0, string.length() - 1);
    }

    public static void add(final String... strings) {
     ensureRoot();
     for (final String string : strings)
      add(chars(string), ROOT, 0, string.length() - 1);
    }

    private static void ensureRoot() {
     if (ROOT == null)
      ROOT = new Dictionary();
    }

    private static char[] chars(final String string) {
     return string.toUpperCase().toCharArray();
    }

    private Dictionary() {
     this.terminus = false;
    }

    private static void add(final char[] word, final Dictionary dictionary, final int index, final int limit) {
     Dictionary branch = getBranch(word, dictionary, index);
     if (index == limit)
      branch.terminus = true;
     else
      add(word, branch, index + 1, limit);
    }

    private static Dictionary getBranch(final char[] word, final Dictionary dictionary, final int index) {
     final char c = word[index];
     dictionary.ensureBranch(c);
     return dictionary.getBranch(c);
    }

    private static boolean is(final char[] word, final Dictionary dictionary, final int index, final int limit) {
     Dictionary branch = dictionary.getBranch(word[index]);
     if (branch == null)
      return false;
     if (index == limit)
      return branch.terminus;
     return is(word, branch, index + 1, limit);
    }
}
Carl Manaster
@CarlManaster: :D
_ande_turner_
@CarlManaster: I did some experimentation. This adds about a 30% increase in memory usage, with a slightly longer time for adding new words, and a slightly quicker verification of entries.
_ande_turner_
@Ande: Thanks for letting me (and all of us) know; I'm glad you're taking an empirical approach - and unsurprised that you're finding small differences in speed. I find the increased memory usage slightly surprising; you could probably find an appropriate initial capacity for the hashmap to bring that down a bit.
Carl Manaster
@CarlManaster: It came to 18410184 bytes used in the Original, 25296416 bytes used using Hashmap. I was using the "dictionary.txt" available from Sun in their tutorials which was 622783 bytes on disk.
_ande_turner_
A: 

I know this is not what you are asking at all, but aren't you just doing this?

public class Dictionary {
    private static final Set<String> WORDS = new HashSet<String>();

    public static void add(final String... STRINGS) {
        for (String str : STRINGS) {
            WORDS.add(str.toUpperCase());
        }
    }

    public static boolean is(final String STRING) {
        return WORDS.contains(STRING.toUpperCase());
    }
}

Are you simply looking for something a little more memory efficient?

Jack Leow
@JackLeow: Was bored ... And as for memory efficient, this version of yours wins. Perhaps this should become a golfing question.
_ande_turner_