views:

745

answers:

7

UPDATE:

The final version of my utility looks like this:

StringBuilder b = new StringBuilder();

for(char c : inLetters.toLowerCase().toCharArray())
{
 switch(c)
 {
 case '0':                                          b.append("0"); break;
 case '1':                                          b.append("1"); break;
 case '2': case 'a': case 'b': case 'c':            b.append("2"); break;
 case '3': case 'd': case 'e': case 'f':            b.append("3"); break;
 case '4': case 'g': case 'h': case 'i':            b.append("4"); break;
 case '5': case 'j': case 'k': case 'l':            b.append("5"); break;
 case '6': case 'm': case 'n': case 'o':            b.append("6"); break;
 case '7': case 'p': case 'q': case 'r': case 's':  b.append("7"); break;
 case '8': case 't': case 'u': case 'v':            b.append("8"); break;
 case '9': case 'w': case 'x': case 'y': case 'z':  b.append("9"); break;
 }
}

return builder.toString();


ORIGINAL QUESTION:

I'm taking on the simple task of converting an alphanumeric phone number to a string of digits. For example, 1-800-HI-HAXOR would become 1-800-44-42967. My initial attempt was to create a nasty switch statement, but I'd love a more elegant, and efficient solution. Here's what I've got:

for(char c : inLetters.toLowerCase().toCharArray())
{
    switch(c)
    {
    case '0':                                         result+="0"; break;
    case '1':                                         result+="1"; break;
    case '2': case 'a': case 'b': case 'c':           result+="2"; break;
    case '3': case 'd': case 'e': case 'f':           result+="3"; break;
    case '4': case 'g': case 'h': case 'i':           result+="4"; break;
    case '5': case 'j': case 'k': case 'l':           result+="5"; break;
    case '6': case 'm': case 'n': case 'o':           result+="6"; break;
    case '7': case 'p': case 'q': case 'r': case 's': result+="7"; break;
    case '8': case 't': case 'u': case 'v':           result+="8"; break;
    case '9': case 'w': case 'x': case 'y': case 'z': result+="9"; break;
    }
}

Thanks!

A: 

Switch statements get compiled to a similar form as if-else statements, (each case statement is essentially an if (c == '...') test in disguise) so although this is visually more compact than cascading if's that test for each character, there may or may not be any real performance benefit.

You can potentially streamline it by eliminating some of the comparisons. The key is that char is an integer type (which is why you can switch on a char) so you can use numeric comparison operators. and 'aAssuming your inLetters string only contains alphanumeric characters, this should work... (All other characters will pass through unchanged.)

String result = "";
for (char c : letters.toLowerCase().toCharArray()) {
    if      (c <= '9') result += c;
    else if (c <= 'c') result += "2";
    else if (c <= 'f') result += "3";
    else if (c <= 'i') result += "4";
    else if (c <= 'l') result += "5";
    else if (c <= 'o') result += "6";
    else if (c <= 's') result += "7";
    else if (c <= 'v') result += "8";
    else if (c <= 'z') result += "9";
    else               result += c;
}

The characters of interest have the hexadecimal values: '0' = 0x30, '9' = 0x39, 'a' = 0x61, and 'z' = 0x7a.

Edit: It's better practice to use a StringBuilder and append() to create the string, but for small strings it's not likely to be appreciably faster. (Amdahl's Law demonstrates that the actual speedup you can expect from optimizing code is limited by the percentage of time actually spent in that code.) I only used concatenated strings to make the algorithm clear to the OP.

Quinn Taylor
This is wrong. Switch statements are compiled to an offset-based jump, or a binary search tree, depending on how dense the constants are.
erickson
The particulars can depend a lot on the compiler and VM. I don't care to tousle over bytecode — regardless of how the machine instructions are laid out, the real point is that reducing the number of comparisons will result in faster code. Stepping back a level from execution efficiency, this approach is arguable a "more elegant" solution. (Depends on how well one understands char variables, of course.)
Quinn Taylor
Honestly, the original is more readable to me. I don't have to think at all about it. In order to understand this, I need to recite the alphabet to determine which clause a particular character comes under. I prefer the switch statement for clarity.
tvanfosson
That is true, both can be a factor. I'm referring to the Sun JDK's `javac`, and the HotSpot VM. Compilers circa 1995 didn't use `tableswitch`, but I'm not familiar with any that don't take advantage of it nowadays. Use of `tableswitch` is O(1), but I'm not sure if it would be used here or not.
erickson
Excellent point. Since switch statements are effectively `==` comparisons, a table lookup would very likely be faster than '<=" tests. It would be interesting to see exactly how the original switch looks when compiled...
Quinn Taylor
Nice job, especially for your final edit. Those little "+=" operators are going to consume way more time than the whole rest of the code.
Mike Dunlavey
+8  A: 

The switch statement is not really that bad. Your algorithm is linear with respect to the length of the phone number. The code is readable and pretty easy to verify by inspection. I wouldn't mess with it, except to add a default case for handling errors. (I'm not a Java programmer, so forgive me if it's called something else.)

If you have to make it faster, a pre-initialized table indexed by character would avoid any comparisons beyond basic error checking. You could even avoid the case conversion by duplicating the values in the table (digit['A'] = digit['a'] = "2";). The cost of initializing the table would be amortized over the total number of conversions.

Adrian McCarthy
That would work too, but I'd hesitate to prematurely optimize in such a way. Although the **time** is amortized over all conversions, the **space** is persistent, and probably a complete waste of effort for the amount of time savings you could hope to gain.
Quinn Taylor
I like it. It's easier to read, and performs well too. I don't think the 100 bytes or so the table would require would be a show-stopper in many environments.
erickson
Oh, I wouldn't consider it a show stopper in almost *any* environment. I'm just against unnecessary bloat in small instances, because they add up. I find that programmers that are willing to spend 100 bytes here and there for objects persistent for the life of the program will end up scratching their heads as to why their program ends up using too much memory, and have very little idea how to reduce their footprint. It's still true that CPU registers are far faster than RAM, and loading something into memory for convenience can hurt performance of other parts of the app. Hence, measure first.
Quinn Taylor
You seem to be ignoring the storage space required for the switch statement, which is roughly equivalent to the space required by the array (leaving out the decision to include both upper and lower case or not).
Nick Johnson
+3  A: 

Use a Map, where the keys are the letters and digits, and the value is the number on the keypad. (So each keypad number will be indexed by three or four letters and one digit).

Map<Character, Character> keypad = new HashMap<Character, Character>();
...
StringBuilder buf = new StringBuilder(inLetters.length());
for (int idx = 0; idx < inLetters.length(); ++idx) {
  Character ch = keypad.get(inLetters.charAt(idx));
  if (ch != null)
    buf.append(ch);
}


Update: I was curious whether a hand-coded lookup table would perform better than a dense set switch cases. In my casual testing, I found the following code to be the fastest I could come up with:

  private static final char[] lut = 
    "0123456789:;<=>?@22233344455566677778889999[\\]^_`22233344455566677778889999".toCharArray();

  private static final char min = lut[0];

  String fastest(String letters)
  {
    int n = letters.length();
    char[] buf = new char[n];
    while (n-- > 0) {
      int ch = letters.charAt(n) - min;
      buf[n] = ((ch < 0) || (ch >= lut.length)) ? letters.charAt(n) : lut[ch];
    }
    return new String(buf);
  }

Surprisingly, it was more than twice as fast as similar code using a switch statement (which compiled to a tableswitch instruction). This was just for fun, mind you, but on my laptop, running in a single thread, I could convert 10 million 10-letter-"numbers" in about 1.3 seconds. I was really surprised, because as I understand it, a tableswitch operates in essentially the same way, but I expected it to be faster since it is a JVM instruction.

Of course, unless I were getting paid only for each of a limitless supply of phone numbers I could convert, I would never write code like this. A switch is much more readable, performs well as-is, and is likely to get a free performance boost in some future JVM.

Far and away, the greatest improvement to the original code comes from using a StringBuilder instead of concatenating strings, and that does nothing to impair readability of the code. Using charAt instead of converting the input to a char[] also makes the code simpler and easier to understand and improves performance too. Finally, appending char literals instead of String literals ('1' rather than "1") is a performance improvement that aids readability a little too.

erickson
This approach is similar to @Adrian's, but also involves invisible unboxing from Character to char, and really doesn't save much in efficiency.
Quinn Taylor
It's designed for clarity and ease of maintenance, not for raw efficiency... although pressed for a guess, I'd hazard it's faster than cascaded if-elses. Adrian's solution should give the best performance, and better clarity than a switch or if-else.
erickson
It's certainly possible that it's comparable or even faster (I haven't measured) but consulting a map *does* require loading it into cache (which is fast but not free) and calling hash() on at least 2 Character objects. CPU/JVM branch prediction is also quite fast, so they're probably very close in practice.
Quinn Taylor
+2  A: 

If you want a solution that doesn't force you to enumerate all of the letters, you could do something like:

char convertedChar = c;
if (Character.isLetter(c)) {
    //lowercase alphabet ASCII codes: 97 (a)-122 (z)
    int charIndex = ((int)c) - 97;
    //make adjustments to account for 's' and 'z'
    if (charIndex >= 115) { //'s'
        charIndex--;
    }
    if (charIndex == 121) { //'z'-1
        charIndex--;
    }
    convertedChar = (char)(2 + (charIndex/3));
}
result += convertedChar;
RMorrisey
+6  A: 

You could do this using the Apache Commons Lang StringUtils, as follows:

String output = StringUtils.replaceChars(StringUtils.lowerCase(input),
                    "abcdefghijklmnopqrstuvwxyz",
                    "22233344455566677778889999");

Assuming speed is not your main concern, of course, and you want a compact solution ;)

grkvlt
Since when does "efficient" not imply speed?
Quinn Taylor
This is a really cool solution! It's definitely compact, but I'm looking for something a bit more efficient. I don't think I want to load up the Apache Commons jar for this conversion.
Reed Olsen
@Quinn efficient could imply less amount of code. In this context elegant and efficient would go together better that way. Efficient could also imply memory. Given that the OP didn't use a StringBuilder, speed didn't seem to be the top priority. Anyway, depending on what StringUtils does, the speed could be fine, and the memory footprint doesn't need to increase, it could do its job with primitives and the references passed in.
Yishai
@Reed, you could copy the method out if you don't want the whole project - it is open source after all.
Yishai
The OP may not have been aware of StringBuilder, and although using it is good practice, I wouldn't consider it critical for something the length of a phone number. :-)
Quinn Taylor
+4  A: 

How about simply:

   String convert(String inLetters) {
      String digits = "22233344455566677778889999";
      String alphas = "abcdefghijklmnopqrstuvwxyz";
      String result = "";
      for (char c : inLetters.toLowerCase().toCharArray()) {
          int pos = alphas.indexOf(c);
          result += (pos == -1 ? c : digits.charAt(pos));
      }
      return result;
   }
JRL
Except I would use a StringBuilder for the translated result
camickr
Both less efficient and less readable, IMO.
Nick Johnson
+1  A: 

If you run this 10^9 times in a tight loop and ctrl-break it a few times, my bet is that nearly every time it will be deep in the string class trying to accomplish one of those innocent-looking "+=" operators.

Mike Dunlavey
Very true - in the actual implementation, I stuck with the switch statement, but used a string builder instead.
Reed Olsen
Yeah. Even then, I bet the time spent in the switch code is not a large fraction.
Mike Dunlavey