I need to find a list of commonly mistyped keys on a keyboard for a project I am working on. Basically, I need to know which key a user is trying to press, which key they actually press, and a comparative measure of how often this happens.

By "comparative measure" I mean that, knowing a user mistyped the "c" key, I would like to be able to say that they more likely hit the "x" key than the "v" key (basically the "Commonness" column below).

My ideal list would be something like below to give you an idea of what I'm looking for.

Target Key    Actual Key   Commonness...
----------    -----------  -------------
v             c            100
v             b            95
c             x            100
c             v            90

And so on...

Has anyone come across a reputable source that might provide this information? I have had no luck so far...

A: 

I don't know of a statistics source, but it seems there would be a big difference between three cases: (1) the typist hits the wrong key because of poor finger positioning, (2) the typist hits the right keys but in the wrong order ("naem" instead of "name") because of speed, distraction, or some neural slip, and (3) the typist hits the wrong keys from not knowing how to spell ("maintenence" instead of "maintenance"). Most typists immediately backspace and correct case #1 errors on the fly, so statistics on those events could only be captured in real time, not by tabulating what most spelling correctors encounter.

For case #1, if the most common letters in English are E, T, A... then there's probably a good chance those are also the most missed keys, in that order, although that doesn't tell you which of their neighbors (like "w" and "r" for "e") get hit instead most often. A typist aiming for an end-of-row key like "a" might wrongly hit CAPS LOCK as often as "s".
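To make the "neighbors" idea concrete, here is a minimal sketch that lists the letter keys physically next to a given key. It assumes a standard US QWERTY layout where each row sits roughly half a key to the right of the row above; `ROWS` and `neighbors` are names made up for illustration. It only gives candidate wrong keys, not how often each one is hit.

```python
# Assumption: standard US QWERTY letter rows, each row staggered
# about half a key to the right of the row above it.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def neighbors(key):
    """Return the letter keys adjacent to `key` (hypothetical helper)."""
    key = key.lower()
    for r, row in enumerate(ROWS):
        if key in row:
            c = row.index(key)
            break
    else:
        raise ValueError(f"not a letter key: {key!r}")

    found = set()
    # Same row: the keys immediately left and right.
    for cc in (c - 1, c + 1):
        if 0 <= cc < len(row):
            found.add(row[cc])
    # Row above: because of the stagger, columns c and c + 1 touch this key.
    if r > 0:
        above = ROWS[r - 1]
        for cc in (c, c + 1):
            if 0 <= cc < len(above):
                found.add(above[cc])
    # Row below: columns c - 1 and c touch this key.
    if r < len(ROWS) - 1:
        below = ROWS[r + 1]
        for cc in (c - 1, c):
            if 0 <= cc < len(below):
                found.add(below[cc])
    return sorted(found)


print(neighbors("c"))  # ['d', 'f', 'v', 'x']
print(neighbors("e"))  # ['d', 'r', 's', 'w']
```

Non-letter keys such as CAPS LOCK or the number row are left out here; they would need a fuller layout table.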

Personally, it's the non-alphas I usually miss, especially when hunting and pecking: / vs \, { vs [, ' vs ", comma vs period when typing formatted numbers and currency, missing the shift and getting 8 instead of *, and so on. Since non-alpha typing is so prevalent when programming, those cases are probably much more frequent for programmers than for non-programmers.

joe snyder
Interesting. While I do have trouble with the non-alphas, I would say that among the alphas it's x, c, v that I have the most trouble with rather than e, t, a. I suspect that while those might be the most common letters, typists aren't likely to hit the wrong key when typing them because of their placement and how often they are used. Do let me know if you find any reputable statistics on this.
Abe Miessler
+1  A: 

I actually had to look into this a couple of years ago. When I began the project I had no idea where to start, so hopefully I can save you, and anyone else in the same situation, some time.

Bottom line is that you can take advantage of a large amount of work done in other fields. The most important of these, I found, is the domain name registrar business. For instance, DomainTools has a 'Domain Typo Generator', which generates a list of 'typo' domain names from a domain name you enter.
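I don't know how the DomainTools generator works internally, so the following is only a rough sketch of the general idea, not their algorithm. It assumes three kinds of slip (a dropped character, a doubled character, two swapped neighbors), and `typo_variants` is a made-up name.

```python
def typo_variants(name):
    """Return plausible single-slip misspellings of `name` (illustrative only)."""
    variants = set()
    for i in range(len(name)):
        # Omission: drop one character.
        variants.add(name[:i] + name[i + 1:])
        # Doubling: type one character twice.
        variants.add(name[:i] + name[i] + name[i:])
        # Transposition: swap this character with the next one.
        if i + 1 < len(name):
            variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    # A swap of two identical letters reproduces the input; drop it.
    variants.discard(name)
    return sorted(variants)


print(typo_variants("name"))
```

Wrong-key substitutions are not covered here; those would need a keyboard layout table to pick likely replacement keys.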

In addition, I would recommend the remarkably comprehensive 2005 study of this issue by Microsoft Research.

Finally, there's a key concept in computational linguistics called Damerau-Levenshtein distance, which extends Levenshtein's basic idea of 'edit distance' to the particular problem of humans typing on a keyboard. In his 1964 research paper, Damerau distinguished four edit operations (insertion, deletion, or substitution of a single character, and transposition of two adjacent characters) and stated that a single one of these operations accounts for more than 80% of all human misspellings. (The only link I supplied for D-L is the Wikipedia article; I did so because I think it is an excellent and brief introduction, it contains pseudo-code for the D-L algorithm, and it provides links to the primary online sources for D-L.)
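For reference, here is a minimal Python sketch of the restricted variant of this distance (often called "optimal string alignment"), which counts insertions, deletions, substitutions, and swaps of adjacent characters, but never edits the same substring twice. It is a sketch of the textbook dynamic-programming approach rather than a tuned implementation, and `osa_distance` is just an illustrative name.

```python
def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    # d[i][j] = number of edits needed to turn a[:i] into b[:j].
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all i characters
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all j characters

    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Transposition of two adjacent characters.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


print(osa_distance("naem", "name"))                # 1 (one transposition)
print(osa_distance("maintenence", "maintenance"))  # 1 (one substitution)
```

Note that this restricted form can give a larger number than the unrestricted Damerau-Levenshtein distance for some inputs ("ca" to "abc" is 3 here, 2 unrestricted), which rarely matters for single-slip typos.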

doug
Awesome info, thanks!
Abe Miessler