I am looking for a reliable source of statistics on which keys are most frequently mistyped.

For example: are "a" and "s" mistyped for each other more often than "m" and "n"?

If so, what does the underlying data look like? For instance, something like: "a" is typed instead of "s" in 25% of cases when the previous letter is "o", whereas "s" is more often typed instead of "a" when the previous character is "r", and so on.

A: 

I suggest the following: Use Interop to connect to MS Office and export the list of autocorrect entries. This will give you a very good list of the common typos made (at least the ones that Microsoft thinks are common.)

Then, look for differences between the misspelled words and their corrected counterparts. This should, in theory, give you some of the data you're looking for. Many of the pairs are likely words in which two letters are reversed: "liek" for "like" and "teh" for "the" are common ones that come to mind. These might not be what you're looking for, since the correct keys are pressed, just in the wrong order.
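To make the comparison step concrete, here is a rough sketch of how each (typo, correction) pair could be classified, so that transpositions can be separated from genuine wrong-key substitutions. The function name `classify_typo` and the category labels are my own, and the approach (strip the common prefix and suffix, then inspect what is left) only handles single edits; anything more complex falls into "other". The first three pairs are the examples mentioned in this answer, the fourth is made up.

```python
from collections import Counter


def classify_typo(wrong: str, right: str) -> tuple[str, str]:
    """Classify a (typo, correction) pair as a single edit, if it is one.

    Returns (kind, detail). For substitutions and transpositions the
    detail is "intended->typed". Hypothetical helper, not a library API.
    """
    if wrong == right:
        return ("same", "")
    shortest = min(len(wrong), len(right))
    # Length of the common prefix.
    i = 0
    while i < shortest and wrong[i] == right[i]:
        i += 1
    # Length of the common suffix, not overlapping the prefix.
    j = 0
    while j < shortest - i and wrong[len(wrong) - 1 - j] == right[len(right) - 1 - j]:
        j += 1
    w = wrong[i:len(wrong) - j]  # what was actually typed
    r = right[i:len(right) - j]  # what was intended
    if len(w) == 1 and len(r) == 1:
        return ("substitution", f"{r}->{w}")
    if len(w) == 2 and len(r) == 2 and w == r[::-1]:
        return ("transposition", f"{r}->{w}")
    if len(w) == 1 and len(r) == 0:
        return ("insertion", w)
    if len(w) == 0 and len(r) == 1:
        return ("deletion", r)
    return ("other", f"{r}->{w}")


pairs = [("liek", "like"), ("teh", "the"), ("we;re", "we're"), ("helo", "hello")]
kinds = Counter(classify_typo(wrong, right)[0] for wrong, right in pairs)
print(kinds)
```

Keeping only the "substitution" results would give the kind of wrong-key pairs the question asks about.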

I suspect the statistics are influenced by the skill of the keyboard operator: professional typists likely make fewer errors than, say, the average grade 8 student.

Here's some information about connecting to MS Word.

EDIT:

I exported the list myself, and Word has almost 1000 entries. However, many of them are corrections for punctuation (we;re instead of we're), which might not be what you're looking for, and the list is far from exhaustive.

Charlie Salts
A: 

http://norvig.com/ngrams/ has some data, collected from Wikipedia and Roger Mitton. Mitton's non-Wikipedia data appears to come from handwritten sources, though, while the Wikipedia data is a list of common misspellings that does not distinguish typos from spellos. As Charlie Salts comments, the site includes a frequency table derived from this data (though the Wikipedia common-misspellings data doesn't weight them by frequency).

I have some similar additional data: about 1000 spelling errors collected from my regular online reading (comments on blog posts, etc.) as I came across them. If you think you can use this, let me know. (Like you, I searched online for a confusion matrix of the kind you're asking for, before giving up and collecting this inadequate corpus myself.)

Darius Bacon
This one seems to be the most useful: http://norvig.com/ngrams/count_1edit.txt
Charlie Salts
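A possible way to read such a file into a table, as a sketch only. I am assuming each line has the form `<edit><TAB><count>` with the edit written as two strings separated by `|`; check the actual file before relying on that, including which side of the `|` is the intended text. The function name `load_edit_counts` and the sample lines are made up for illustration.

```python
from collections import Counter
from typing import Iterable


def load_edit_counts(lines: Iterable[str]) -> Counter:
    """Parse lines assumed to look like 'x|y<TAB>count' into a Counter.

    Keys are (left, right) tuples taken from either side of the '|'.
    Lines that do not match the assumed layout are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        edit, _, count = line.rpartition("\t")
        if not edit or "|" not in edit:
            continue
        left, right = edit.split("|", 1)
        counts[(left, right)] += int(count)
    return counts


# Made-up sample in the assumed format, not real data from the file.
sample = ["a|s\t10\n", "s|a\t7\n", "m|n\t3\n", "\n", "not a data line\n"]
table = load_edit_counts(sample)
print(table.most_common(2))

# With a downloaded copy of the file (the path is an assumption):
# with open("count_1edit.txt", encoding="utf-8") as f:
#     table = load_edit_counts(f)
```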
A: 

Typo Popularity Tracking with Google may not be the best source, but it is a good source.

I imagine that some kinds of typos depend on the keyboard layout used (missed keys), and that others (misspellings, homonyms) don't. So I would also look at commonly misspelled word lists.
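As a sketch of how the layout-dependent typos could be picked out: place each letter on a grid and treat two keys as neighbours when they are close together. This assumes a US QWERTY layout; the row offsets and the distance threshold are approximations I chose, and `KEY_POS` and `are_adjacent` are hypothetical names, not part of any library.

```python
import math

# Letter rows of a US QWERTY keyboard with an approximate horizontal
# stagger per row, in key widths. These offsets are assumptions.
ROWS = [("qwertyuiop", 0.0), ("asdfghjkl", 0.25), ("zxcvbnm", 0.75)]

KEY_POS = {
    ch: (row, col + offset)
    for row, (letters, offset) in enumerate(ROWS)
    for col, ch in enumerate(letters)
}


def are_adjacent(a: str, b: str, threshold: float = 1.3) -> bool:
    """Return True if two letter keys are neighbours on the assumed layout."""
    a, b = a.lower(), b.lower()
    if a == b or a not in KEY_POS or b not in KEY_POS:
        return False
    (r1, c1), (r2, c2) = KEY_POS[a], KEY_POS[b]
    return math.hypot(r1 - r2, c1 - c2) <= threshold


for pair in [("a", "s"), ("m", "n"), ("a", "m")]:
    print(pair, are_adjacent(*pair))
```

A substitution between adjacent keys is more plausibly a missed key, while one between distant keys is more likely a misspelling, so a filter like this could help split the two groups.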

Chip Uni