I'm looking for some kind of reference that shows the frequency of symbols in popular programming languages. I'm trying to design an optimal keyboard layout for programming.

If there is no such reference, I wouldn't mind creating a simple utility that figures this out. However, I would need suggestions as to which files to analyze for each language.
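A minimal sketch of such a utility, in Python, assuming that a "symbol" is any character that is not a letter, digit, or whitespace (so `_` counts as a symbol). The function name `symbol_frequencies` is hypothetical:

```python
from collections import Counter


def symbol_frequencies(source: str) -> Counter:
    """Count symbol characters (anything that is not a letter, digit,
    or whitespace) in a piece of source code."""
    return Counter(ch for ch in source if not ch.isalnum() and not ch.isspace())


if __name__ == "__main__":
    sample = 'int main(void) { printf("hi\\n"); return 0; }'
    # Print symbols from most to least frequent.
    for sym, n in symbol_frequencies(sample).most_common():
        print(repr(sym), n)
```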

One problem I can foresee: if I pick an Objective-C sample that is a simple program with no objects, the [ and ] keys will appear far less often than in an average Objective-C file. So one guideline is that the sample code should be representative of an average file and use the language's most commonly used features.
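One way to soften this, assuming you can collect many files per language, is to compute the relative symbol frequencies of each file separately and then average them. Every file then carries equal weight, so no single unusual file (a tiny script with no objects, or one huge generated file) dominates. A rough Python sketch; the function names are hypothetical:

```python
from collections import Counter


def relative_frequencies(source: str) -> dict[str, float]:
    """Share of this file's symbol characters taken by each symbol."""
    counts = Counter(ch for ch in source if not ch.isalnum() and not ch.isspace())
    total = sum(counts.values())
    return {sym: n / total for sym, n in counts.items()} if total else {}


def average_over_files(sources: list[str]) -> dict[str, float]:
    """Macro-average: each file contributes equally, regardless of length."""
    per_file = [relative_frequencies(s) for s in sources]
    per_file = [f for f in per_file if f]  # skip files with no symbols at all
    if not per_file:
        return {}
    symbols = set().union(*per_file)  # every symbol seen in any file
    return {
        sym: sum(f.get(sym, 0.0) for f in per_file) / len(per_file)
        for sym in symbols
    }
```

Averaging per file is a design choice; pooling all characters into one count would instead weight long files more heavily. Neither approach removes the need for a representative sample.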

Originally I thought I should use the same code written in different languages, but I'm not sure that's a good idea, since some languages are used for different things than others.

A: 

There is a version of the Dvorak keyboard layout optimized for programmers.

http://www.kaufmann.no/roland/dvorak/

If you happen to use Ubuntu, it is already on your system.

André
Yup, that's exactly the keyboard I am customizing. I don't like how the equals sign is hard to reach.
Senseful
Really? Just stretch your index finger. Works for me ;)
André
@Andr I use the pinky ...
NullUserException
A: 

There's a vast collection of open-source software that you could measure to gather good data on character frequency. SourceForge and GitHub would be the places to look.
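A rough sketch of how such measurements might be gathered, assuming the repositories have already been cloned or downloaded into one local directory. The `EXTENSIONS` table is a hypothetical starting point; some extensions are ambiguous (`.h` is used by C, C++, and Objective-C, and `.m` also by MATLAB), so treat the per-language split as approximate:

```python
import sys
from collections import Counter, defaultdict
from pathlib import Path

# Hypothetical extension-to-language table; extend or correct as needed.
EXTENSIONS = {".c": "C", ".h": "C", ".cpp": "C++", ".py": "Python",
              ".java": "Java", ".m": "Objective-C", ".pl": "Perl"}


def count_tree(root: Path) -> dict[str, Counter]:
    """Walk `root` recursively and count symbol characters per language."""
    per_language: dict[str, Counter] = defaultdict(Counter)
    for path in root.rglob("*"):
        lang = EXTENSIONS.get(path.suffix)
        if lang is None or not path.is_file():
            continue
        # Not every source file is valid UTF-8; undecodable bytes become U+FFFD.
        text = path.read_text(encoding="utf-8", errors="replace")
        per_language[lang].update(
            ch for ch in text
            if not ch.isalnum() and not ch.isspace() and ch != "\ufffd")
    return per_language


if __name__ == "__main__" and len(sys.argv) > 1:
    for lang, counts in sorted(count_tree(Path(sys.argv[1])).items()):
        print(lang, counts.most_common(10))
```

Run it as `python count_symbols.py path/to/repos` (the script name is arbitrary).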

Developers don't just write code, though; they also write design documents, emails, and answers to Stack Overflow questions. Installing a key logger on a few consenting developers' computers might be the best way.

Daniel
A: 

For large code samples to use for statistical analysis, you might try browsing popular open-source projects or searching on Koders by language.

I made some simple changes to a QWERTY layout a few years ago, and I've been using it ever since as my general-purpose layout:

  • Swap digits for their corresponding shift-symbols.
  • Swap _ and -: names with underscores are common, and now - and + both require Shift.
  • Swap [] and {}: blocks are more common than subscripts.

Plus two optional changes, to taste:

  • Swap ` and ~: destructors are common.
  • Swap ' and ": strings are more common than characters.

The last is the only one that would typically interfere with typing ordinary English text. The layout works beautifully for C++, Perl, and whatever else I've used in the past two or three years. The noticeable speed increase comes from the drastic reduction in how often I need to hit the Shift key. I find that using Shift for the numbers isn't a big deal, since the number pad is usually faster anyway.
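To check the Shift-key argument on your own code, a small Python sketch can count the characters that need Shift under standard US QWERTY and under the swaps listed above. It assumes a US QWERTY base, treats capital letters as always needing Shift (no Caps Lock), and ignores finger travel and the number pad. All names are illustrative:

```python
# Symbols that need Shift on standard US QWERTY (capital letters handled separately).
US_SHIFTED = frozenset('~!@#$%^&*()_+{}|:"<>?')

# Each pair is (unshifted char, shifted char) of one key on US QWERTY.
# Swapping a pair makes the second char unshifted and the first char shifted.
CORE_SWAPS = list(zip("1234567890", "!@#$%^&*()")) + [("-", "_"), ("[", "{"), ("]", "}")]
OPTIONAL_SWAPS = [("`", "~"), ("'", '"')]


def shifted_after_swaps(swaps):
    """Return the set of symbols that need Shift once the given pairs are swapped."""
    shifted = set(US_SHIFTED)
    for plain, shift in swaps:
        shifted.discard(shift)  # the former Shift symbol is now unshifted
        shifted.add(plain)      # the former unshifted symbol now needs Shift
    return shifted


def shift_presses(text, shifted):
    """Count characters in `text` that need Shift: capitals plus shifted symbols."""
    return sum(1 for ch in text if "A" <= ch <= "Z" or ch in shifted)


if __name__ == "__main__":
    snippet = "for (int i = 0; i < n; ++i) { v[i] = x_max; }"
    print("US QWERTY:", shift_presses(snippet, US_SHIFTED))                      # 8
    print("modified: ", shift_presses(snippet, shifted_after_swaps(CORE_SWAPS)))  # 6
```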

Jon Purdy
A: 

What you're looking for is a good corpus of programming-language source code. Nothing immediately turned up in a cursory Google search, but the following links may prove useful if you do create your own tool.

A novel framework to detect source code plagiarism

Calgary Corpus

Generating an NLP Corpus from Java Source Code

A Computer Science Text Corpus/Search Engine X-Tec and Its Applications

Mining search topics from a code search engine usage log

Sedate Alien