Hi,

Following on from this question, I am interested in finding out how you could measure the popularity of any and all programming languages.

As professional developers, we need to be aware of trends in the software industry: which languages employers will be looking for in the next few years, and which we should therefore be proficient in. Tracking these trends can also help us spot opportunities; for example, there may be openings for new developers in mainframe programming as older members of the profession retire. For these reasons, it is important for us to track programming language popularity.

There are a number of questions already on Stack Overflow (here and here) about how SO could be used to measure a language's popularity (or the difficulty of using said language). Other methods include tracking job adverts (e.g. http://www.hotskills.net/) and search engine query statistics (e.g. http://langpop.com/).

Can the SO community think of any other methods of measuring this?

Summary

  • Use Stack Overflow tags to measure language popularity
  • Search Engine query statistics
  • Job adverts
  • Open Source code repositories

As noted by various contributors below, each of the above sources has problems as a reference to calculate language popularity/usage.

A: 

Open source contributions perhaps.

Zoidberg
I think freshmeat.net statistics are already used by the http://langpop.com/ site, but you could also look at using SourceForge statistics, and other open source repositories. +1
MagicAndi
+4  A: 

I'd say a language's popularity and success grow exponentially with the number of people who hate it.

Developer Art
+4  A: 

Not voting the question down, because a lot of people ask about this kind of thing. However...

The next words out of anyone's mouth after this is asked should be, "Popular with who?".

Popular is a useless word to apply to programming languages. There is no universally accepted meaning of it, so there's no objective way to measure it.

For example, the obvious thing to do would be to go out and count up worldwide deployed LOC in every software project in use. When you do that, you'd discover that hands-down the most popular language is Cobol.

Someone else might think the obvious way to measure would be by Google hits. Doing that, they'd find that Java gets 282 million results, while C# gets 48 million, and Cobol only gets 6.5 million. So clearly Java is more popular than C#, and way more popular than Cobol.

A third person might think the obvious way to check is to look at SO tags. They'd find the single most used tag here is C# (34K uses so far). Cobol has only been used 65 times here. So clearly C# is the most popular, and almost nobody uses Cobol.

So who is right? All three are. It depends on what you really meant when you asked the question.
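
To make the disagreement concrete, here is a minimal Python sketch that ranks languages separately under each metric. It uses only the figures quoted above; the names `counts`, `rank` and `leaders` are made up for illustration, and Java is left out of the SO tag data because no figure for it is given here.

```python
# Figures quoted in the answer above. Java has no SO tag count in the
# answer, so it is simply absent from that metric rather than guessed.
counts = {
    "google_hits": {"Java": 282_000_000, "C#": 48_000_000, "Cobol": 6_500_000},
    "so_tag_uses": {"C#": 34_000, "Cobol": 65},
}


def rank(metric):
    """Return the languages for one metric, ordered from highest to lowest count."""
    data = counts[metric]
    return sorted(data, key=data.get, reverse=True)


def leaders():
    """Return the top-ranked language under each metric."""
    return {metric: rank(metric)[0] for metric in counts}


if __name__ == "__main__":
    for metric in counts:
        print(metric, rank(metric))
    print(leaders())
```

Each metric produces its own ordering and its own "winner", which is the point: the ranking depends on which question you chose to ask.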


For those who are surprised at my Cobol assertion, I suggest reading this (somewhat dated, 2003) article on the subject. It will be a real eye-opener. It could be argued that we non-Cobol programmers are all working around the margins of a gigantic Cobol world.

T.E.D.
T.E.D. Valid point - I should be defining the metric used to measure the language usage in order to have comparable results. I do question whether Cobol would be the most popular programming language (surely C would give more LOC?) - can you give any references to back this claim up? Thanks.
MagicAndi
Nope. Cobol has the largest worldwide installed codebase. There is an immense amount of it running out there. The best number I could find was a 5-year-old figure that said there are 180 *billion* lines in use, with 5 billion more being written every year (that would work out to over 200 billion today).
T.E.D.
@T.E.D., the problem with that article is that it asserts the size of the Cobol installed base with the citation of a Gartner '97 report without a title, a link or a summary of the methodology used in the report. At this point, I'd have to say that that article is very suspect: it's presenting second or third-hand information without context.
Bob Cross
I'd agree in general. Even if the info is good, '97 is waaaaay out of date now. However, there simply aren't any other good numbers out there to find, so we do the best we can with what we have.
T.E.D.
(1) I agree that the Cobol numbers are too high. A large percentage of the mainframes used for much of that code have been retired. Also, I assume we are talking about current popularity, i.e. code written this week or this year, not total lines of code ever written. (2) One additional area to consider is educational institutions, who sometimes use different (and often better) languages than the profit-constrained software industry.
xpda
Perhaps the total amount of code written in modern programming languages is greater than the amount being written in COBOL. If the number of Google and SO results, and the number of people who know COBOL today, are lower than for modern programming languages, it is reasonable to assume that COBOL is not that popular anymore.
Partial
I'm seeing a lot of speculation, but the only actual *numbers* we have show huge Cobol use.
T.E.D.
For clarity, I do personally know COBOL coders - they work at Bank of America. The financial world seems to be the world where COBOL definitely thrives.
Bob Cross
Jeff Atwood recently commented on COBOL and the supposed abundance of legacy COBOL code on his Coding Horror blog - http://www.codinghorror.com/blog/archives/001294.html. Personally, I view this claim of COBOL being the largest codebase in the world as an urban myth...
MagicAndi
You can view it however you want, Andi, but without different numbers, it's just a daydream.
T.E.D.
@T.E.D., The Register has an article celebrating COBOL's 50th birthday (http://www.theregister.co.uk/2009/09/18/cobol_name_birthday/), and it references a DataMonitor report (http://www.microfocus.com/000/COBOL_continuing_to_drive_value_in_the_21st_Century_tcm21-23652.pdf) [PDF] that quotes that 220 Billion lines of COBOL code are used in live systems. It also repeats the 5 billion lines claim. Unfortunately, the report merely says the statistics were provided by IBM, so I can't tell if this is merely a rehash of figures from old reports, or if a genuine survey has been carried out.
MagicAndi
@T.E.D. - Just read this article, A Short History of Lines of Code (LOC) Metrics (http://www.gilb.com/tiki-download_file.php?fileId=187) [Word], and discovered that IBM used to measure programmer productivity using KLOC, how many thousand lines of code they wrote. The reason for the huge figure for COBOL lines of code becomes apparent...
MagicAndi
Pretty much every place I've ever worked used that same metric (for non-Cobol code). You aren't coming up with actual numbers here, dude, just justifications for your disbelief in them.
T.E.D.
Just to confirm, I have contacted the DataMonitor report author, and it appears that the figures are simply a rehash of the same old Gartner Group report figures from 1997.
MagicAndi
A: 

The number of posts about that programming language on Stack Overflow.

Daniel
Wouldn't that be a better measure of how difficult a language is? ;)
Eevee
Not really. If you have more people using a language, you'll have more questions about it, no?
Daniel
100 people just starting to use a poorly-designed, poorly-documented language will ask more questions about it than 1000 people who have spent the last 5 years using an elegant and well-documented language.
Dave Sherohman
+1  A: 

You can check the TIOBE statistics.

Stefano Borini
Thanks for the great link, that's an interesting resource.
shanabus
An interesting, but deeply flawed resource using methods very susceptible to gaming. I don't have space to go into all the issues here, but http://www.google.se/search?q=tiobe+methodology+flawed will take you to many good writeups on them.
Dave Sherohman
@Dave: Lies, Damn Lies, and Statistics.
Stefano Borini
A: 

You can use Google Trends to get an idea. Of course, it's not very accurate, since you can write "C#" or "C Sharp", but it can give you a rough idea.

Daok
+1  A: 

What does "popular" mean? Here are some potential ways of measuring it:

  1. The number of developers writing with that language professionally at a given point in time.
  2. The number of people frequently experimenting with or using the language at home at any given point in time.
  3. The number of developers who wish they were using language X (or are happy that they are).

Problems with some measurements:

  • Using SO questions or Google hits could merely indicate which language (among those in the running for most popular) is the hardest to use.
  • Counting job adverts would be horribly inaccurate, since people tend to switch to things that don't fall into their original job description, and you would miss all the people currently using a language (not applying for a job).

Personally, I'd like to use number 3 as a measurement of popularity, but I have no idea how you would measure it. The internet would seem like a good place, but which site will be able to attract all the developers, and how would you know that enough of them responded to the poll?

John Fisher
Someone tried to measure #3 last May using percentages of positive vs. negative comments on Twitter: http://blog.doloreslabs.com/2009/05/the-programming-language-with-the-happiest-users/
Dave Sherohman
The problem with that is people. There are many reasons people would decide to comment, but most of them are unrelated to accurately determining whether a language is popular. Just think about the developers you know. Would they be more likely to comment about a frustration they encountered while coding, or would they comment about how easy it was to do something they do every day?
John Fisher
+4  A: 

As the author of http://www.langpop.com, my approach is to find as many metrics as possible (certainly not limited to just search engine results! We have books, job listings, IRC, Google Code, Freshmeat and others) and let people see the methodology, making the whole thing as transparent as possible. That's why I added the JavaScript feature that lets you recalculate the normalized results with different weights for each metric.
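
For readers who want a feel for what "recalculating normalized results with different weights" can look like, here is a rough Python sketch. It is not necessarily how langpop.com computes its scores: the normalization (each metric scaled to sum to 1), the function names, the language names `LangA`/`LangB` and all the numbers are assumptions invented purely for illustration.

```python
def normalize(raw):
    """Scale one metric so its values sum to 1.0 across languages."""
    total = sum(raw.values())
    return {lang: value / total for lang, value in raw.items()}


def combined_score(metrics, weights):
    """Weighted average of the normalized metrics; weights need not sum to 1."""
    weight_total = sum(weights[name] for name in metrics)
    scores = {}
    for name, raw in metrics.items():
        share_of_weight = weights[name] / weight_total
        for lang, share in normalize(raw).items():
            scores[lang] = scores.get(lang, 0.0) + share_of_weight * share
    return scores


# Hypothetical raw counts, NOT real data.
metrics = {
    "search_hits": {"LangA": 60, "LangB": 40},
    "job_listings": {"LangA": 20, "LangB": 80},
}

if __name__ == "__main__":
    print(combined_score(metrics, {"search_hits": 1, "job_listings": 1}))
    print(combined_score(metrics, {"search_hits": 3, "job_listings": 1}))
```

With equal weights the hypothetical LangB comes out ahead; weighting search hits three times as heavily brings the two level. Changing the weights changes the result, which is why exposing them matters.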

As someone else notes, there are many different ways of measuring popularity. Another important one that he doesn't mention might be the "acceleration" of a given language: for instance, Cobol has a big installed base, but I don't think a lot of new Cobol projects are being started. Something like Ruby is probably the opposite - it's not widely used, but a lot of people are picking it up for new projects.

I disagree with the conclusion that the numbers are "meaningless", though. By looking at the different measurements and thinking about them some, I think there are plenty of interesting conclusions to be drawn. Also, don't confuse "rough" numbers with "useless" numbers. I think we can definitely say that Java is more popular than Tcl, for instance.

David N. Welton
David, I'm accepting your answer as it covers the range of different metrics that we can use to track language popularity. Thanks for creating your site; it provides some extremely interesting reading!
MagicAndi
A: 

This blog article neatly summarizes the various ways of determining the popularity of a programming language:

The article describes one way of measuring popularity that has so far not been mentioned:

In terms of ways that have been mentioned, the article offers specific ways of gathering statistics:

  • Measured by Commits to Open Source projects - use of the Ohloh website.
  • Popularity by Lines of Code - use of figures compiled by BlackDuck
MagicAndi