views: 126
answers: 4

This was the main question posed by Greg Wilson's "bits of evidence" presentation. I'm paraphrasing in my question here, so please read the presentation for all the details.

I'd also like to know if you disagree with the premise of the question, i.e., you think that current development practices actually do reflect evidence.

A: 

Because making decisions that go against conventional wisdom is risky.

A manager puts his job on the line when he goes against the accepted ways of doing things. It's a much safer decision to just stick with the wisdom of the crowd.

theycallmemorty
Not only that. Sometimes, to please investors, you have to present them with methodologies that are "certified". That way, the responsibility shifts away from the decision-makers (approving a grant for an "unknown" development method doesn't sound as good as approving one for a "DoD-certified" methodology). Waterfall is a famous example of this.
Stefano Borini
+4  A: 

Most rigorous empirical studies of programming (by deliberate, designed experiment, not just observation of whatever happens to occur), accounting for all the variables that might affect the results, would be scary-costly.

For example, just like in experimental psychology but even more so, many such empirical studies (e.g., Prechelt's, as quoted in the presentation) are based on volunteers (and any statistician can tell you that using a self-selected sample totally biases the results and makes the whole study essentially useless) and/or students (and might 5, 10 or 20 years' worth of professional experience not make a huge difference to the results -- i.e., can experience be blindly assumed to be irrelevant, so that professionals learn nothing at all from it that might affect the results?).

Finding a representative, random sample would be fraught with difficulty for most researchers -- e.g., even if you could offer participants $40 an hour, a scary-high amount for most studies of decent size (in terms of the number of participants and the length of the study), you'd be biasing your sample towards unemployed or low-to-middle-salary programmers, a systematic bias that might well affect your results.

You could do it (get a random sample) in a coercion-capable structure -- one where refusing to take part in the study when randomly selected as part of the sample could carry retribution (most firms would be in such a position, and definitely so would, e.g., military programming outfits). You might get some grumbling, not-really-willing participants, but that's more or less inevitable. A firm with, say, 1000 programmers might get a random sample of 100 of them to participate for two days -- a representative sample of the programmers currently employed at the firm, and enough for some studies, though definitely not for many of the most interesting ones among those quoted (e.g., about the effects of different phases of the development cycle).
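
(A minimal sketch of that sampling step, in Python; the roster of employee IDs is invented, and the numbers simply mirror the hypothetical 1000-programmer firm above.)

    import random

    # Hypothetical roster of the firm's 1000 programmers (IDs invented for illustration).
    roster = [f"dev-{i:04d}" for i in range(1000)]

    # Draw a simple random sample of 100 participants without replacement.
    # Unlike a call for volunteers, every programmer has the same chance of being
    # picked, so the sample is not self-selected.
    random.seed(42)  # fixed seed only so the draw is reproducible
    participants = random.sample(roster, k=100)

    print(len(participants), participants[:3])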

The cost to the firm (considering fully loaded employee and infrastructure costs) might be something like $100,000. How would the firm's investment get repaid? Unless the study's results can be effectively kept secret (unlikely with so many persons involved, and wouldn't the researchers want to publish?-), "improving programmer productivity" (by possibly changing some practice based on the study) is no real answer, because all of the firm's competitors (those with similar programmer populations and practices, at least) could easily imitate any successful innovation. (I do hope and believe such results would not be patentable!-).
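
(For a rough sense of where a figure like $100,000 comes from, here is a back-of-the-envelope sketch; the $500 fully loaded cost per person-day is my own assumption, not a number from the presentation.)

    # Back-of-the-envelope cost of the hypothetical two-day, 100-person study above.
    participants = 100
    days = 2
    cost_per_person_day = 500  # USD, fully loaded (salary + infrastructure); assumed figure

    total_cost = participants * days * cost_per_person_day
    print(f"${total_cost:,}")  # -> $100,000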

So, studies based on students and/or volunteers, very short studies, and purely observational (which is not the same as empirical!-) ones, are most of what's around. If you're not clear about the difference between observational and empirical: for most of humanity's history, people were convinced heavy objects fall faster, based on observational evidence; it took deliberate experiments (set up by Galileo, to compare falling-speeds while trying to reduce some effects that Galileo couldn't actually deal with rigorously), i.e., empirical evidence, to change opinions on the subject.

This is not totally worthless evidence, but it is somewhat weak -- one set of semi-convincing data points out of many, which decision-making managers must weigh, but only up to a point. Say there's this study based on students somewhere, or volunteers from the net, or even a proper sample of 100 people... from a company that does software completely different from mine and in my opinion hires mediocre programmers; how should I weigh those studies, compared with my own observational evidence based on accurate knowledge of the specific sectors, technologies, and people my firm is dealing with? "Somewhat" seems a reasonable adverb to use here;-)

Alex Martelli
Great points, if I may say so, since this generally converges with my own independent "study" (applying a similar standard to the opposite view's studies, we now have two "studies" to balance the discussion ;-)). I did miss the point on the competitive advantage, and hence the associated secrecy etc. Anyway, glad to see I have a credible "ally" on this topic.
mjv
@mjv, glad you liked it; if you missed the secrecy/competitive advantage issue, I on my part did not mention the possible biases (a firm that finances a study may have a stake in a certain outcome), and hammered on the reproducibility issue only circumlocutorily (touching on sampling biases, the possible value of experience, etc), while you stated it succinctly and summarily, so each answer brings some bits lacking in the other;-)
Alex Martelli
+2  A: 

Because...

  • empirical "evidence" is hard to measure and expensive to produce
  • studies which produce such evidence are often tainted with commercial concerns or other particular motives.
  • the idea of systematic reproducibility in the context of software development is in part flawed

Disclosure: the above assertions are themselves the product of my own analysis, based mostly on personal experience and little scientific data to boot. ;-) Nevertheless, here are more details that somewhat support these assertions.

Pertinent metrics regarding any sophisticated system are difficult to find. That's because the numerous parts of a complex system provide an even larger number of possible parameters to measure, to assert, to compare. It is also, maybe mainly, because of the high level of correlation between these various metrics. Software design is no exception: with thousands of technology offerings, hundreds of languages, dozens of methodologies, and many factors that lie outside the discipline proper, effective metrics are hard to find. Furthermore, many of the factors in play are discrete/qualitative in nature, and hence less easily subjected to numeric treatment. No wonder the "number of lines of code" is still much talked about ;-)
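
(A minimal sketch of that correlation problem, using synthetic numbers invented purely for illustration: several common "size" metrics tend to move together, so it is hard to tell which one, if any, actually drives an outcome.)

    import random
    import statistics

    random.seed(0)
    # Synthetic portfolio of 30 projects (all numbers invented for illustration).
    loc = [random.randint(1_000, 50_000) for _ in range(30)]              # lines of code
    functions = [max(1, int(l / 40 + random.gauss(0, 50))) for l in loc]  # roughly tracks LOC
    defects = [max(0, int(l / 500 + random.gauss(0, 10))) for l in loc]   # also tracks LOC

    def pearson(xs, ys):
        mx, my = statistics.mean(xs), statistics.mean(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)

    # Both correlations come out high, so the metrics are entangled: whatever
    # "defects" correlates with, LOC and function count correlate with it too.
    print(round(pearson(loc, functions), 2))
    print(round(pearson(loc, defects), 2))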

It is easy to find many instances where particular studies (or indeed particular write-ups by some "consulting entities") are sponsored in the context of a particular product or a particular industry. For example, folks selling debugging tools will tend to overestimate the percentage of overall development time spent on debugging, etc. Folks selling profilers...

The premise of having software development processes adopt methodologies associated with the mass production of identical products (it is no coincidence that at least two of the slides featured an automobile assembly line) is, in my opinion, greatly flawed. To be sure, individual steps in the process can and should be automated and produce predictable results, but as a whole, there are too many factors and too few instances of projects/products to seek the kind of rationalization found in mass-production industries.

Commentary on Greg's presentation per se:
Generally I found the slides to be a pleasant read, humorous and all, but they left me somewhat hungry for substance and relevance. It is nice to motivate folks to strive towards evidence-based processes, but this should be followed by practical observations in the domain of software engineering to help outline the impediments and opportunities in this area.

I'm personally a long-time advocate of the use of evidence-based anything, and I'm glad to live in a time when online technologies, computing power and general mathematical frameworks come together to deliver many opportunities in various domains, including but not limited to the domain of software engineering.

mjv
A: 

Interesting presentation!

It's really hard (and very costly) to run controlled experiments that are large enough and real enough to be compelling to practitioners. You tend to get small experiments involving 20 graduate students over a few hours when we really need to measure teams of experienced developers working for a few weeks or months on the same task under different conditions (see slide 12). And of course the latter studies are very expensive.

While smaller studies can be suggestive, real development organizations can't draw many real conclusions from them. I think instead that the more effective teams mainly learn from experience, a much less empirical process. If something works well enough, it will carry forward to the next project, and if something goes wrong, a different approach will be tried next time. Small pilot projects will be undertaken, new technologies will be tried out, and notes will be compared with colleagues in other organizations.

So I'd say that (non-dysfunctional) development groups behave more or less rationally, but that they could really use evidence from more ambitious experiments.

Jim Ferrans