Most rigorous empirical studies of programming (by deliberate, designed experiment, not just observation of whatever happens to occur), accounting for all variables likely to affect the results, would be scary-costly.
For example, just as in experimental psychology but even more so, many such empirical studies (e.g., Prechelt's, as quoted in the presentation) are based on volunteers (and any statistician can tell you that using a self-selected sample totally biases the results and makes the whole study essentially useless) and/or students (and might 5, 10, or 20 years' worth of professional experience not make a huge difference to the results? -- i.e., can experience be blindly assumed to be irrelevant, so that professionals learn nothing at all from it that might affect the outcome?).
Finding a representative, random sample would be fraught with difficulty for most researchers -- e.g., even if you could offer participants $40 an hour, a scary-high amount for most studies of decent size (in terms of number of participants and length of the study), you'd be biasing your sample towards unemployed or low-to-middle-salary programmers, a systematic bias that might well affect your results.
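To make the self-selection worry concrete, here's a minimal simulation sketch in Python (every number in it -- the skill distribution, the participation probabilities, the link between skill and willingness to volunteer -- is made up for illustration, not taken from Prechelt's or anyone else's data):

    import random

    random.seed(42)

    # Hypothetical population of 1000 programmers; "skill" is a made-up
    # score that, in this imaginary experiment, also drives the
    # productivity outcome being measured.
    population = [random.gauss(100, 15) for _ in range(1000)]

    true_mean = sum(population) / len(population)

    # Self-selection: assume (purely for this sketch) that programmers
    # with lower skill/salary are likelier to volunteer -- e.g., because
    # $40/hour looks more attractive to them.
    volunteers = [s for s in population
                  if random.random() < max(0.05, (130 - s) / 100)]

    volunteer_mean = sum(volunteers) / len(volunteers)

    print(f"population mean: {true_mean:.1f}")       # ~100: the truth
    print(f"volunteer mean:  {volunteer_mean:.1f}")  # systematically lower

The point of the sketch: nothing about the measurement itself is wrong, yet the volunteer sample's mean lands well below the population's, simply because willingness to participate correlates with the very thing being measured.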
You could do it (get a random sample) in a coercion-capable structure -- one where refusing to take part in the study when randomly selected for the sample could carry retribution (most firms would be in such a position, and definitely so would, e.g., military programming outfits). You might get some grumbling, not-really-willing participants, but that's more or less inevitable. A firm with, say, 1000 programmers might get a random sample of 100 of them to participate for two days: a sample that actually represents the population of programmers currently employed at the firm, and enough for some studies, though definitely not for many of the most interesting ones quoted (e.g., those about the effects of different phases of the development cycle).
The cost to the firm (considering fully loaded employee and infrastructure costs) might be something like $100,000. How would the firm's investment get repaid? Unless the study's results can be effectively kept secret (unlikely with so many persons involved, and wouldn't the researchers want to publish?-), "improving programmer productivity" (by possibly changing some practice based on the study) is no real answer, because all of the firm's competitors (those with similar programmer populations and practices, at least) could easily imitate any successful innovation. (I do hope and believe such results would not be patentable!-).
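For what it's worth, that order of magnitude is easy to check with back-of-the-envelope arithmetic (the fully loaded hourly rate below is my assumption, not a figure from any study):

    # Back-of-the-envelope check on the ~$100,000 figure.
    participants = 100     # the random sample drawn from 1000 programmers
    days = 2               # length of participation
    hours_per_day = 8
    loaded_rate = 62.50    # assumed fully loaded cost, $/hour (salary + overhead)

    cost = participants * days * hours_per_day * loaded_rate
    print(f"study cost: ${cost:,.0f}")   # -> study cost: $100,000

And that's just the direct cost of the participants' time, before counting the researchers' time and any infrastructure.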
So, studies based on students and/or volunteers, very short studies, and purely observational ones (which is not the same as empirical!-) are most of what's around. If you're not clear about the difference between observational and empirical: for most of humanity's history, people were convinced heavy objects fall faster, based on observational evidence; it took deliberate experiments (set up by Galileo to compare falling speeds while trying to reduce effects, such as air resistance, that he couldn't actually deal with rigorously), i.e., empirical evidence, to change opinions on the subject.
This is not totally worthless evidence, but it is somewhat weak -- one set of semi-convincing data points out of many, which decision-making managers must weigh, but only up to a point. Say there's this study based on students somewhere, or on volunteers from the net, or even on a proper sample of 100 people... from a company that does software completely different from mine and, in my opinion, hires mediocre programmers: how heavily should I weigh those studies, compared with my own observational evidence based on accurate knowledge of the specific sectors, technologies, and people my firm is dealing with? "Somewhat" seems a reasonable adverb to use here;-)