views:

406

answers:

10

I'm a PhD student in mechanical engineering without an extensive background in computer science. We write code as part of our research, but it's usually high-level (e.g., Matlab) and often rather ad-hoc.

It sounds like code reviews amongst the academics here would be valuable for (a) learning techniques from other people, and (b) spotting inadequacies in your code. (Sometimes I think it's scary that there are papers being published where the theory is instantiated in code that no-one but the author ever looks at!) Should academics have mandatory code reviews with their peers? (Has anyone seen such a thing before?)

More importantly, how long does it take to do an effective code review in this context? It seems hard to justify the extra time when that particular resource is always very scarce.


Addendum: A couple of people have asked whether "publication" constitutes a sufficient review. Not at all, in my field. The results are the important part, so if you write your algorithm and/or data analysis with incomprehensible code that spits out a graph after hours of processing, that's no different than clean, sharable, fast code.

You might write buggy code that produces results that look about right and never know there's a bug in there, though!

But I think it would be a great idea to have publications distributed electronically along with the code that was used to produce the results in the manuscript. Problem, again, is time: everyone's code is ugly and unmaintainable (to generalise; afterall we're engineers, not programmers), so cleaning it up for publication would take too long. Code reviews might help this situation, a little.

+1  A: 

Length of time of code review depends on:

  • Length of code
  • Complexity of code
  • Skill of reviewer

so it is hard to quantify exactly how long it will take. I know some people where I work that can review ~400loc/h but several projects have put a cap on the maximum loc/h rate at 200 as they feel that going faster reduces the quality of the code review.

Also, code reviews are most frequently done against a coding standard as the review is to ensure it fulfills the standard. A code review is not a place to be redesigning code, so if you lack a coding standard then code reviews are difficult to implement.

workmad3
+2  A: 

My advice is to have 1 or 2 sessions and see how the team/peers handle it. The answer to 'is it effective' is that it basically depends on the people.

I've been in many reviews where the folks reviewing didn't actually take the time to sit down and trace through the code. Alot of times people will simply gloss over the source and write down spelling mistakes and 'needs comments here' type of 'defects'. While having this information is nice, it dosen't really help the a,b reasons you listed above.

Note that i've also been in reviews that were very productive.

My last advice is to review small sections of code at a time. Sending out a review for the full project will not yeild the results you expect. Most people will get tired when reviewing alot of code and start to skim over just to make it through everything.

In my latest position, what's worked the best is impromptu code reviews. So, if I have a section of code i've just finished, I call another developer over to take a look prior to checking anything into our source control branch. Nothing formal, just a quick look so someone else knows what was decided with the code.

CR
+1  A: 

Aren't research papers all peer reviewed before publication? And the research itself, isnt there some form of mentor? (Don't know the proper term, I'm not in academia...)

So why doesnt that just include code review too? Peer review on research paper should also be code review. (Truth, few people know how to perform an effective code review, but that's a different issue...)


To clarify, I assumed that peer review (before publication?) will verify all assumptions, confirm conclusions, validate data traces, etc.
Part of validating data traces (and hence the validity of conclusions from that data), must be examining the program that processes or generates that data. Therefore code review is kind of necessitated...

AviD
+1  A: 

Hi Will,

Assuming that your group of reviewers knows the language you're writing in and understands the math/formulas you are using, it generally doesn't take too long. There are published studies and quotes on things such as the number of lines of code to review per hour and others such statistics, but it takes longer to review code that does a bunch of math than it does to do something less math-intensive, particularly if you are rolling your own functions to X or Y.

When I was in school, we didn't do code reviews (except when we were stumped! and then only to find the bug) but it certainly would have been good to get my eyes on more code.

When doing code reviews, my team would typically provide the code to the reviewers a couple of days ahead of time (depending on the volume of code - and break it up a bit into chunks that were logically grouped together if there was a LOT of code) and let the reviewers check it out. Then we'd meet for an hour or so and walk through the findings. There are lots of ways to do it, but that worked reasonably well for us.

I guess it surprised me a bit that there wasn't something like this in place already.

itsmatt
+4  A: 

Having spent a little time in academia I've noticed that a lot of the code produced it just code to get the job done.

Code is usually hacked together by one person to get test whatever hypothosis is being tested. Whereas in Industry code is produced in small teams, with maintainability an important factor.

Unless the reviewer is familiar with the code base, and the ammount of code is very small (A couple of thousand lines?) The code review will either be very time consuming or will not be very strict.

I spent a day Code reviewing a 5000 line program that a colleague had written and yes I found quite a few issues but I'd say I wouldn't be willing to do that too often. In my experience code review works best in pieces evaluating items of functionality rather than an application as a whole.

To be honest, yes code review if done correctly would be a boon to academia, however the chances of finding somone with enough time to review your code is slim in academia as everyone is trying to get thier own project done as quickly as possible. In essence The environment of academia is not really condusive to code review.

And getting Undergrad interns to do it is stupid, Ideally a code review should be conducted by somone with more experience of software development than your average undergrad. So unless the undergrad is exceedingly gifted you aren't likely to get much benifit from thier code review.

Omar Kooheji
A: 

Instead of jumping right into code reviews, it might be a good idea to try some sort of code analyzer. Something like FindBugs is great at pointing out blatant issues that tend to exist in "proof of concept" code.

bcash
+1  A: 

I suppose there is one really big difference: academics and software developers work together in very different ways.

Code reviews are usually done as part of a team with a common agenda. We get paid if the application sells. It's in everyone's interest that they understand all the code and that the code is as good as possible.

Academics have a very different agenda - I'm sure you know at least one PhD student who's constantly foiled in their research by someone senior because it contradicts their paper from 20 years ago (most unis appear to do this a lot).

As a student you may need to choose who you review with carefully, and they won't have an automatic interest in improving your code.

Code reviews don't (often) fix bugs, or really validate the code. They just share understanding of it and provide feedback.

If your code works but is woefully inefficient is that a problem? If it's a nightmare to maintain, poorly documented and unworkable does that matter if it actually works?

To some extent I think ad-hoc code can be the right tool for the job, if that job is not professional software development. There's stuff I did with Matlab (during my maths degree) that I wouldn't dream of doing now, but the code I write now is shared by my team and revisited regularly, the code I wrote then hasn't been looked at since I graduated.

Keith
+1  A: 

Write better code the first time around, if only to make it easier for you to revisit the code yourself months later. Collaborate with like minded scientists and engineers who believe in writing good code. Take pride in your code and remember that one of the reasons we got away from machine code and invented high-level languages was to better communicate with other humans. It doesn't do much good if someone else can't understand your methods, particularly in academia where repeatability of results by other researchers is so important.

Scottie T
+1  A: 

My background is as a physics academic in a non-computationally intensive field.

The mindset in academic programming is that the peer-reviewed paper provides a description of the algorithm which is used to process data. This may be pure modelling, fitting data with some functional form or some other sort of processing. It's assumed that the author has competently implemented the algorithms they've described. A keen reviewer may attempt to repeat some of the analysis (but I imagine this is rare).

Generally I'd expect to see validation of the action of a program in a paper by testing with synthetic data, or comparison against a known (simple) analytical result, if the program was central to the work or novel.

Thinking back to groups I've worked in their would not have been many people (some number close to one) who'd have been able to do a useful code review. In a simulation group, several people would work on the same code and may well be developing new methodology as they go along.

Ian Hopkinson
+1  A: 

Code quality is not usually a huge concern in Academia. Usually, the only consumer of the code is the researcher so things like stability and ease of use are less important than simply getting the job done. Furthermore, research code is necessarily experimental and the specifications change more and more drastically than with other types of code so the final code will have numerous changes grafted onto it.

But I think the most important factor is simply that academics don't regard themselves as professional programmers, programming is just one small part of their job. As a result, there's simply not the level of care or passion that would be able to sustain a code review process.

Shalmanese