I've recently come to maintain a large amount of scientific calculation-intensive FORTRAN code. I'm having difficulties getting a handle on all of the, say, nuances, of a forty year old language, despite google & two introductory level books. The code is rife with "performance enhancing improvements". Does anyone have any guides or practical advice for de-optimizing FORTRAN into CS 101 levels? Does anyone have knowledge of how FORTRAN code optimization operated? Are there any typical FORTRAN 'gotchas' that might not occur to a Java/C++/.NET raised developer taking over a FORTRAN 77/90 codebase?
Well, in one sense, you're lucky, 'cause Fortran doesn't have much in the way of subtle flow-of-control constructs or inheritance or the like. On the other, it's got some truly amazing gotchas, like the arithmetically-calculated branch-to-numeric-label stuff, the implicitly-typed variables which don't require declaration, the lack of true keywords.
I don't know about the "performance enhancing improvements". I'd guess most of them are probably ineffective, as a couple of decades of compiler technology have made most hinting unnecessary. Unfortunately, you'll probably have to leave things the way they are, unless you're planning to do a massive rewrite.
Anyway, the core scientific calculation code should be fairly readable. Any programming language using infix arithmetic would be good preparation for reading Fortran's arithmetic and assignment code.
As someone with experience in both FORTRAN (77 flavor although it has been a while since I used it seriously) and C/C++ the item to watch out for that immediately jumps to mind are arrays. FORTRAN arrays start with an index of 1 instead of 0 as they do in C/C++/Java. Also, memory arrangement is reversed. So incrementing the first index gives you sequential memory locations.
My wife still uses FORTRAN regularly and has some C++ code she needs to work with now that I'm about to start helping her with. As issues come up during her conversion I'll try to point them out. Maybe they will help.
Here's another one that has bit me from time to time. When you are working on FORTRAN code make sure you skip all six initial columns. Every once and a while, I'll only get the code indented five spaces and nothing works. At first glance everything seems okay and then I finally realize that all the lines are starting in column 6 instead of column 7.
For anyone not familiar with FORTRAN, the first 5 columns are for line numbers (=labels), the 6th column is for a continuation character in case you have a line longer than 80 characters (just put something here and the compiler knows that this line is actually part of the one before it) and code always starts in column 7.
You kind of have to get a "feel" for what programmers had to do back in the day. The vast majority of the code I work with is older than I am and ran on machines that were "new" when my parents were in high school.
Common FORTRAN-isms I deal with, that hurt readability are:
- Common blocks
- Implicit variables
- Two or three DO loops with shared CONTINUE statements
- GOTO's in place of DO loops
- Arithmetic IF statements
- Computed GOTO's
- Equivalence REAL/INTEGER/other in some common block
Strategies for solving these involve:
- Get Spag / plusFORT, worth the money, it solves a lot of them automatically and Bug Free(tm)
- Move to Fortran 90 if at all possible, if not move to free-format Fortran 77
- Add IMPLICIT NONE to each subroutine and then fix every compile error, time consuming but ultimately necessary, some programs can do this for you automatically (or you can script it)
- Moving all COMMON blocks to MODULEs, low hanging fruit, worth it
- Convert arithmetic IF statements to IF..ELSEIF..ELSE blocks
- Convert computed GOTOs to SELECT CASE blocks
Convert all DO loops to the newer F90 syntax
myloop: do ii = 1, nloops ! do something enddo myloop
Convert equivalenced common block members to either ALLOCATABLE memory allocated in a module, or to their true character routines if it is Hollerith being stored in a REAL
If you had more specific questions as to how to accomplish some readability tasks, I can give advice. I have a code base of a few hundred thousand lines of Fortran which was written over the span of 40 years that I am in some way responsible for, so I've probably run across any "problems" you may have found.
Could you explain what you have to do in maintaining the code? Do you really have to modify the code? If you can get away by modifying just the interface to that code instead of the code itself, that would be the best.
The inherent problem when dealing with a large scientific code (not just FORTRAN) is that the underlying mathematics and the implementation are both complex. Almost by default, the implementation has to include code optimization, in order to run within reasonable time frame. This is compounded by the fact that a lot of code in this field is created by scientists / engineers that are expert in their field, but not in software development. Let's just say that "easy to understand" is not the first priority to them (I was one of them, still learning to be a better software developer).
Due to the nature of the problem, I don't think a general question and answer is enough to be helpful. I suggest you post a series of specific questions with code snippet attached. Perhaps starting with the one that gives you the most headache?
There's something in the original question that I would caution about. You say the code is rife with "performance enhancing improvements". Since Fortran problems are generally of a scientific and mathematical nature, do not assume these performance tricks are there to improve the compilation. It's probably not about the language. In Fortran, the solution is seldom about efficiency of the code itself but of the underlying mathematics to solve the end problem. The tricks may make the compilation slower, may even make the logic appear messy, but the intent is to make the solution faster. Unless you know exactly what it is doing and why, leave it alone.
Even simple refactoring, like changing dumb looking variable names can be a big pitfall. Historically standard mathematical equations in a given field of science will have used a particular shorthand since the days of Maxwell. So to see an array named B(:) in electromagnetics tells all Emag engineers exactly what is being solved for. Change that at your peril. Moral, get to know the standard nomenclature of the science before renaming too.
Legacy Fortran Soapbox
I helped maintain/improve a legacy Fortran code base for quite a while and for the most part think sixlettervariables is on the money. That advice though, tends to the technical; a tougher row to hoe is in implementing "good practices".
- Establish a required coding style and coding guidelines.
- Require a code review (of more than just the coder!) for anything submitted to the code base. (Version control should be tied to this process.)
- Start building and running unit tests; ditto benchmark or regression tests.
These might sound like obvious things these days, but at the risk of over-generalizing, I claim that most Fortran code shops have an entrenched culture, some started before the term "software engineering" even existed, and that over time what comes to dominate is "Get it done now". (This is not unique to Fortran shops by any means.)
Embracing Gotchas
But what to do with an already existing, grotty old legacy code base? I agree with Joel Spolsky on rewriting, don't. However, in my opinion sixlettervariables does point to the allowable exception: Use software tools to transition to better Fortran constructs. A lot can be caught/corrected by code analyzers (FORCHECK) and code rewriters (plusFORT). If you have to do it by hand, make sure you have a pressing reason. (I wish I had on hand a reference to the number of software bugs that came from fixing software bugs, it is humbling. I think some such statistic is in Expert C Programming.)
Probably the best offense in winning the game of Fortran gotchas is having the best defense: Knowing the language fairly well. To further that end, I recommend ... books!
Fortran Dead Tree Library
I have had only modest success as a "QA nag" over the years, but I have found that education does work, some times inadvertently, and that one of the most influential things is a reference book that someone has on hand. I love and highly recommend
Fortran 90/95 for Scientists and Engineers, by Stephen J. Chapman
The book is even good with Fortran 77 in that it specifically identifies the constructs that shouldn't be used and gives the better alternatives. However, it is actually a textbook and can run out of steam when you really want to know the nitty-gritty of Fortran 95, which is why I recommend
Fortran 90/95 Explained, by Michael Metcalf & John K. Reid
as your go-to reference (sic) for Fortran 95. Be warned that it is not the most lucid writing, but the veil will lift when you really want to get the most out of a new Fortran 95 feature.
For focusing on the issues of going from Fortran 77 to Fortran 90, I enjoyed
Migrating to Fortran 90, by Jim Kerrigan
but the book is now out-of-print. (I just don't understand O'Reilly's use of Safari, why isn't every one of their out-of-print books available?)
Lastly, as to the heir to the wonderful, wonderful classic, Software Tools, I nominate
Classical FORTRAN, by Michael Kupferschmid
This book not only shows what one can do with "only" Fortran 77, but it also talks about some of the more subtle issues that arise (e.g., should or should not one use the EXTERNAL declaration). This book doesn't exactly cover the same space as "Software Tools" but they are two of the three Fortran programming books that I would tag as "fun".... (here's the third).
Miscellaneous Advice that applies to almost every Fortran compiler
- There is a compiler option to enforce IMPLICIT NONE behavior, which you can use to identify problem routines without modifying them with the IMPLICIT NONE declaration first. This piece of advice won't seem meaningful until after the first time a build bombs because of an IMPLICIT NONE command inserted into a legacy routine. (What? Your code review didn't catch this? ;-)
- There is a compiler option for array bounds checking, which can be useful when debugging Fortran 77 code.
- Fortran 90 compilers should be able to compile almost all Fortran 77 code and even older Fortran code. Turn on the reporting options on your Fortran 90 compiler, run your legacy code through it and you will have a decent start on syntax checking. Some commercial Fortran 77 compilers are actually Fortran 90 compilers that are running in Fortran 77 mode, so this might be relatively trivial option twiddling for whatever build scripts you have.
I loved FORTRAN, I used to teach and code in it. Just wanted to throw that in. Haven't touched it in years.
I started out in COBOL, when I moved to FORTRAN I felt I was freed. Everything is relative, yeah?
I'd second what has been said above - recognise that this is a PROCEDURAL language - no subtelties - so take it as you see it.
Probably frustrate you to start with.