views:

661

answers:

10

I've inherited a 10K-line program written in 8051 assembly language that requires some changes. Unfortunately it's written in the finest traditions of spaghetti code. The program--written as a single file--is a maze of CALL and LJMP statements (about 1200 total), with subroutines having multiple entry and/or exit points, if they can be identified as subroutines at all. All variables are global. There are comments; some are correct. There are no existing tests, and no budget for refactoring.

A little background on the application: The code controls a communications hub in a vending application that is currently deployed internationally. It handles two serial streams simultaneously (with the help of a separate communications processor) and can be talking to up to four different physical devices, each from a different vendor. The manufacturer of one of the devices recently made a change ("Yeah, we made a change, but the software's absolutely the same!") which causes some system configurations to no longer work, and is not interested in unchanging it (whatever it was they didn't change).

The program was originally written by another company, transferred to my client, then modified nine years ago by another consultant. Neither the original company, nor the consultant, are available as resources.

Based on analysis of the traffic on one of the serial buses, I've come up with a hack, which appears to work, but it's ugly and doesn't address the root cause. If I had a better understanding of the program, I believe I could address the actual problem. I have about one more week before the code's frozen to support an end-of-the month ship date.

Original question: I need to understand the program well enough to make the changes without breakage. Has anyone developed techniques for working with this sort of mess?

I see some great suggestions here, but am limited by time. However I may have another opportunity in the future to pursue some of the more involved courses of action.

+1  A: 

You don't need a special budget for refactoring and testing -- they save you money and let you work faster -- get to it. It's the technique you should use to add changes to legacy, inherited code because it's the cheapest way to do it without "without breakage".

Most of the time, I think there's a trade-off where you get more quality in exchange for spending more time, but with legacy code that you are unfamiliar with, I think it's faster to make tests -- you have to run the code before you ship it, right?

Lou Franco
+4  A: 

Find another job- seriously! Failing that the book "working effectively with legacy code" might help- though I think it is referring to legacy code as code without unit tests.

RichardOD
Great book--I own a copy.
bitFlipper
Yeah I have it on my bookshelf, I haven't unfortunately had time to read it yet though. I love the whole Martin Fowler and Robert Martin series of books though!
RichardOD
+1  A: 

This is one of the few times i'm going to recommend you put your soft skills to work, and present your PM/Manager/CXO with your reasoning behind a re-write, and the time/cost savings involved with such an undertaking

Jason Watts
+6  A: 

I'm afraid there is no magic bullet to this kind of problem. I find the only solution is to print out the ASM file then to go somewhere quiet and to simulate running the program line by line in your mind (while writing the contents of the registers and memory locations on a notepad). After a while you find this doesn't take as long as you would expect. Be prepared to spend many hours doing this and drink gallons of coffee. After a while you will have a understanding of what it is doing and you can consider changes.

Does the 8051 have any unused IO ports ? If it does and you can't work out when certain routines are being called then add code to send these spare ports high or low. Then when the program is running watch these ports with an oscilloscope.

Good luck

IanW
This is a case where greenbar printouts shine.
Maggie
Short term, unfortunately, this is probably what I'll end up doing. Fortunately, gallons of coffee is not a problem! :)
bitFlipper
+15  A: 

First, I would try to get in touch with those people who originally developed the code or who at least maintained it before me, hopefully getting enough information to get a basic understanding of the code in general, so that you can start adding useful comments to it.

Maybe you can even get someone to describe the most important APIs (including their signature, return values and purpose) for the code. If global state is modified by a function, this should also be made explicit. Similarly, start to differentiate between functions and procedures, as well as input/output registers.

You should make it very clear to your employer that this information is required, if they don't believe you, have them actually sit down with you in front of this code while you describe what you are supposed to do and how you have to do it (reverse engineering). Having an employer with a background in computing and programming will actually be helpful in that case!

If your employer doesn't have such a technical background, ask him to bring another programmer/colleague to explain your steps to him, doing so will actually show him that you are serious and honest about it, because it's a real issue - not just from your point of view (make sure to have colleagues who know about this 'project').

If available and feasible, I would also make it very clear, that contracting (or at the very least contacting) former developers/maintainers (if they are no longer working for your company, that is) to help document this code would be a pre-requisite to realistically improve the code within a short time span and to ensure that it can be more easily maintained in the future.

Emphasize that this whole situation is due to shortcomings in the previous software development process and that these steps will help improve the code base. So, the code base in its current form is a growing problem and whatever is done now to handle this problem is an investment for the future.

This in itself is also important to help them assess and understand your situation: To do what you are supposed to do now is far from trivial, and they should know about it - if only to set their expectations straight (e.g. regarding deadlines and complexity of the task).

Also, personally I would start adding unit tests for those parts that I understand well enough, so that I can slowly start refactoring/rewriting some code.

In other words, good documentation and source code comments are one thing, but having a comprehensive test suite is another important thing, noone can be realistically expected to modify an unfamiliar code base without any established way of testing key functionality.

Given that the code is 10K, I would also look into factoring out subroutines into separate files to make components more identifiable, preferably using access wrappers instead of global variables and also intuitive file names.

Besides, I would look into steps to further improve the readability of the source code by decreasing the complexity, having sub routines with multiple entry points (and possibly even different parameter signatures?) looks like a sure way to obfuscate the code unnecessarily.

Similarly, huge sub routines could also be refactored into smaller ones to help improve readability.

So, one of the very first things, I'd look into doing would be to determine those things that make it really complicated to grok the code base and then rework those parts, for example by splitting huge sub routines with multiple entry points into distinct sub routines that call each other instead. If this cannot be done due to performance reasons or call overhead, use macros instead.

In addition, if it is a viable option, I would consider incrementally rewriting portions of the code using a more high level language, either by using a subset of C, or at least by making fairly excessive use of assembly macros to help standardize the code base, but also to help localize potential bugs.

If an incremental rewrite in C is a feasible option, one possible way to get started would be to turn all obvious functions into C functions whose bodies are -in the beginning- copied/pasted from the assembly file, so that you end up with C functions with lots of inline assembly.

Personally, I would also try running the code in a simulator/emulator to easily step through the code and hopefully start understanding the most important building blocks (while examining register and stack usage), a good 8051 simulator with a built-in debugger should be made available to you if you really have to do this largely on your own.

This would also help you come up with the initialization sequence and main loop structure as well as a callgraph.

Maybe, you can even find a good open source 80851 simulator that can be easily modified to also provide a full callgraph automatically, just doing a quick search, I found gsim51, but there are obviously several other options, various proprietary ones as well.

If I were in your situation, I would even consider outsourcing the effort of modifying my tools to simplify working with this source code, i.e. many sourceforge projects accept donations and maybe you can talk your employer into sponsoring such a modification.

If not financially, maybe by you providing corresponding patches to it?

If you are already using a proprietary product, you might even be able to talk with the manufacturer of this software and detail your requirements and ask them if they are willing to improve this product that way or if they can at least expose an interface to allow customers to make such customizations (some form of internal API or maybe even simple glue scripts).

If they are not responsive, indicate that your employer has been thinking of using a different product for some time now and that you were the only one insisting on that particular product to be used ... ;-)

If the software expects certain I/O hardware and peripherals, you may even want to look into writing a corresponding hardware simulation loop to run the software in an emulator.

Ultimately, I know for a fact that I would personally much more enjoy the process of customizing other software to help me understand such a spaghetti code monster, than manually stepping through the code and playing emulator myself, no matter how many gallons of coffee I can get.

Getting a usable callgraph out of an open source 8051 emulator should not take much longer than say a weekend (at most), because it mostly means to look for CALL opcodes and record their addresses (position and target), so that everything's dumped to a file for later inspection.

Having access to an emulator's internals would actually also be great a way to further inspect the code, for example in order to find recurring patterns of opcodes (say 20-50+), that may be factored into standalone functions/procedures, this might actually help decrease the size and complexity of the code base even further.

The next step would probably be to examine stack and register usage. And to determine the type/size of function parameters used, as well as their value range - so that you can conceive corresponding unit tests.

Using tools like dot/graphviz to visualize the structure of the initialization sequence and the main loop itself, will be a pure joy compared to doing all this stuff manually.

Also, you'll actually end up with useful data and documents that can serve as the foundation for better documentation in the long run.

none
Now that's an answer. well done.
Liran Orevi
Dan
thanks for the positive feedback, actually my original response was just a comment in the form of "my condolences" (see above), I just felt that wouldn't be fair and bothered to write a reply, which was shorter than the current response, which somehow ended up being revised a number of time. Ultimately, it was revised so often that it *accidentally* became a community wiki now. So, that's what you get for being elaborate... Regarding the raise, gotta talk to myself then (self-employed).
none
+3  A: 

How well do you understand the hardware platform this code is running on?

  • Is it been put into power down mode (Pcon=2) to save power If so how is it been woken up. (a reset or on hardware interrupt)

  • Do you have to wait a for the oscillator to stables after a power up before doing serial communications

  • Is it been put into sleep mode (Pcon=1)

Are there different versions of the hardware out in the field?

Make sure you have all the different hardware variations to test on.

Do not waste your time with a simulator – it is very hard to work with and you have to make a lot of assumptions about the hardware. Get yourself an In Circuit Emulator (ICE) and run on the hardware.

The software was written in assembler for a reason you need to find out why. i.e. - memory constraints - speed constraints

There may be a reason that this code is a mess

Have a look at the link file for:

XDATA SPACE, IDATA SPACE and CODE SPACE:

If there is no free code space or Xdata or Idata?

The original author may have Optimizationed it to fit into the memory space available.

If that is the case you need to talk to the original developer to find out what he did.

Charles Faiga
Thanks for the suggestions. No sleep/power-down (it's part of a larger system whose consumption is measured in KWh). Single version of the hardware. No budget for ICE. Plenty of space: ROM utilization at 60%; RAM's almost full at 94%. I don't see any sign of optimization; just badly written code. Some code appears to be unreachable (i.e., "dead"); some RAM is assigned to variables but not used.
bitFlipper
A: 

I'd say IanW's answer (just print it out and keep tracing) is probably the best. That said, I have a slightly off the wall idea:

Try running the code (probably the binary) through a dissembler that can reconstruct C code (if you can find one for the 8051). Maybe it will identify a few routines you can't (easily).

Maybe it'll help.

MBCook
+5  A: 

I know this sounds crazy....but I am unemployed (I picked wrong time to tell the marjority partner to go to hell) and have some free time. I'd be willing to take a look at it. I used to write assembly for the apple ][ and the original PC. If I could play around with your code on the simulator for a couple hours I could give you an idea if I have a chance of documenting it for you (without runing my unplanned vacation). Since I know nothing about 8051 this might not be possible for someone like me, but the simulator looked promising. I wouldn't want any money to do this. Its enough just to get exposure to 8051 embedded development. I told you this would sound crazy.

johnnycrash
Not crazy, and thanks for the kind offer. However for proprietary/IP/security reasons I'm not allowed to share the client's code.
bitFlipper
I figured as much. :)
johnnycrash
+1  A: 

Cut it into pieces.

Jreeter
+3  A: 

I have done this sort of thing a couple of times. Some recommendations:

  • Start by reviewing the schematic, this should help you understand what ports and pins your desired changes impact.
  • Use grep to find all calls, branches, jumps and returns. This can help understand the flow and identify the chunks of code.
  • Look at the reset vector and interrupt table to identify the main lines.
  • Use grep to create a cross reference for all code labels and data references (if your assembler tools cannot do this for you).

Keep in mind Hofstadter's Law: It always takes longer than you expect, even when you take into account Hofstadter's Law.

Good luck.

Thanks. I love grep and use it every day. :)
bitFlipper