Assume you are taking over a legacy .NET app (pre-3.0) written in C#.

What are the top 5 diagnostic measures, profiling or otherwise, that you would employ to assess the health of the application?

I am not just looking at the "WHAT" part of diagnosis but also at the "HOW". For example, it is indeed necessary to assess fast/optimum response times of the app... but is there a way to establish/measure that by technical diagnosis of the code base instead of just getting user-experience feedback?


And yes, there are bound to be some awesome tools that you use for the purpose... it would be great if you listed them too.

A: 

These aren't coding tips or profiling advice, but a general way of assessing the health of a program in any language. In order of importance:

  1. Is the end user happy with it?
  2. Is it stable?
  3. Is it robust?
  4. Is it fast?
  5. Is the memory footprint stable over long periods and what I would expect?

If the answer to all 5 of those questions is yes, then you have a healthy application. I would argue that 1-3 are really the most important. It may not be pretty on the inside, and possibly downright butt ugly, but it's healthy if it meets those specifications, and it should forever remain in legacy mode (i.e. small bug fixes).

Steve
Steve, I agree with your objectives... but I am looking at not just the "What" of the metrics but also the "How". Editing the question to clarify. Thanks for your response!
Shankar Ramachandran
+1  A: 

I would suggest writing tests around certain areas. I'm not a massive fan of unit tests - although I end up writing quite a few of them. I prefer system tests that test parts of the system - so from the domain down, the service down, the presenter down, etc. - not necessarily the whole system, but parts of it. If you're looking for efficiency, then these tests can run a Stopwatch around the code and fail if it takes too long.
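
A minimal sketch of what such a timed test could look like, assuming NUnit; OrderService, LoadOpenOrders and the 2-second budget are made-up placeholders for whatever slice of the real system you want to exercise:

    // Timed "system" test: fail if the operation blows its time budget.
    using System.Diagnostics;
    using NUnit.Framework;

    [TestFixture]
    public class OrderServiceTimingTests
    {
        [Test]
        public void LoadOpenOrders_CompletesWithinTwoSeconds()
        {
            OrderService service = new OrderService();   // hypothetical code under test
            Stopwatch stopwatch = Stopwatch.StartNew();

            service.LoadOpenOrders();                     // the operation being timed

            stopwatch.Stop();
            Assert.Less(stopwatch.ElapsedMilliseconds, 2000L,
                "LoadOpenOrders exceeded its 2-second budget");
        }
    }

    // Stand-in so the sketch compiles; replace with a real service/presenter.
    public class OrderService
    {
        public void LoadOpenOrders() { System.Threading.Thread.Sleep(100); }
    }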

Another nice thing to do is run standard tasks through ANTS Profiler from Red Gate or dotTrace from JetBrains. It'll tell you what's taking the time and how many times it's been run, meaning you can see where things can be optimised/cached.

If you're using NHibernate then NHProf is great (or I think Ayende has now released UberProf, which covers more DB access strategies). This will warn you of any stupid DB access going on. Failing that, just using the SQL Server profiler might show you requesting the same data again and again, but it will require more effort filtering out the rubbish. If you do end up using that, you can save the trace to a DB table which you can then query in a more intelligent way.
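
As a rough illustration, once a trace has been saved to a table (the dbo.AppTrace table name and connection string below are placeholders), a quick ADO.NET query can surface statements that are being issued over and over:

    // Report the 20 most frequently executed statements from a saved profiler trace.
    using System;
    using System.Data.SqlClient;

    class TraceReport
    {
        static void Main()
        {
            const string sql = @"
                SELECT TOP 20 CAST(TextData AS NVARCHAR(4000)) AS Statement,
                       COUNT(*) AS Executions
                FROM dbo.AppTrace
                WHERE TextData IS NOT NULL
                GROUP BY CAST(TextData AS NVARCHAR(4000))
                ORDER BY COUNT(*) DESC";

            using (SqlConnection connection = new SqlConnection(
                "Data Source=.;Initial Catalog=TraceDb;Integrated Security=True"))
            using (SqlCommand command = new SqlCommand(sql, connection))
            {
                connection.Open();
                using (SqlDataReader reader = command.ExecuteReader())
                {
                    while (reader.Read())
                        Console.WriteLine("{0,6} x  {1}", reader.GetInt32(1), reader.GetString(0));
                }
            }
        }
    }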

If you're looking for robustness, a good thing to have is a logging strategy - catch all exceptions and log them. This is easy enough to set up using log4net. Also log when you hit certain points that you're slightly suspicious of. Then have this feed into a server (I use Kiwi Syslog Server, which is easy to set up and quite powerful) that can write to a DB, so you can run analysis on the results. I would recommend against the ADO.NET appender for log4net, as it is not async and so will slow your app down.
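
A bare-bones sketch of that kind of catch-all logging for a WinForms entry point, assuming log4net is configured from app.config; MainForm is just a placeholder:

    // Route both UI-thread and background-thread exceptions through log4net.
    using System;
    using System.Threading;
    using System.Windows.Forms;
    using log4net;
    using log4net.Config;

    static class Program
    {
        private static readonly ILog Log = LogManager.GetLogger(typeof(Program));

        [STAThread]
        static void Main()
        {
            XmlConfigurator.Configure();   // read log4net settings from app.config

            // Exceptions thrown on the WinForms UI thread
            Application.ThreadException += OnUiThreadException;

            // Exceptions thrown on any other thread
            AppDomain.CurrentDomain.UnhandledException += OnUnhandledException;

            Application.Run(new MainForm());
        }

        static void OnUiThreadException(object sender, ThreadExceptionEventArgs e)
        {
            Log.Error("Unhandled UI exception", e.Exception);
        }

        static void OnUnhandledException(object sender, UnhandledExceptionEventArgs e)
        {
            Log.Fatal("Unhandled exception", e.ExceptionObject as Exception);
        }
    }

    // Placeholder so the sketch compiles.
    class MainForm : Form { }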

Finally, depending on what the app is, if you're really keen on spending some time testing its health you can use WatiN or the WinForms equivalent to test the front end. This could even be a prolonged test watching the memory/processor usage of the application while it's being used. If you're not that worried, then the Windows performance analyser will allow you to look at various aspects of the application while you use it. Always useful, but you have to really poke around to get the useful metrics.
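
If you do want to capture memory/processor usage over a prolonged run rather than eyeballing it, something along these lines reads the built-in Process counters; "MyLegacyApp" is a placeholder for the target process name (without .exe):

    // Sample CPU and private bytes for a running process every five seconds.
    using System;
    using System.Diagnostics;
    using System.Threading;

    class ResourceWatcher
    {
        static void Main()
        {
            const string process = "MyLegacyApp";
            using (PerformanceCounter cpu = new PerformanceCounter("Process", "% Processor Time", process))
            using (PerformanceCounter memory = new PerformanceCounter("Process", "Private Bytes", process))
            {
                cpu.NextValue();   // the first read of a rate counter is always 0
                while (true)
                {
                    Thread.Sleep(5000);
                    Console.WriteLine("{0:T}  CPU {1,6:F1}%  Private bytes {2,12:N0}",
                        DateTime.Now, cpu.NextValue(), memory.NextValue());
                }
            }
        }
    }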

Hope this helps.

Stu
A: 

The first two big items I would look into would be:

  1. Adding global exception handling with logging, as well as searching for any exception handling that might be "swallowing" exceptions, hiding problems with your application (I think there is also a windows performance counter that will expose the number of exceptions per second that are being thrown by your application). This can help to uncover any potential data consistency issues in your application.
  2. Add some performance monitoring and logging to any data persistence and/or external network service dependencies which the application might be using, such as logging database queries or web service calls that take longer than X amount of time to complete.
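
A hedged sketch of the second point: wrap the call in a Stopwatch and only log when it goes over a budget. The repository, query and 500 ms threshold here are all illustrative, and ILog comes from log4net:

    // Log any data-access call that exceeds a time budget.
    using System;
    using System.Data;
    using System.Diagnostics;
    using log4net;

    public class OrderRepository
    {
        private static readonly ILog Log = LogManager.GetLogger(typeof(OrderRepository));
        private const long SlowCallThresholdMs = 500;   // illustrative budget

        public DataTable GetCustomerOrders(int customerId)
        {
            Stopwatch stopwatch = Stopwatch.StartNew();
            try
            {
                return QueryDatabase(customerId);   // the real ADO.NET / web service call goes here
            }
            finally
            {
                stopwatch.Stop();
                if (stopwatch.ElapsedMilliseconds > SlowCallThresholdMs)
                    Log.WarnFormat("GetCustomerOrders({0}) took {1} ms",
                                   customerId, stopwatch.ElapsedMilliseconds);
            }
        }

        // Placeholder so the sketch compiles on its own.
        private DataTable QueryDatabase(int customerId)
        {
            return new DataTable("Orders");
        }
    }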
Dave Falkner
+20  A: 

1. User Perception

The very first thing I'd do is simply survey the users. Remember, they are the ones we are doing this for. However horrible an application may look inside, if the users love it (or at least don't actively dislike it) then you don't want to immediately start ripping it apart.

I'd want to ask questions such as:

  • Does it run smoothly?
  • Is it easy to use?
  • When you use it, do you feel confident that it's doing what you expect?
  • Is it a BMW, a Civic, or a Pinto?

The answers will be subjective. That's okay. At this point we're just looking for broad trends. If an overwhelming number of users say that it crashes all the time, or that they're afraid to perform basic tasks, then you're in trouble.

If the app breeds superstition, and you hear things like "it seems to flake out on Thursday mornings" or "I don't know what this button does, but it doesn't work unless I click it first", run for the hills.

2. Documentation

A lack of documentation, or documentation that is hideously out of date, is a sure sign of a sick application. No documentation means that development staff cut corners, or are so overworked with the constant death march that they just can't find the time for this kind of "unnecessary" work.

I'm not talking about user manuals - a well-designed app shouldn't need them - I mean technical documentation: how the architecture looks, what the components do, environmental dependencies, configuration settings, requirements/user stories, test cases/test plans, file formats, you get the idea. A defect tracking system is also an essential part of documentation.

Developers end up making (incorrect) assumptions in the absence of proper documentation. I've spoken to several people in the industry who think that this is optional, but every system I have ever seen or worked on that had little or no documentation ended up being riddled with bugs and design flaws.

3. Tests

No better way to judge the health of an application than by its own tests, if they're available. Unit tests, code coverage, integration tests, even manual tests, anything works here. The more complete the suite of tests, the better the chance of the system being healthy.

Successful tests don't guarantee much at all, other than that the specific features being tested work the way that the people who wrote the tests expect them to. But a lot of failing tests, or tests that haven't been updated in years, or no tests at all - those are red flags.

I can't point to specific tools here because every team uses different tools for testing. Work with whatever is already in production.

4. Static Analysis

Some of you probably immediately thought "FxCop." Not yet. The first thing I'd do is break out NDepend.

Just a quick look at the dependency tree of an application will give you enormous amounts of information about how well the application is designed. Most of the worst design anti-patterns - the Big Ball of Mud, Circular Dependencies, Spaghetti Code, God Objects - will be visible almost immediately from just a bird's-eye view of the dependencies.

Next, I would run a full build, turning on the "treat warnings as errors" setting. Suppressing specific warnings through compiler directives or flags is all right most of the time, but literally ignoring the warnings spells trouble. Again, this won't guarantee that everything is OK or that anything is broken, but it's a very useful heuristic for determining the level of care that went into the actual coding phase.
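
For reference, "treat warnings as errors" is a project setting (or the /warnaserror compiler switch), and suppressing a single, understood warning looks roughly like this; the obsolete LegacyPriceCalculator below is made up purely to trigger warning CS0618:

    using System;

    [Obsolete("Use the new tax engine")]
    public static class LegacyPriceCalculator
    {
        public static decimal ApplyTax(decimal amount) { return amount * 1.2m; }
    }

    public class InvoiceService
    {
        public decimal Total(decimal amount)
        {
            // CS0618: LegacyPriceCalculator is obsolete. Suppress it here, on purpose,
            // rather than letting the whole build scroll by full of ignored warnings.
            #pragma warning disable 618
            return LegacyPriceCalculator.ApplyTax(amount);
            #pragma warning restore 618
        }
    }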

After I am satisfied that the overall design/architecture is not complete garbage, then I would look at FxCop. I don't take its output as gospel, but I am specifically interested in Design Warnings and Usage Warnings (security warnings are also a red flag but very rare).

5. Runtime Analysis

At this point I am already satisfied that the application, at a high level, is not an enormous mound of suck. This phase would vary quite a bit with respect to the specific application under the microscope, but some good things to do are:

  • Log all first-chance exceptions under a normal run. This will help to gauge the robustness of the application, to see if too many exceptions are being swallowed or if exceptions are being used as flow control. If you see a lot of top-level Exception instances or SystemException derivatives appearing, be afraid. (A cheap way to watch for this is sketched after this list.)

  • Run it through a profiler such as EQATEC. That should help you fairly easily identify any serious performance problems. If the application uses a SQL back-end, use a SQL profiling tool to watch queries. (Really, there is a separate set of steps for testing the health of a database, which is a critical part of testing an application that's based on one, but I don't want to get too off-topic.)

  • Watch a few users - look especially for "rituals", things they do for no apparent reason. These are usually the sign of lingering bugs and ticking time bombs. Also look to see if it generates a lot of error messages, locks up the UI for long periods while "thinking", and so on. Basically, anything you'd personally hate to see as a user.

  • Stress tests. Again, the specific tools depend on the application, but this is especially applicable to server-based apps. See if the application can still function under heavy load. If it starts timing out near the breaking point, that's OK; if it starts generating bizarre error messages or, worse, seems to corrupt data or state, that's a very bad sign.
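
As a cheap proxy for the exception-logging point above (the AppDomain.FirstChanceException event only exists on .NET 4.0 and later, which a pre-3.0 app won't have), you can watch the CLR's exception performance counter while the application runs normally; "MyLegacyApp" is a placeholder process name:

    // A steady, high exception rate while the app "works fine" usually means
    // exceptions are being swallowed or used as flow control.
    using System;
    using System.Diagnostics;
    using System.Threading;

    class ExceptionRateWatcher
    {
        static void Main()
        {
            using (PerformanceCounter thrownPerSec = new PerformanceCounter(
                ".NET CLR Exceptions", "# of Exceps Thrown / sec", "MyLegacyApp"))
            {
                thrownPerSec.NextValue();   // prime the rate counter
                for (int i = 0; i < 60; i++)
                {
                    Thread.Sleep(1000);
                    Console.WriteLine("{0:T}  exceptions/sec: {1:F1}",
                        DateTime.Now, thrownPerSec.NextValue());
                }
            }
        }
    }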


And that's about all I can think of for now. I'll update if any more come to mind.

Aaronaught
A: 

You open the "readme.txt" file. It contains the following text:

'Welcome to acmesys readme.txt. You are now in charge of this application. There are two files located on the network, step1.txt and step2.txt. Every time you have a problem, open one of these files and follow the instructions.'

After a few weeks your boss starts raging at you because the code is not working and the users are complaining. You open step1.txt; it says the following:

'Blame everything on me.'

You get more time to fix the system. After another few months you still haven't succeeded in solving the problems. Your boss schedules a meeting with you and HR. In desperation, you open the step2.txt file - it says:

'Create two files located on the network.'
James Westgate
A: 

If this interacts with a database, you should get a feel for Disk I/O and the degree of fragmentation of the disk array / hard drive. For MS SQL, analyze any stored procedures and review the indexes and primary keys on the tables.

You really do not need tools for this, just the grunt work of reviewing counters and talking with the DBA.

David Robbins