views:

536

answers:

5

I'm sure it repeats everywhere. You can 'feel' network is slow, or machine or slow or something. But the server/chassis logs are not showing anything, so IT doesn't believe you. What do you do?

Your regressions are taking twice the time ... but that's not enough Okay you transfer 100 GB using dd etc, but ... that's not enough. Okay you get server placed in different chassis for 2 week, it works fine ... but .. that's not enough...

so HOW do you get IT to replace the chassis ?

More specifically: Is there any suite which I can run on two setups ( supposed to be identical ), which can show up difference in network/cpu/disk access .. which IT will believe ?

+4  A: 

Run a standard benchmark suite. See if it pinpoints memory, cpu, bus or disk, when compared to a "working" similar computer.

See http://en.wikipedia.org/wiki/Benchmark_(computing)#Common_benchmarks for some tips.

Marcus Lindblom
Do you know any freely available network/cpu/disk benchmarking, which is standard enough, so that's IT will believe it's results ?
Vardhan Varma
Not really. Depends on your app, OS, etc. See the wikipedia link I added. If your IT is as boneheaded as it sounds, I don't really know what will help. But showing some figures clearly ought to do it. At least, perhaps you can convince your boss to take further steps?
Marcus Lindblom
+5  A: 

Computers don't age and slow down the same way we do. If your server is getting slower -- actually slower, not just feels slower because every other computer you use is getting faster -- then there is a reason and it is possible that you may be able to fix it. I'd try cleaning up some disk space, de-fragmenting the disk, and checking what other processes are running (perhaps someone's added more apps to the system and you're just not getting as many cycles).

If your app uses a database, you may want to analyze your query performance and see if some indices are in order. Queries that perform well when you have little data can start taking a long time as the amount of data grows if they have to use table scans. As a former "IT" guy, I'd also be reluctant to throw hardware at a problem because someone tells me the system is slowing down. I'd want to know what has changed and see if I could get the system running the way it should be. If the app has simply out grown the hardware -- after you've made suitable optimizations -- then upgrading is a reasonable choice.

tvanfosson
A: 

It sounds like your main problem is being listened to by IT, but raw technical data may not be persuasive to the right people. Getting backup from the business may help you and that means talking about money.

Luckily, both platforms already contain a common piece of software - the application itself - designed to make or save money for someone. Why not measure how quickly it can do that e.g. how long does it take to process an order?

By measuring how long your application spends dealing with each sub task or data source you can get a rough idea of the underlying hardware which is under performing. Writing to a local database, or handling a data structure larger than RAM will impact the disk, making network calls will impact the network hardware, CPU bound calculations will impact there.

This data will never be as precise as a benchmark, and it may require expensive coding, but its easier to translate what it finds into money terms. Log4j's NDC and MDC features, and Springs AOP might be good enabling tools for you.

Simon Gibbs
+2  A: 

The only way to prove something is to do a stringent audit.

Now traditionally, we should keep the system constant between two different sets while altering the variable we are interested. In this case the variable is the hardware that your code is running on. So in simple terms, you should audit the running of your software on two different sets of hardware, one being the hardware you are unhappy about. And see the difference.

Now if you are to do this properly, which I am sure you are, you will first need to come up with a null hypothesis, something like:

"The slowness of the application is unrelated to the specific hardware we are using"

And now you set about disproving that hypothesis in favour of an alternative hypothesis. Once you have collected enough results, you can apply statistical analyses on them, to decide whether any differences are statistically significant. There are analyses to find out how much data you need, and then compare the two sets to decide if the differences are random, or not random (which would disprove your null hypothesis). The type of tests you do will mostly depend on your data, but clever people have made checklists to help us decide.

Ali A
A: 

Run perfmon.msc from Start / Run in Windows 2000 through to Vista. Then just add counters for CPU, disk etc..

For SQL queries you should capture the actual queries then run them manually to see if they are slow.

For instance if using SQL Server, run the profiler from Tools, SQL Server Profiler. Then perform some operations in your program and look at the capture for any suspicous database calls. Copy and paste one of the queries into a new query window in management studio and run it.

For networking you should try artificially limiting your network speed to see how it affects your code (e.g. Traffic Shaper XP is a simple freeware limiter).