I have been assigned a performance tuning, debugging, and troubleshooting task.
Scenario: a multi-application environment running on several networked machines, backed by databases. The OS is Unix and the DB is Oracle. Business logic is spread across the applications, which communicate both synchronously and asynchronously. The applications are multi-user, with several hundred call center users at peak time. The user interfaces are web-based.
The applications are third-party, but I can get access to the developers and the source code. I only have the production system and a functional test environment; there is no load test environment.
Problem: bad performance! I need fast results. Management is going crazy.
I have been given symptom examples such as these: user interface actions take minutes to complete. Searching for a customer usually takes 6 seconds, but an identical search issued immediately afterwards may take 6 minutes.
What would be your strategy for finding root causes?