views:

36

answers:

2

I am going to create a typical business application that will be used by a few hundred consultants. Normally, the consultants would be presented with an error message with a standard text. As the application will be a complicated one with lots of changes being made to it constantly I would like the following:

When an error message is presented, the user has the option to "send" the error message to the developers. The developers should be able to open the incoming file in i.e. Eclipse and debug the steps of the last 10 minutes of work step by step (one line at a time if they want to). Everything should be transparent, meaning that they for example should be able to see the return values of calls to the database.

Are there any solutions that offer such functionality today, my preferred language is Python or also Java. I know that there will be a huge performance hit because of such functionality, but that is acceptable as this kind of software is not performance sensitive.

It would be VERY nice if the database also had a cronology so that one could query the database for values that existed at the exact time that a specific line of code was run in the application, leading up to the bug.

A: 

since you have a database, stack user commands and state information for debugging later. you could go back more than 10 minutes too!

David
This is not an answer, but a good comment!
David
+1  A: 

You should try to use logging, e.g. commit logs from the DB and logging the user interactions with the application, if it is a web application you can start with the log files from the webserver. Make sure that the logfiles include all submitted data such as the complete GET url with parameters and POST with entity body. You can configure the web server to generate such logs when necesary.

Then you build a test client that can parse the log files and re-create all the user interaction that caused the problem to appear. If you suspect race conditions you should log with high precision (ms resolution) and make sure that the test client can run through the same sequences over and over again to stress those critical parts.

Replay (as your title suggests) is the best way to reproduce an error, just collect all the data needed to recreate the input that generated a specific state/situation. Do not focus on internal structures and return values, when it comes to hunting down an error or a bug you should not work in forensic mode, e.g. trying to analyse the cause of the crash by analyzing the wreck, you should crash the plane over and over again and add more and more logging/or use a debugger until you know that goes wrong.

Ernelli