




Folks, I have just joined a company that has a huge source tree based on JSP/Servlet and EJB 1.2. No documentation exists. The code has been written over seven years, with a large number of undocumented changes. Are there any tools that can assist me in tracing the execution? Setting breakpoints is not helping me much.

Why are breakpoints not helpful? Stepping in the code with the debugger should work. Whether the code is spaghetti or not shouldn't affect the "debugability" of the system.

As for how to deal with this mess, I suggest writing tons of unit tests for the existing system. That will help you understand the program better and put you in a better position for refactoring as soon as it is needed (obviously very soon). Have a look at http://amzn.com/0131177052

It could be hard to find units that are testable if it's one large moloch.
Definitely! That's where that book helps: in dealing with old, large and ugly codebases.
Breakpoints are not helping because most of the method call stacks are very deep. Business logic in JSPs and JavaScript adds to the difficulty. I was wondering if there could be a trace of which method called what (much like looking at log.trace statements if the code were written well), so I can just skim through the sequence to figure out the story. A profiler does produce a list of all methods called, but not in any particular sequence.
I also think that book (Working Effectively with Legacy Code) can be really useful. Some profilers produce a call tree (I once used Eclipse TPTP that way), but I doubt it can be of much help.
All IDEs can do that. When you reach a breakpoint there is a "Call Stack" panel where you can see the stack trace leading to this specific line of code.
@cherouvim I don't think he's talking about the call stack. What I meant when I mentioned TPTP was producing a tree of all the calls made during an execution, not just the stack of calls still waiting to complete.
Folks, it looks like there aren't too many off-the-shelf solutions. I am going the good old way of writing a shell-based "code crawler" and figuring out who is calling what. I will post my script for your reference if it works. Many thanks for all your responses.
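To make the difference concrete, here is a rough hand-rolled sketch of recording a call tree. It is not TPTP's API. The names CallTree, CallTreeDemo, handleRequest, loadUser and render are all made up for illustration, and the recorder assumes a single thread (a real servlet container would need per-thread state).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical recorder: one line per call, indented by call depth. */
final class CallTree {
    private static final List<String> LINES = new ArrayList<>();
    private static int depth = 0;

    /** Call at the top of a method. */
    static void enter(String method) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            line.append("  ");
        }
        LINES.add(line.append(method).toString());
        depth++;
    }

    /** Call in a finally block when the method returns. */
    static void exit() {
        depth--;
    }

    static List<String> lines() {
        return new ArrayList<>(LINES);
    }

    static void reset() {
        LINES.clear();
        depth = 0;
    }
}

final class CallTreeDemo {
    static void handleRequest() {
        CallTree.enter("handleRequest");
        try {
            loadUser();
            render();
        } finally {
            CallTree.exit();
        }
    }

    static void loadUser() {
        CallTree.enter("loadUser");
        CallTree.exit();
    }

    static void render() {
        CallTree.enter("render");
        CallTree.exit();
    }

    public static void main(String[] args) {
        handleRequest();
        for (String line : CallTree.lines()) {
            System.out.println(line);
        }
    }
}
```

The recorded tree lists handleRequest with loadUser and render indented beneath it. A breakpoint inside render would show only render and handleRequest on the call stack, because loadUser has already returned.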
The good old trick can help here if you're allowed to edit the code: put System.err.println() calls at strategic points. This shows the flow of the program, which is probably the first step in exploring unknown code.

The trace can also display some variable values or even a stack trace (use new Exception().printStackTrace(System.err)). To avoid a flood of messages, the trace can be guarded by a precondition so that the println executes only when it is worth it.

Be sure to include the current class and method in each message. The message then clearly shows where the println is located, which helps a lot when you remove all the traces once you're done!
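
A minimal sketch of such a guarded trace, assuming a JDK that has Throwable.getStackTrace() (1.4 or later). Trace, OrderServletDemo and processOrder are made-up names for illustration only. Here the helper looks up the calling class and method itself, so every message carries its location without it being typed by hand.

```java
import java.io.PrintStream;

/** Hypothetical tracing helper, not part of any existing code base. */
final class Trace {
    // Global switch: set to false to silence every trace at once.
    static boolean enabled = true;
    static PrintStream out = System.err;

    /** Prints "TRACE Class.method: message" only when the guard is true. */
    static void log(boolean guard, String message) {
        if (!enabled || !guard) {
            return;
        }
        // Element 0 is this method, element 1 is whoever called it.
        StackTraceElement caller = new Throwable().getStackTrace()[1];
        out.println("TRACE " + caller.getClassName() + "."
                + caller.getMethodName() + ": " + message);
    }

    /** Dumps the full stack leading to the caller, again only when guarded. */
    static void dumpStack(boolean guard) {
        if (enabled && guard) {
            new Exception("TRACE stack").printStackTrace(out);
        }
    }
}

final class OrderServletDemo {
    static void processOrder(int orderId) {
        // Guard: only large order ids are interesting in this made-up example.
        Trace.log(orderId > 100, "orderId=" + orderId);
    }

    public static void main(String[] args) {
        processOrder(7);   // guard is false, nothing is printed
        processOrder(250); // guard is true, one TRACE line goes to System.err
    }
}
```

Because every message starts with TRACE, the calls are easy to grep for and remove afterwards.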


Many thanks for everyone's input. It was a wonderful learning experience. I ended up writing my shell script, which produces an HTML report. I am attaching the complete file here.

Please note that I am not a regular shell programmer, and I was working on this after hours, so the standard of the code is not too good. It has plenty of cut-and-paste jobs off the internet. It works, however, and shows the approach you may take to go through your spaghetti code.

Regards Amarsh


#!/bin/bash
echo "### CodeCrawler starting"

# check the number of command line arguments
if [[ $# -lt 2 ]]; then
    echo "usage: % crawl inputFile/inputDir outputDir"
    exit 1
fi

# the working directory is C:\CodeCrawler
cd /cygdrive/c/CodeCrawler

# find all files that require analysis
if [ -d "$1" ]; then
  find "$1" | grep "\.java$" > allFiles$2
  find "$1" | grep "\.jsp$" >> allFiles$2
  find "$1" | grep "\.htm$" >> allFiles$2
  find "$1" | grep "\.html$" >> allFiles$2
elif [ -f "$1" ]; then
  find "$1" > allFiles$2
fi

# get total no. of files to be scanned
totalFiles=$(cat allFiles$2 | wc -l)
echo "### No of files to scan : $totalFiles"

# create the index.html file
rm -rf $2; mkdir $2;cd $2
echo "<html><body bgcolor=\"#ccffff\"><h3>$1</h3>" > dir.html

# crawl through the entire directory 
for rootFile in $( cat ../allFiles$2 ); do

    scannedNoOfFiles=$((scannedNoOfFiles+1));echo;echo "### Scanning $scannedNoOfFiles / $totalFiles"

    # create a dir for the output
    rootFileDir=$(echo $rootFile | tr '/' '\n' | tail -1).dir
    echo "### Storing output in $rootFileDir"
    rm -rf $rootFileDir
    mkdir $rootFileDir
    cd $rootFileDir

    # append to the index.html file
    rootFileDirName=$(echo $rootFile | tr '/' '\n' | tail -1)
    echo "<a href=\"$rootFileDir/index.html\" target=\"fileFrame\">$rootFileDirName</a><br>" >> ./../dir.html

    # obtain all external jsp references
    touch jsp.cwl
    cat $rootFile | grep "\.jsp" | tr "'\"\?<>=,()[] " '\n' | sed 's/\.\.//g' | grep "\.jsp" | grep -v "http" | sort -u > tmp
    for line in $(cat tmp);do
        echo /$line | sed 's/\/\//\//g' >> jsp.cwl
    done

    # obtain all external js references
    touch js.cwl
    cat $rootFile | sed 's/\.jsp//g' | grep "\.js" | tr "'\"\?<>=,()[] " '\n' | sed 's/\.\.//g' | grep "\.js" | grep -v "http" | sort -u > tmp
    for line in $(cat tmp);do
        echo /$line | sed 's/\/\//\//g' >> js.cwl
    done

    # obtain all external css references
    touch css.cwl
    cat $rootFile | grep "\.css" | tr "'\"\?<>=,()[] " '\n' | sed 's/\.\.//g' | grep "\.css" | grep -v "http" | sort -u > tmp
    for line in $(cat tmp);do
        echo /$line | sed 's/\/\//\//g' >> css.cwl
    done

    # obtain all external htm references
    touch htm.cwl
    cat $rootFile | grep "\.htm" | tr "'\"\?<>=,()[] " '\n' | sed 's/\.\.//g' | grep "\.htm" | grep -v "http" | sort -u > tmp
    for line in $(cat tmp);do
        echo /$line | sed 's/\/\//\//g' >> htm.cwl
    done

    # obtain all database references
    touch db.cwl
    cat $rootFile | grep -i "select.*from" | sed 's/from/\nfrom/g' | sed 's/FROM/\nFROM/g' | grep -i "from" | sed 's/from//g'| sed 's/FROM//g' | awk '{print $1}' | tr '[;"]' ' ' | uniq > db.cwl
    cat $rootFile | sed "s/.prepareStatement(\"/\nX_X_X/g" | grep "X_X_X" | sed "s/X_X_X//g" | tr '[ ,\$ ]' '\n' | head -1 | uniq >> db.cwl

    # obtain all references to java classes. we include everything with signature com. and exclude "www" and "flight"
    cat $rootFile | tr '["=%;/<>@\t) ]' '\n' | grep "com\." | grep -v "codepassion\." | grep -v "www" | grep -v "flight" | sort -u > tmp
    : > tmpDirectReferences
    cat tmp | grep "(" >> tmpDirectReferences    # directReferences are like au.com.mycompany.servlet.MiscServlet.getCckey()
    : > tmpJavaFiles
    cat tmp | grep -v "(" >> tmpJavaFiles            # javaFiles are like Person aPerson; ... aPerson.getPolicy()
    : > tmpDirectReferencesReformed                  # start empty so the later cat never fails

    # read directReferences and produce the class.cwl file by identifying class and method
    echo "#D# Looking for direct references" 
    while read classLine; do
        methodName=$(echo $classLine | tr '\.' '\n' | tail -1 | sed 's/(//g')
        className=$(echo $classLine | sed "s/\.$methodName(//g" | tr '[()]' ' ')
        echo $methodName >> $className.cwl
        echo "### class: $className   method:$methodName" 
        echo $className >> tmpDirectReferencesReformed
    done < tmpDirectReferences

    # read javaFiles every fully qualified class name and grab the class from it. then grab the method from it
    echo "#J# Looking for indirect references" 
    while read classLine; do
        className=$(echo $classLine | tr '\.' '\n' | tail -1)
        echo "#F# find: $classLine"
        # indirect references are in the form className objectName ... and then objectName.methodName
        cat $rootFile | grep "$className .*;" | sed -e "s/$className[ \t]\+\([a-zA-Z0-9_]\+\)[ \t]*[;=].*/\1/g" | sed 's/^[ \t]*//;s/[ \t]*$//' | sort -u > tmp$className
        # read tmp$className and find all properties and method references
        while read methodLine; do
            cat $rootFile | grep "$methodLine\." | tr '[ (]' '\n' | sed "s/$methodLine\./\n$methodLine\./g" | grep "$methodLine\." | sort -u | grep -v "[\"%]" | grep -v ".com." | tr '.' '\n' | grep -v "$methodLine" >> $classLine.cwl
        done < tmp$className
        # direct references are className.methodName
        cat $rootFile | grep "[ ()\"']$className\." | tr ' (' '\n' | grep "$className" | tr '.' '\n' | grep -v "$className"  >> $classLine.cwl
        cat $rootFile | grep "$className\." | tr ' (' '\n' | grep "$className" | tr '.' '\n' | grep -v "$className"  >> $classLine.cwl
    done < tmpJavaFiles

    # consolidate all information to generate the html files
    echo "### Generating index.html"
    rootFileName=$(echo $rootFile | tr '/' '\n' | tail -1)
    touch index.html
    echo "<html><head><title>$rootFileName</title></head><body bgcolor=\"#ffffcc\">" >> index.html
    echo "<h3>$rootFile</h3>" >> index.html 
    # put all java classes
    echo "<br><h3>Referenced classes</h3>">> index.html
    cat tmpDirectReferencesReformed | uniq >> tmpJavaFiles;cat tmpJavaFiles | uniq > tmpJavaFilesU; mv tmpJavaFilesU tmpJavaFiles
    while read aLine; do
        echo "- <a href=\"$aLine.html\" target=\"methodFrame\">$aLine</a><br>" >> index.html 
    done < tmpJavaFiles
    # put all DBs
    echo "<br><h3>Referenced Tables</h3>">> index.html
    while read aLine; do
        echo "- $aLine<br>" >> index.html
    done < db.cwl
    # put all JSPs
    echo "<br><h3>Referenced JSPs</h3>">> index.html
    while read aLine; do
        echo "- $aLine<br>" >> index.html
    done < jsp.cwl
    # put all JSs
    echo "<br><h3>Referenced JavaScript</h3>">> index.html
    while read aLine; do
        echo "- $aLine<br>" >> index.html
    done < js.cwl
    # put all htms
    echo "<br><h3>Referenced htm</h3>">> index.html
    while read aLine; do
        echo "- $aLine<br>" >> index.html
    done < htm.cwl
    # put all css
    echo "<br><h3>Referenced css</h3>">> index.html
    while read aLine; do
        echo "- $aLine<br>" >> index.html
    done < css.cwl
    echo "</body></html>" >> index.html

    # generate a html for each class file and put all accessed methods in it
    for aLine in $( ls *.cwl ); do
        cat $aLine | uniq > tmp; mv tmp $aLine
        fileName=$(echo $aLine | sed 's/\.cwl//g')
        echo "#G# generating $fileName.html"
        echo "<html><body bgcolor=\"#ffddee\">" >> $fileName.html
        echo "<h3>$fileName</h3>" >> $fileName.html
        for bLine in $( cat $aLine | sort ); do
            echo "$bLine<br>" >> $fileName.html
        done
        echo "</body></html>" >> $fileName.html
    done

    # cleanup and return
    #rm *.cwl *tmp* 
    cd ..
done

echo "</body></html>" >> ./dir.html
rm ../allFiles$2
echo "### CodeCrawler finished"
What does this script do?
It attempts to list all Java class / JSP references in a set of files.
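For comparison, the JSP-reference step of the script can be sketched in Java. This is only a loose approximation of the shell pipeline, not a port: it splits on the same delimiters, skips tokens containing "http", strips "..", and normalises each path to a single leading "/". JspReferences and find are hypothetical names.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper that collects .jsp references from source text. */
final class JspReferences {

    static Set<String> find(String source) {
        // TreeSet gives sorted, de-duplicated output, like "sort -u".
        Set<String> refs = new TreeSet<>();
        // Split on whitespace and the delimiters the script feeds to tr.
        for (String token : source.split("[\\s'\"?<>=,()\\[\\]]+")) {
            if (!token.contains(".jsp") || token.contains("http")) {
                continue; // not a JSP reference, or an absolute URL
            }
            // Drop ".." segments, then collapse repeated slashes.
            String path = ("/" + token.replace("..", "")).replaceAll("/+", "/");
            refs.add(path);
        }
        return refs;
    }

    public static void main(String[] args) {
        String sample = "<jsp:include page=\"../common/header.jsp\"/>\n"
                + "<a href='http://example.com/x.jsp'>x</a>\n"
                + "response.sendRedirect(\"/cart/view.jsp?id=1\");";
        for (String ref : find(sample)) {
            System.out.println(ref);
        }
    }
}
```

Like the script, this is plain text matching, so it will also pick up references inside comments and string literals that are never executed.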