I'm working on a data mining research project and use code from a big svn.

Apparently one of the methods I use from that svn uses randomness somewhere without asking for a seed, so two runs of my program return different results. That's annoying for what I want to do, so I'm trying to locate that "uncontrolled" randomness.

Since the classes I use depend on many others, that's pretty painful to do by hand. Any idea how I could find where that randomness comes from?

Edit:

Roughly, my code is structured as:

- stuff I wrote
- call to a method I didn't write, involving lots of other classes
- stuff I wrote

I know that the randomness is introduced in the method I didn't write, but can't locate where exactly...

Idea:

What I'm looking for might be a tool or Eclipse plug-in that would let me see each time Random is instantiated during the execution of my program. Does anyone know of anything like that?

+1  A: 

Maybe it's a bit old-fashioned style, but...

How about tracing the intermediate results (variables, function arguments) to standard output, gathering the output of two different runs and checking where they start to differ?

Grzegorz Oledzki
+2  A: 

The default seed of many random number generators is the current time. If it's a cryptographic random number generator, it's a seed that's far more complex than that.

I'd bet that your random numbers are being seeded with the current time. The only way to fix that is to find the code that creates or seeds the random number generator and change it to use a constant seed. I'm not sure what the syntax of that is in Java, but in my world (C#) it's something like:

Random r = new Random(seedValue);

So even with an answer from StackOverflow, you still have some detective work to do to find the code you want.
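As a minimal Java sketch of why a constant seed helps: two `java.util.Random` instances created with the same seed produce the same sequence, so a run becomes repeatable once every generator is seeded this way. The class name `SeededRandomDemo`, the helper `draw`, and the seed value 42 are made up for illustration and are not from the code being discussed.

```java
import java.util.Arrays;
import java.util.Random;

public class SeededRandomDemo {
    // Draws `count` ints in [0, 1000) from a generator created with the given seed.
    public static int[] draw(long seed, int count) {
        Random r = new Random(seed);
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = r.nextInt(1000);
        }
        return out;
    }

    public static void main(String[] args) {
        long seedValue = 42L; // arbitrary constant, chosen only for illustration
        int[] first = draw(seedValue, 5);
        int[] second = draw(seedValue, 5);
        // Same seed, same sequence of calls: the two arrays are identical.
        System.out.println(Arrays.equals(first, second)); // prints "true"
    }
}
```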

Dave Markle
Our worlds aren't very different, as in Java it would be `Random r = new Random(seedValue);`
Pepijn
I know, but that's the detective work you're talking about that I'm trying to avoid..
Jules Olléon
@Jules: do a text search on the code for the word "random", and change the seed value and see what that does. I can't save you from having to do any work at all.
Dave Markle
@Dave Markle: the problem is that I don't know exactly which files are used during the execution (during the call on the method I didn't write), and the svn contains hundreds of files so I can't search through all of them. I need a way to somehow track what's used during the execution.
Jules Olléon
Hundreds of files? Grep can handle that, no problem, in seconds. `grep -ril "new Random("` will probably get you where you need to go. Download it from www.cygwin.com.
Dave Markle
Definitely no need for cygwin as `findstr` will do the same for you out of the box.
Joey
A: 

Which "big svn" are you using?

You could write some simple tests, to test whether or not two identical calls to underlying functions return two identical results...

Unless you know where the Random object is created, you're going to have to do some detective work this way.

How much of this code is open to you?

Pepijn
That's the svn of a research lab (in a university). All the code is open to me.
Jules Olléon
If the code is open to you, couldn't you just do some greps for variations on "Random"?
jeffa00
Because I don't want to follow the flow of execution through 40 nested method calls just to determine which files to grep?
Jules Olléon
Execute your function, follow the stack trace, and grep all the classes listed. That would probably be your best call.
Pepijn
+1  A: 

Maybe you want to read this:

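In Java, when you create a new Random object without passing a seed, the seed is automatically derived from the system clock's current time in nanoseconds (recent JDKs also mix in a per-instance value so that two generators created back to back differ). So, when you check out the source of the Random class you will see a constructor, something like this: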

public Random()
{
    this(System.nanoTime());
}

Or maybe this:

In Eclipse you can put your cursor on a variable and then press F3 (Open Declaration). This will bring you to the point where this variable is declared.

A second tool you can use is "Find usages". Your IDE will then search for all usages of a method, a variable, or whatever you want.

Martijn Courteaux
Would "Find usages" let me see, for instance, each call to Random() during the execution of my program? How can I use it in Eclipse?
Jules Olléon
You can use the "References" and "Declarations" menus in Eclipse to find where certain types are, obviously, referenced and declared. Look under the "Search" menu.
Pepijn
OK, did that: no class from the svn instantiated Random without a seed... but that's fine, I got the answer. The problem was actually not in the svn (see my answer). Thanks anyway!
Jules Olléon
A: 

Why don't you insert a lot of logging calls (e.g. to standard error) that trace the state of the value you are concerned about throughout the program?

You can compare the trace across two successive runs to narrow down where the randomness is happening by searching for the first difference in the two log files.

Then you can insert more logging calls in that area until you precisely identify the problem.
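The comparison step can be sketched in Java roughly as follows. This is only an illustration: the class `TraceDiff`, the method `firstDifference`, and the sample log lines are hypothetical, and the traces are held in memory as lists of lines rather than read from real log files.

```java
import java.util.Arrays;
import java.util.List;

public class TraceDiff {
    // Returns the 0-based index of the first line that differs between the
    // two traces, or -1 if both traces are identical.
    public static int firstDifference(List<String> runA, List<String> runB) {
        int common = Math.min(runA.size(), runB.size());
        for (int i = 0; i < common; i++) {
            if (!runA.get(i).equals(runB.get(i))) {
                return i;
            }
        }
        // One trace is a strict prefix of the other: they diverge where the shorter ends.
        return runA.size() == runB.size() ? -1 : common;
    }

    public static void main(String[] args) {
        // Made-up trace lines standing in for the logging output of two runs.
        List<String> runA = Arrays.asList("load: 100 rows", "split: 0.37", "score: 0.81");
        List<String> runB = Arrays.asList("load: 100 rows", "split: 0.52", "score: 0.77");
        System.out.println(firstDifference(runA, runB)); // prints "1"
    }
}
```

For real log files, the two lists could be obtained with `java.nio.file.Files.readAllLines`. The first differing line points at the region where more logging is worth adding.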

mikera
A: 

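Java's `Set` interface does not guarantee any iteration order, and implementations such as `HashSet` may traverse their elements in a different order from one run to the next, even on the same machine (for example when the elements rely on the default `hashCode()`). That was the source of the non-determinism here. Can't do anything about it unless one changes all "set" uses into "lists", or into an order-preserving set such as `LinkedHashSet` or `TreeSet`.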
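A small Java sketch of one way to make set traversal repeatable, assuming the elements are added in the same order on every run: `LinkedHashSet` iterates in insertion order regardless of hash codes. The class `OrderedSetDemo`, the element type `Item`, and the helper names are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class OrderedSetDemo {
    // Hypothetical element type that does NOT override hashCode(), so its hash
    // is the default one, which is not required to be stable across runs.
    static final class Item {
        final String name;
        Item(String name) { this.name = name; }
    }

    // Collects the element names in the set's iteration order.
    static List<String> namesInIterationOrder(Set<Item> items) {
        List<String> names = new ArrayList<>();
        for (Item item : items) {
            names.add(item.name);
        }
        return names;
    }

    // Adds one Item per name to a LinkedHashSet and returns the iteration order.
    public static List<String> insertionOrderNames(String... names) {
        Set<Item> ordered = new LinkedHashSet<>();
        for (String name : names) {
            ordered.add(new Item(name));
        }
        return namesInIterationOrder(ordered);
    }

    public static void main(String[] args) {
        // With a plain HashSet the order here could vary between runs;
        // LinkedHashSet always yields the insertion order.
        System.out.println(insertionOrderNames("c", "a", "b")); // prints "[c, a, b]"
    }
}
```

`TreeSet` is another option when the elements have a natural ordering or a `Comparator` is available.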

Jules Olléon