tags:

views:

1892

answers:

8

We have an application that generates simulated data for one of our services for testing purposes. Each data item has a unique Guid. However, when we ran a test after some minor code changes to the simulator all of the objects generated by it had the same Guid.

There was a single data object created, then a for loop where the properties of the object were modified, including a new unique Guid, and it was sent to the service via remoting (serializable, not marshal-by-ref, if that's what you're thinking), loop and do it again, etc.

If we put a small Thread.Sleep( ...) inside of the loop, it generated unique id's. I think that is a red-herring though. I created a test app that just created one guid after another and didn't get a single duplicate.

My theory is that the IL was optimized in a way that caused this behavior. But enough about my theories. What do YOU think? I'm open to suggestions and ways to test it.

UPDATE: There seems to be a lot of confusion about my question, so let me clarify. I DON'T think that NewGuid() is broken. Clearly it works. Its FINE! There is a bug somewhere though, that causes NewGuid() to either: 1) be called only once in my loop 2) be called everytime in my loop but assigned only once 3) something else I haven't thought of

This bug can be in my code (MOST likely) or in optimization somewhere.

So to reiterate my question, how should I debug this scenario?

(and thank you for the great discussion, this is really helping me clarify the problem in my mind)

UPDATE # 2: I'd love to post an example that shows the problem, but that's part of my problem. I can't duplicate it outside of the whole suite of applications (client and servers).

Here's a relevant snippet though:

OrderTicket ticket = new OrderTicket(... );

for( int i = 0; i < _numOrders; i++ )
{
 ticket.CacheId = Guid.NewGuid();
 Submit( ticket );  // note that this simply makes a remoting call
}
+2  A: 

It's a bug in your code. If you've managed to generate multiple guid's it is the most likely explanation. The clue is here in your question: "when we ran a test after some minor code changes to the simulator all of the objects generated by it had the same Guid"

Mitch Wheat
Yup, that's what I thought too. But I couldn't find it. Each iteration was calling NewGuid() each time and each time it was returning the same Id. Suggestions?
dviljoen
That's the problem with random number generators, you can never be sure.
tsilb
I don't buy that. The guid has some randomness in it, and yes, theoretically duplicates are "possible" but not EVERY ONE. That is clearly not the issue here. There was something wrong either with our code or the optimizer. My problem is I don't know how to find out which. Suggestions?
dviljoen
How about cutting the code down to the minimum that duplicates the issue and posting it here.
Mitch Wheat
A: 

See this article about how a Guid is created.

This artcile came from This answer.

Bottom line if you are creating the GUIDs too quickly and the clock hasn't moved forward that is why you are getting some as the same. However when you put a sleep in it works because the clock has moved.

David Basarab
Thanks, but again, this doesn't help me. I know that a Guid is statistically unique. My problem is NOT that I had 2 out of 10,000 Guids that were dupes. My problem was that 10,000 out of 10,000 were dupes. That's not a statistical collision. That's a bug. But where? And how to find?
dviljoen
Sorry, had to vote you down for not fully reading the article you linked to.
MusiGenesis
+2  A: 

The code in Submit and OrderTicket would be helpful as well...

You're reusing OrderTicket. I'd suspect that either you (or remoting itself) is batching calls out - probably in respect to # of connections/host limits - and picking up the last value of CacheId when it finally sends them along.

If you debug or Thread.Sleep the app, you're changing the timing so that the remoting call finishes before you assign a new CacheId.

Are you asyncing the remoting call? I'd think a sync call would block - but I'd check with a packet sniffer like Wireshark to be sure. Regardless, just changing to creating a new OrderTicket in each iteration would probably do the trick.

Edit: The question is not about NewGuid being broken...so my previous answer has been removed.

Mark Brackett
I know its a bug. But like several others here you seem to be fixating on the uniqueness of a Guid. That's not the issue. Again, 2 dupes out of 10,000 would be a uniqueness problem. I'm getting 10,000 of the exact same Guid.
dviljoen
If you know it's a bug (meaning you think it's an app bug), then why the title "Duplicate returned by Guid.NewGuid()" and the theory that it's the JIT optimizer causing an issue?
Mark Brackett
As for As for fixating on the "uniqueness of a Guid", that's not even close to what I was positing. An out of entropy or uncaught failed malloc (for instance) could cause all sorts of issues - including repeated Guids (potentially - again it's far fetched theory for your select is broken mindset).
Mark Brackett
Please read my UPDATE above. I'm not saying NewGuid is broken. I never intended to even insinuate that (in retrospect, I'm just reading my posting title and yes, it does sound that way ... sorry). And I don't have a "select is broken" mindset. I have a bug that I don't know how to find.
dviljoen
+3  A: 

Thousands of developers use Guids in .NET. If Guid.NewGuid() had any tendency at all to get "stuck" on one value, the problem would have been encountered long ago.

The minor code changes are the sure culprit here. The fact that Thread.Sleep (which is less a red herring than a fish rotting in the sun) "fixes" your problem suggests that your properties are being set in some weird way that can't take effect until the loop stops blocking (either by ending or by Thread.Sleep). I'd even be willing to bet that the "minor change" was to reset all the properties from a separate thread.

If you posted some sample code, that would help.

MusiGenesis
You'd lose that bet. This is a single thread that simply generates object data over the wire.
dviljoen
C'mon, post some code. Guid.NewGuid() isn't the problem.
MusiGenesis
Okay, I voted you up because, yes, I'd be paying up. ;-)
dviljoen
Good man. I'm going to go randomly upvote a bunch of your answers. :) Sorry about the "fish rotting in the sun" remark.
MusiGenesis
Er ... assuming you are a man. Progammer's habit.
MusiGenesis
+1  A: 

I dont know the details of how GUIDs are generated.. yet. However currently my org. is breeding GUIDs at a rate that would put rabbits to shame. So I can vouch for the fact that GUIDs aren't broken.. yet.

  • Post the source code if possible.. or a clone repro app. Many times I find the act of creating that clone app to repro the problem shows me the issue.
  • The other approach would be to comment out "those minor changes". If that fixes the problem, you can then triangularize to find the offending line of code. Eye-ball the minor changes hard... I mean real Hard.

Do let us know how it goes... this sounds interesting.

Gishu
I'd be in deep trouble professionally if Guid.NewGuid() spit out a dupe every few million calls, let alone every single time.
MusiGenesis
A: 

Does anyone think this could have anything to do with the fact that Guid is a struct?

dviljoen
+7  A: 

Does Submit do an async call, or does the ticket object go into another thread at any stage.

In the code example you are reusing the same object. What if Submit sends the ticket in a background thread after a short delay (and does not take a copy). When you change the CacheId you are actually updating all the pending submits. This also explains why a Thread.Sleep fixes the problem. Try this:

for( int i = 0; i < _numOrders; i++ )
{
    OrderTicket ticket = new OrderTicket(... );
    ticket.CacheId = Guid.NewGuid();
    Submit( ticket );  // note that this simply makes a remoting call
}

If for some reason this is not possible, try this and see if they are still the same:

ticket.CacheId = new Guid("00000000-0000-0000-0000-" + 
     string.Format("{0:000000000000}", i));
Robert Wagner
You got it! That's precisely the problem. Awesome!!! Thanks.
dviljoen
@dviljoen: looks like I'd win that bet after all. :)
MusiGenesis
A: 

My gut is telling me something along these lines is going on...

class OrderTicket 
{
   Guid CacheId {set {_guid = new Guid("00000000-0000-0000-0000-");}
}

Log the value of CacheId into a log file every time its called with a stack trace ... Maybe someone else is setting it.

Sam Saffron
Nope. It simply wraps _cacheId which is initialized to Guid.NewGuid()
dviljoen
What happens when you log the value + the system.diagnostics stack trac ...
Sam Saffron