Hi

I am working my way through the Ferret (Ruby port of Lucene) code to fix a bug. Ferret is mainly a C extension to Ruby. I am running into some issues with the garbage collector. I managed to fix it, but I don't completely understand my fix =) I am hoping someone with deeper knowledge of Ruby and C extensions (this is my 3rd day with Ruby) can elaborate. Thanks.

Here is the situation:

Somewhere in the Ferret C code, I am returning a "Token" to Ruby land. The code looks like:

static VALUE get_token (...)
{
  ...
  RToken *token = ALLOC(RToken);
  token->text = rb_str_new2("some text");
  return Data_Wrap_Struct(..., &frt_token_mark, &frt_token_free, token);
}

frt_token_mark calls rb_gc_mark(token->text) and frt_token_free just frees the token with free(token)

In Ruby, this code correlates to the following:

token = @input.next

Basically, @input is set to some object, calling the next method on it triggers the get_token C call, which returns a token object.

In Ruby land, I then do something like w = token.text.scan('\w+')

When I run this code inside a while 1 loop (to isolate my problem), at some point (roughly when my Ruby process's memory footprint reaches 256MB, probably some GC threshold), Ruby dies with errors like:

scan method called on terminated object

Or just core dumps. My guess was that token.text was garbage collected.

I don't know enough about Ruby C extensions to know what happens with objects returned by Data_Wrap_Struct. It seems to me that the assignment in Ruby land, token =, should create a reference to it.

My "work-around"/"fix" is to create a Ruby instance variable in the object referred to by @input and store the token text there, to get an extra reference to it. So the C code looks like:

RToken *token = ALLOC(RToken);
token->text = rb_str_new2(tk->text);
/* added code: prevent garbage collection */
rb_ivar_set(input, id_curtoken, token->text);
return Data_Wrap_Struct(cToken, &frt_token_mark, &frt_token_free, token);

So now I've created a "curtoken" instance variable on the input object and saved a reference to the text there. I've taken care to remove this reference in the free callback of the class for @input.

With this code, it works in that I no longer get the terminated object error.

The fix seems to make sense to me -- it keeps an extra ref in curtoken to the token.text string so an instance of token.text won't be removed until the next time @input.next is called (at which time a different token.text replaces the old value in curtoken).

My question is: why did it not work before? Shouldn't Data_Wrap_Struct return an object that, when assigned in Ruby land, has a valid reference and is not removed by Ruby?

Thanks.

A: 

When the Ruby garbage collector is invoked, it runs a mark phase and then a sweep phase. The mark phase marks all reachable objects in the system by marking:

  1. all objects referenced by a Ruby stack frame (e.g. local variables)
  2. all globally accessible objects (e.g. referred to by a constant or global variable) and their children/referents, and
  3. all objects referred to by a reference on the C stack (e.g. local C variables), as well as those objects' children/referents.

as well as a number of other objects that are not important to this discussion. The sweep phase then destroys any objects that are not accessible (i.e. those that were not marked).
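To make the two phases concrete, here is a toy simulation in plain Ruby. It is only a sketch of the idea described above, not how Ruby's real collector is implemented, and every name in it (ToyObject, ToyHeap, toy_gc_demo) is hypothetical. The "roots" array stands in for stack slots, globals, and constants, and "children" stands in for whatever an object's mark function would mark.

```ruby
# Toy model of mark-and-sweep; not Ruby's real collector.
# ToyObject, ToyHeap, and toy_gc_demo are hypothetical names.
require "set"

class ToyObject
  attr_reader :name, :children

  def initialize(name)
    @name = name
    @children = []   # objects this object refers to
  end
end

class ToyHeap
  attr_reader :objects, :roots

  def initialize
    @objects = []    # every allocated object
    @roots = []      # stand-in for stack slots, globals, and constants
  end

  def allocate(name)
    obj = ToyObject.new(name)
    @objects << obj
    obj
  end

  # Mark phase: visit everything reachable from the roots.
  def mark
    marked = Set.new
    pending = @roots.dup
    until pending.empty?
      obj = pending.pop
      next if marked.include?(obj)
      marked << obj
      pending.concat(obj.children)
    end
    marked
  end

  # Sweep phase: drop everything the mark phase did not reach.
  # Returns the names of the swept objects.
  def collect
    marked = mark
    swept, @objects = @objects.partition { |obj| !marked.include?(obj) }
    swept.map(&:name)
  end
end

def toy_gc_demo
  heap = ToyHeap.new
  token = heap.allocate("token")
  text = heap.allocate("text")
  token.children << text     # like a mark function that marks token->text
  heap.allocate("orphan")    # nothing refers to this object
  heap.roots << token        # like a local variable holding the token
  first = heap.collect       # only the orphan is swept
  heap.roots.clear           # the last reference to the token goes away
  second = heap.collect      # now the token and its text are swept
  [first, second]
end
```

In this model the text survives exactly as long as the token is reachable from a root, which is the behaviour the mark function in the question is supposed to give.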

Data_Wrap_Struct returns a reference to an object. As long as that reference is available to ruby code (e.g. stored in a local variable) or is on the stack (referred to by a local C variable), the object should not be swept.

It looks like from what you've posted that token->text is getting garbage collected. But why is it getting collected? It must not be getting marked. Is the Token object itself getting marked? If it is, then token->text should be getting marked. Try setting a breakpoint or printing a message in the token's mark function to see.

If the token is not getting marked, then the next step is to figure out why. If it is getting marked, then the next step is to figure out why the string returned by the text() method is getting swept (maybe it's not the same object that is getting marked).
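The "same object" question can also be probed from the Ruby side by comparing object identity across calls. The sketch below uses two hypothetical stand-in classes (CachedTextToken and FreshTextToken, neither of which is Ferret's actual Token) to show what such a check distinguishes. Whether Ferret's text() behaves like either of them is an assumption to verify, not something established here.

```ruby
# Hypothetical stand-ins; neither is Ferret's actual Token class.
class CachedTextToken
  def initialize(text)
    @text = text
  end

  # Returns the one String this object holds a reference to.
  def text
    @text
  end
end

class FreshTextToken
  def initialize(text)
    @text = text
  end

  # Builds a new String on every call, so the object handed to the
  # caller is never the one stored inside the token.
  def text
    @text.dup
  end
end

# True when two consecutive calls to #text return the very same object.
def same_text_object?(token)
  token.text.equal?(token.text)
end
```

Against the real extension, something like same_text_object?(@input.next) returning false would hint that text() hands out a different string from the one the mark function sees.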

Also, are you sure that it is the token's text member that is causing the exception? Looking at:

http://github.com/dbalmain/ferret/blob/master/ruby/ext/r_analysis.c

I see that the token and the token stream both have text() methods. The TokenStream struct doesn't hold a reference to its text object (it can't, as it's a C struct with no knowledge of ruby). Thus, the Ruby object wrapping the C struct needs to hold the reference (and this is being done with rb_ivar_set).

The RToken struct shouldn't need to do this, because it marks its text member in its mark function.

One more thing: you may be able to reproduce this bug by calling GC.start explicitly in your loop rather than having to allocate so many objects that the garbage collector kicks in. This won't fix the problem but might make diagnosis simpler.
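A minimal sketch of such a loop follows. FakeInput and FakeToken are hypothetical stand-ins for the object behind @input and the tokens it returns, so that the snippet runs without Ferret. With the real extension you would drop them and pass the actual input object. The sketch also assumes a Regexp (/\w+/) was intended for the scan call.

```ruby
# Sketch of a reproduction loop. FakeInput and FakeToken are hypothetical
# stand-ins for the object behind @input and the tokens it returns.
FakeToken = Struct.new(:text)

class FakeInput
  def initialize(words)
    @words = words
    @pos = 0
  end

  # Returns the next token, or nil when the input is exhausted.
  def next
    return nil if @pos >= @words.size
    token = FakeToken.new(@words[@pos].dup)
    @pos += 1
    token
  end
end

# Requests a full collection after every token, so a premature sweep
# should show up early instead of only after heavy allocation.
def scan_all_with_gc(input)
  results = []
  while (token = input.next)
    GC.start
    results << token.text.scan(/\w+/)
  end
  results
end
```

Setting GC.stress = true before the loop is another option: it makes Ruby collect at every opportunity, which is very slow but tends to expose missing marks quickly.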

Paul Brannan
Paul: The question is what's the "right thing" to do when returning a VALUE from a C procedure back to Ruby land. Ferret had: return Data_Wrap_Struct ... In this case what's returned by Data_Wrap_Struct is 1) not on the C stack, since it's returned from the C procedure, and 2) not in a Ruby object. My fix was to not "return Data_Wrap_Struct", but VALUE v = Data_Wrap_Struct...; rb_ivar_set(..); return v; That fixes it. I want to confirm with someone: Data_Wrap_Struct cannot be returned directly from C, but needs to be referenced in a Ruby object to prevent it from being reaped. Thanks.
OverClocked
A: 

perhaps mark as volatile:

http://www.justskins.com/forums/chasing-a-garbage-collection-bug-98766.html

Maybe your compiler is keeping its reference in a register instead of on the stack. There is some way mentioned, I think in README.EXT, to force an object to never be GC'ed, but the question still remains as to why it's being collected early.

rogerdpack