I have been chasing down what appears to be a memory leak in a DLL built in Delphi 2007 for Win32. The memory for the threadvar variables is not freed if the threads still exist when the DLL is unloaded (there are no active calls into the DLL when it is unloaded).

The question: Is there some way to cause Delphi to free memory associated with threadvar variables? It is not as simple as just not using them. A number of the existing Delphi components use them, so even if the DLL does not explicitly declare them, it ends up using them.

A Few Details

I have tracked it down to a LocalAlloc call that occurs in response to the use of a threadvar variable, which is Delphi's "wrapper" around thread-local storage in Win32. For the curious, the allocation call is in the Delphi source file sysinit.pas. The corresponding LocalFree call occurs only for threads that get DLL_THREAD_DETACH calls. If you have multiple threads in an application and unload a DLL, there is no DLL_THREAD_DETACH call for each thread. The DLL gets a DLL_PROCESS_DETACH and nothing else; I believe that is expected and valid. Thus, any thread-local storage allocations made on other threads are leaked.

I re-created it with a short C program that starts several "worker" threads. It loads the DLL (via LoadLibrary) on the main thread and then makes calls into an exported function on the worker threads. The function exported from the Delphi DLL assigns a value to a threadvar integer variable and returns. The C program then unloads the DLL (via FreeLibrary on the main thread) and repeats. After about 32,000 iterations, the process memory usage shown in Process Explorer grows to over 130MB. I also verified it more accurately with umdh, which showed 24 bytes lost per instance. But the 130MB in Process Explorer works out to roughly 4K per iteration (130MB / 32,000 iterations), so I'm guessing a 4K segment was leaked each time, but I don't know for sure.

For clarification, here is the threadvar declaration and the entire exported function:

threadvar
   threadint : integer;

function Startup( ulID: LongWord; hValue: Longint ): LongWord; stdcall;
begin
   threadint := 123;
   Result := 0;
end;
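For anyone who wants to try the scenario described above without the C harness, here is a minimal sketch of an equivalent test program written in Delphi. It is not the original C program. The DLL file name, the export name 'Startup', the worker count, and the semaphore hand-off are all assumptions made for illustration. The key point is that the workers never terminate, so the DLL never receives DLL_THREAD_DETACH for them.

```pascal
program LeakRepro;

{$APPTYPE CONSOLE}

uses
  Windows;

type
  // Signature of the exported Startup function shown above.
  TStartup = function(ulID: LongWord; hValue: Longint): LongWord; stdcall;

const
  WorkerCount = 4;            // assumption: "several" worker threads
  Iterations  = 32000;        // roughly the iteration count reported above
  DllName     = 'Sample.dll'; // hypothetical DLL file name

var
  Startup: TStartup;          // set by the main thread before workers are released
  GoSem, DoneSem: THandle;    // hand-off between the main thread and the workers

// Long-lived worker: it never exits, so no DLL_THREAD_DETACH is sent for it.
function Worker(Param: Pointer): Integer;
begin
  Result := 0;
  while True do
  begin
    WaitForSingleObject(GoSem, INFINITE);
    Startup(0, 0);                     // touches the threadvar inside the DLL
    ReleaseSemaphore(DoneSem, 1, nil); // tell the main thread this call is done
  end;
end;

var
  Lib: HMODULE;
  ThreadId: LongWord;
  i, j: Integer;
begin
  GoSem := CreateSemaphore(nil, 0, WorkerCount, nil);
  DoneSem := CreateSemaphore(nil, 0, WorkerCount, nil);
  for i := 1 to WorkerCount do
    BeginThread(nil, 0, @Worker, nil, 0, ThreadId);

  for i := 1 to Iterations do
  begin
    Lib := LoadLibrary(DllName);
    if Lib = 0 then
      Halt(1);
    @Startup := GetProcAddress(Lib, 'Startup'); // assumed export name
    if not Assigned(Startup) then
      Halt(1);
    ReleaseSemaphore(GoSem, WorkerCount, nil);  // allow WorkerCount calls into the DLL
    for j := 1 to WorkerCount do
      WaitForSingleObject(DoneSem, INFINITE);   // wait until no calls are active
    FreeLibrary(Lib);                           // workers are still alive here
  end;
end.
```

Watching the process in Process Explorer (or taking umdh snapshots) while this loop runs should show whether memory grows per iteration on a given setup.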

Thanks.

+3  A: 

As you've already determined, thread-local storage will get released for each thread that gets detached from the DLL. That happens in System._StartLib when Reason is DLL_THREAD_DETACH. For that to happen, though, the thread needs to terminate. Thread-detach notifications occur when the thread terminates, not when the DLL is unloaded. (If it were the other way around, the OS would have to interrupt the thread someplace so it could insert a call to DllMain on the thread's behalf. That would be disastrous.)

The DLL is supposed to receive thread-detach notifications. In fact, that's the model suggested by Microsoft in its description of how to use thread-local storage with DLLs.

The only way to release thread-local storage is to call TlsFree from the context of the thread whose storage you want to free. From what I can tell, Delphi keeps all its threadvars in a single TLS index, given by the TlsIndex variable in SysInit.pas. You can use that value to call TlsFree whenever you want, but you'd better be sure there won't be any more code executed by the DLL in the current thread.

Since you also want to free the memory used for holding all the threadvars, you'll need to call TlsGetValue to get the address of the buffer Delphi allocates. Call LocalFree on that pointer.

This would be the (untested) Delphi code to free the thread-local storage.

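procedure FreeThreadVarStorage;
var
  TlsBuffer: Pointer;
begin
  // Must run on the thread whose threadvar buffer is being released.
  TlsBuffer := TlsGetValue(SysInit.TlsIndex);
  if TlsBuffer <> nil then
    LocalFree(HLocal(TlsBuffer));
  // TlsFree releases the index for the whole process, not just this
  // thread, so call it only when no other thread still needs the slot.
  TlsFree(SysInit.TlsIndex);
end;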

If you need to do this from the host application instead of from within the DLL, then you'll need to export a function that returns the DLL's TlsIndex value. That way, the host program can free the storage itself after the DLL is gone (thus guaranteeing no further DLL code executes in a given thread).
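As a rough sketch of the export mentioned above, the DLL could expose its TLS index like this. The library name and the function name GetThreadVarTlsIndex are hypothetical. SysInit.TlsIndex is the RTL variable already referred to in this answer, and the code in this thread treats a value of -1 as "no slot allocated".

```pascal
library TlsIndexDemo;

// Hypothetical export: returns the TLS slot index the Delphi RTL uses
// for this DLL's threadvars, so the host can pass it to TlsGetValue.
function GetThreadVarTlsIndex: Integer; stdcall;
begin
  Result := SysInit.TlsIndex;
end;

exports
  GetThreadVarTlsIndex;

begin
end.
```

The host would look the function up with GetProcAddress while the DLL is still loaded, and would still have to call TlsGetValue on each thread whose buffer it wants to free.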

Rob Kennedy
Ah yes - I had not realized it was just one TLS slot per thread. Thanks for pointing that out. I believe, though, this solution would require making that call on each thread. And as you correctly stated, it is not possible/desirable to interrupt the other threads from whatever they are doing to make a call to TlsGetValue to get the pointer and free it. Incidentally, I believe the TlsFree call does occur on the `DLL_PROCESS_DETACH` call. But knowing that it is a single TLS slot per thread is useful. I will ponder that. Mark
Mark Wilkins
+2  A: 
François
The test case is using a 4 byte integer (which does not need to be freed); it is not using any kind of dynamic variable. The memory that is being leaked is the memory that Delphi allocates under the covers for storing threadvar variables.
Mark Wilkins
A: 

At the risk of way too much code, here is a possible (poor) solution to my own question. Using the fact that the thread-local storage memory is stored in a single block for the threadvar variables (as pointed out by Mr. Kennedy - thanks), this code stores the allocated pointers in a TList and then frees them at process detach. I wrote it mostly just to see if it would work. I probably would not use this in production code because it makes assumptions about the Delphi runtime that could change with different versions and quite possibly misses problems even with the version I am using (Delphi 7 and 2007).

This implementation does make umdh happy; it no longer reports any memory leaks. However, if I run the test in a loop (load, call entrypoint on another thread, unload), the memory usage as seen in Process Explorer still grows alarmingly fast. In fact, I created a completely empty DLL with only an empty DllMain (which was not called, since I did not assign Delphi's global DllProc pointer to it; Delphi itself provides the real DllMain entrypoint). A simple loop of loading/unloading the DLL still leaked 4K per iteration. So there may still be something else a Delphi DLL is supposed to include (the main point of the original question), but I don't know what it is. A DLL written in C does not behave this way.

Our code (a server) can call DLLs written by customers to extend functionality. We typically unload the DLL after there are no more references to it. I think my solution to the problem is going to be to add an option to leave the DLL loaded "permanently" in memory. If customers use Delphi to write their DLL, they will need to turn that option on (or maybe we can detect that it is a Delphi DLL on load ... need to check that out). Nonetheless, it has been an interesting exercise.

library Sample;

uses
  SysUtils,
  Windows,
  Classes,
  HTTPApp,
  SyncObjs;

{$E dll}

var
   gListSync : TCriticalSection;
   gTLSList  : TList;


threadvar
   threadint : integer;


// remove all entries from the TLS storage list
procedure RemoveAndFreeTLS();
var
   i : integer;
begin
   // Only call this at process detach. Those calls are serialized
   // so don't get the critical section.
   if assigned( gTLSList ) then
      for i := 0 to gTLSList.Count - 1 do
         // Is this actually safe in DllMain process detach?  From reading the MSDN
         // docs, it appears that the only safe statement in DllMain is "return;"
         LocalFree( Cardinal( gTLSList.Items[i] ));

end;


// Remove this thread's entry
procedure RemoveThreadTLSEntry();
var
   p : pointer;
begin
   // Find the entry for this thread and remove it.
   gListSync.enter;
   try
      if ( SysInit.TlsIndex <> -1 ) and ( assigned( gTLSList )) then
         begin
            p := TlsGetValue( SysInit.TlsIndex );

            // if this thread didn't actually make a call into the DLL and use a threadvar
            // then there would be no memory for it
            if p <> nil then
               gTLSList.Remove( p );
         end;

   finally
      gListSync.leave;
   end;
end;


// Add current thread's TLS pointer to the global storage list if it is not already
// stored in it.
procedure AddThreadTLSEntry();
var
   p : pointer;
begin
   gListSync.enter;
   try
      // Need to create the list if first call
      if not assigned( gTLSList ) then
         gTLSList := TList.Create;

      if SysInit.TlsIndex <> -1 then
         begin
            p := TlsGetValue( SysInit.TlsIndex );

            if p <> nil then
               begin
               // if it is not stored, add it
               if gTLSList.IndexOf( p ) = -1 then
                  gTLSList.Add( p );
               end;
         end;

   finally
      gListSync.leave;
   end;
end;



// Some entrypoint that uses threadvar (directly or indirectly)
function MyExportedFunc(): LongWord; stdcall;
begin
   threadint := 123;

   // Make sure this thread's TLS pointer is stored in our global list so
   // we can free it at process detach.  Do this AFTER using the threadvar.
   // Delphi seems to allocate the memory on demand.
   AddThreadTLSEntry;
   Result := 0;
end;



procedure DllMain(reason: integer) ;
begin
   case reason of
     DLL_PROCESS_DETACH:
     begin
        // NOTE - if this is being called due to process termination, then it should
        // just return and do nothing.  Very dangerous (and against MSDN recommendations)
        // otherwise.  However, Delphi does not provide that information (the 3rd param of
        // the real DllMain entrypoint).  In my test, though, I know this is only called
        // as a result of the DLL being unloaded via FreeLibrary
        RemoveAndFreeTLS();
        gListSync.Free;
        if assigned( gTLSList ) then
           gTLSList.Free;
     end;

     DLL_THREAD_DETACH:
        begin
        // on a thread detach, Delphi will clean up its own TLS, so we just
        // need to remove it from the list (otherwise we would get a double free
        // on process detach)
        RemoveThreadTLSEntry();
        end;

   end;
end;




exports
   DllMain,
   MyExportedFunc;


// Initialization
begin
   IsMultiThread := TRUE;

   // Make sure Delphi calls my DllMain
   DllProc := @DllMain;

   // sync object for managing TLS pointers.  Is it safe to create a critical section?
   // This init code is effectively DllMain's DLL_PROCESS_ATTACH
   gListSync := TCriticalSection.Create;
end.
Mark Wilkins