At the risk of way too much code, here is a possible (poor) solution to my own question. Using the fact that the thread-local storage memory is stored in a single block for the threadvar variables (as pointed out by Mr. Kennedy - thanks), this code stores the allocated pointers in a TList and then frees them at process detach. I wrote it mostly just to see if it would work. I probably would not use this in production code because it makes assumptions about the Delphi runtime that could change with different versions and quite possibly misses problems even with the version I am using (Delphi 7 and 2007).
This implementation does make umdh happy, it doesn't think there are any more memory leaks. However, if I run the test in a loop (load, call entrypoint on another thread, unload), the memory usage as seen in Process Explorer still grows alarmingly fast. In fact, I created a completely empty DLL with only an empty DllMain (that was not called since I did not assign Delphi's global DllMain pointer to it ... Delhi itself provides the real DllMain entrypoint). A simple loop of loading/unloading the DLL still leaked 4K per iteration. So there may still be something else a Delphi DLL is supposed to include (the main point of the original question). But I don't know what it is. A DLL written in C does not behave this way.
Our code (a server) can call DLLs written by customers to extend functionality. We typically unload the DLL after there are no more references to it. I think my solution to the problem is going to be to add an option to leave the DLL loaded "permanently" in memory. If customers use Delphi to write their DLL, they will need to turn that option on (or maybe we can detect that it is a Delphi DLL on load ... need to check that out). Nonetheless, it has been an interesting exercise.
library Sample;
uses
SysUtils,
Windows,
Classes,
HTTPApp,
SyncObjs;
{$E dll}
var
gListSync : TCriticalSection;
gTLSList : TList;
threadvar
threadint : integer;
// remove all entries from the TLS storage list
procedure RemoveAndFreeTLS();
var
i : integer;
begin
// Only call this at process detach. Those calls are serialized
// so don't get the critical section.
if assigned( gTLSList ) then
for i := 0 to gTLSList.Count - 1 do
// Is this actually safe in DllMain process detach? From reading the MSDN
// docs, it appears that the only safe statement in DllMain is "return;"
LocalFree( Cardinal( gTLSList.Items[i] ));
end;
// Remove this thread's entry
procedure RemoveThreadTLSEntry();
var
p : pointer;
begin
// Find the entry for this thread and remove it.
gListSync.enter;
try
if ( SysInit.TlsIndex <> -1 ) and ( assigned( gTLSList )) then
begin
p := TlsGetValue( SysInit.TlsIndex );
// if this thread didn't actually make a call into the DLL and use a threadvar
// then there would be no memory for it
if p <> nil then
gTLSList.Remove( p );
end;
finally
gListSync.leave;
end;
end;
// Add current thread's TLS pointer to the global storage list if it is not already
// stored in it.
procedure AddThreadTLSEntry();
var
p : pointer;
begin
gListSync.enter;
try
// Need to create the list if first call
if not assigned( gTLSList ) then
gTLSList := TList.Create;
if SysInit.TlsIndex <> -1 then
begin
p := TlsGetValue( SysInit.TlsIndex );
if p <> nil then
begin
// if it is not stored, add it
if gTLSList.IndexOf( p ) = -1 then
gTLSList.Add( p );
end;
end;
finally
gListSync.leave;
end;
end;
// Some entrypoint that uses threadvar (directly or indirectly)
function MyExportedFunc(): LongWord; stdcall;
begin
threadint := 123;
// Make sure this thread's TLS pointer is stored in our global list so
// we can free it at process detach. Do this AFTER using the threadvar.
// Delphi seems to allocate the memory on demand.
AddThreadTLSEntry;
Result := 0;
end;
procedure DllMain(reason: integer) ;
begin
case reason of
DLL_PROCESS_DETACH:
begin
// NOTE - if this is being called due to process termination, then it should
// just return and do nothing. Very dangerous (and against MSDN recommendations)
// otherwise. However, Delphi does not provide that information (the 3rd param of
// the real DlLMain entrypoint). In my test, though, I know this is only called
// as a result of the DLL being unloaded via FreeLibrary
RemoveAndFreeTLS();
gListSync.Free;
if assigned( gTLSList ) then
gTLSList.Free;
end;
DLL_THREAD_DETACH:
begin
// on a thread detach, Delphi will clean up its own TLS, so we just
// need to remove it from the list (otherwise we would get a double free
// on process detach)
RemoveThreadTLSEntry();
end;
end;
end;
exports
DllMain,
MyExportedFunc;
// Initialization
begin
IsMultiThread := TRUE;
// Make sure Delphi calls my DllMain
DllProc := @DllMain;
// sync object for managing TLS pointers. Is it safe to create a critical section?
// This init code is effectively DllMain's DLL_PROCESS_ATTACH
gListSync := TCriticalSection.Create;
end.