Haven't ever modified GS in x64 code, so I may be wrong, but shouldn't you be able to modify GS by PUSH/POP or by LGS?
Update: The Intel manuals also say that MOV SegReg, Reg is permissible in 64-bit mode.
You can modify the thread context directly via the SetThreadContext API. However, you need to make sure that the thread is not running while its context is changed. Either suspend it and modify the context from another thread, or trigger a fake SEH exception and modify the thread context in the SEH filter. The OS will then apply the modified context for you and reschedule the thread.
Update:
Sample code for the second approach:
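#include <windows.h>

// Exception filter: the OS passes in the captured context of the faulting thread.
int filter(unsigned int code, struct _EXCEPTION_POINTERS *ep)
{
    ep->ContextRecord->SegGs = 23; // new GS selector value
    ep->ContextRecord->Eip++;      // skip the one-byte int 3 (the field is Rip in x64 builds)
    return EXCEPTION_CONTINUE_EXECUTION; // resume with the modified context
}

// Inside the function that needs the new GS value:
__try
{
    __asm int 3 // trigger fake exception (x64 MSVC has no inline asm; use __debugbreak())
}
__except(filter(GetExceptionCode(), GetExceptionInformation()))
{
}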
The instruction in the try block basically raises a software exception. The OS then transfers control to the filter procedure which modifies the thread context, effectively telling the OS to skip the int3 instruction and to continue execution.
It's kind of a hack, but it's all documented functionality :)
Why do you need to set the GS register? Windows sets it for you, to point to TLS space.
While I haven't coded for x64, I have built a compiler that generates 32-bit code that manages threads using FS. Under x64, GS replaces FS and everything else pretty much works the same. So, GS points to the thread-local store. If you allocate a block of thread-local variables (on Win32, we allocate 32 of 64 at offset 0), your thread now has direct access to 32 storage locations to use however it wishes. You don't need to allocate working thread-specific space; Windows has done it for you.
Of course, you might want to copy what you consider your specific thread data into this space you've set aside, in whatever scheduler you've set up to run your language specific threads.
Since x86_64 has many more registers than x86, one option to consider if you can't use GS is simply to dedicate one of the general-purpose registers (e.g., RBP) as a base pointer, and make up for the lost register with the new R8-R15 registers.
What happens if you just move to OS threads? Is performance that bad?
You could use a single pointer-sized TLS slot to store the base of your lightweight thread's storage area. You'd just have to swap out one pointer during your context switch. Load one of the new temp registers from there whenever you need the value, and you don't have to worry about using one of the few preserved across function calls.
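A minimal sketch of that single-slot idea, in portable C11 rather than Windows-specific calls. The names LwtStorage, lwt_switch_to, lwt_load and lwt_store are made up for illustration, and _Thread_local stands in for whatever TLS mechanism you actually use (on Windows that could be __declspec(thread) or a TlsAlloc slot):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical storage block owned by one lightweight thread.
   The name and the 32-slot layout are assumptions for this sketch. */
typedef struct LwtStorage {
    long slots[32];
} LwtStorage;

/* The single pointer-sized TLS slot: one per OS thread. */
static _Thread_local LwtStorage *current_storage = NULL;

/* Context switch: swapping lightweight threads swaps exactly one pointer. */
static void lwt_switch_to(LwtStorage *next)
{
    current_storage = next;
}

/* What the generated code would do: load the base into a scratch
   register, then address relative to it. */
static long lwt_load(size_t slot)
{
    LwtStorage *base = current_storage; /* one load from the TLS slot */
    return base->slots[slot];           /* base + slot * sizeof(long) */
}

static void lwt_store(size_t slot, long value)
{
    LwtStorage *base = current_storage;
    base->slots[slot] = value;
}
```

Each lightweight thread keeps its own block, and the scheduler calls lwt_switch_to with the incoming thread's block before resuming it. The cost per access is one extra load compared with a segment prefix.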
Another supported solution would be to use the Fiber APIs to schedule your lightweight threads. You would then change the JIT to make proper calls to FlsGetValue/FlsSetValue.
Sorry, it sounds like the old code is written to rely on segment prefixes for addressing and now the LDT is just not available for that sort of thing. You're going to have to fix the code generation a little bit.
The existing code makes heavy use of scaled-index + base addressing, with the GS term as a 3rd term. I think I can use 'lea' followed by a two-register form.
Sounds like a good plan.
Cases like "mov eax, mem", which accept a prefix but need to be completely replaced to use register-based addressing.
Perhaps you could move those to address + offset addressing. The offset register could be the register holding the base of your TLS block.
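To make the rewrite concrete, here is a rough model in C of what the replacement instruction sequence computes. It is only a sketch: effective_offset, load32 and store32 are invented names, and the byte array stands in for the thread's storage block whose base would live in the dedicated register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Old form:  mov eax, gs:[base + index*scale + disp]
   New form:  lea tmp, [base + index*scale + disp]
              mov eax, [tls + tmp]
   where 'tls' is the register holding the base of the thread block. */

/* Step 1: what the 'lea' computes, an offset inside the thread block. */
static size_t effective_offset(size_t base, size_t index, size_t scale, size_t disp)
{
    return base + index * scale + disp;
}

/* Step 2: the two-register load [tls + tmp]. */
static uint32_t load32(const unsigned char *tls, size_t tmp)
{
    uint32_t value;
    memcpy(&value, tls + tmp, sizeof value); /* alignment-safe 32-bit read */
    return value;
}

/* Matching two-register store, for symmetry. */
static void store32(unsigned char *tls, size_t tmp, uint32_t value)
{
    memcpy(tls + tmp, &value, sizeof value);
}
```

The plain "mov eax, mem" case is the degenerate one: base, index and scale are all zero, so the 'lea' disappears and the access becomes a single load at [tls + disp].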
Why not use GetFiberData? Or are you trying to avoid the two extra instructions?