+1  A: 

Haven't ever modified GS in x64 code, so I may be wrong, but shouldn't you be able to modify GS by PUSH/POP or by LGS?

Update: The Intel manuals say that mov SegReg, Reg is also permitted in 64-bit mode.

PhiS
+1  A: 

You can modify the thread context via the SetThreadContext API directly. However, you need to make sure that the thread is not running while the context is changed. Either suspend it and modify the context from another thread, or trigger a fake SEH exception and modify the thread context in the SEH handler. The OS will then change the thread context for you and re-schedule the thread.

Update:

Sample code for the second approach:

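int filter(unsigned int code, struct _EXCEPTION_POINTERS *ep); // defined below

__try
{
    __asm int 3 // trigger fake exception (32-bit MSVC inline asm)
}
__except(filter(GetExceptionCode(), GetExceptionInformation()))
{
}

// Runs while the faulting thread is stopped in the exception dispatcher.
int filter(unsigned int code, struct _EXCEPTION_POINTERS *ep)
{
    ep->ContextRecord->SegGs = 23; // new GS selector
    ep->ContextRecord->Eip++;      // step past the one-byte int 3 (Eip is the 32-bit field)
    return EXCEPTION_CONTINUE_EXECUTION;
}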

The instruction in the try block raises a software exception. The OS then transfers control to the filter procedure, which modifies the thread context, effectively telling the OS to skip the int 3 instruction and continue execution.
It's kind of a hack, but it's all documented functionality :)

jn_
I think it's simpler - "mov ax, 23h ; mov gs, ax ; " -- there are two problems - (1) how to set a linear base on descriptor 20h in 64-bit mode? and (2) using a 64-bit linear address (going through the segment register only sets the low 32 bits as far as I can tell)
caffiend
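For reference, a small sketch of how a selector value such as 23h breaks down into its fields (bits 15..3 are the table index, bit 2 selects GDT or LDT, bits 1..0 are the requested privilege level). The type and function names are made up for illustration. Note that the sample code above writes decimal 23 while the comment writes 23h, and these decode to different selectors.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an x86 segment selector (struct and names are illustrative). */
typedef struct {
    unsigned index; /* descriptor table index, bits 15..3 */
    unsigned ti;    /* table indicator, bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;   /* requested privilege level, bits 1..0 */
} selector_fields;

/* Split a 16-bit selector into its three fields. */
static selector_fields decode_selector(uint16_t sel)
{
    selector_fields f;
    f.index = (unsigned)sel >> 3;
    f.ti    = ((unsigned)sel >> 2) & 1u;
    f.rpl   = (unsigned)sel & 3u;
    return f;
}

/* Build a selector from its fields; the inverse of decode_selector. */
static uint16_t make_selector(unsigned index, unsigned ti, unsigned rpl)
{
    assert(index < 8192u && ti <= 1u && rpl <= 3u);
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}
```

With this, 23h decodes to index 4 in the GDT at RPL 3 (the descriptor at byte offset 20h), whereas decimal 23 (17h) decodes to index 2 in the LDT at RPL 3.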
I see your problem now. Well, if you cannot modify the LDT (because this concept isn't used anymore), your only chance left is to modify the GDT (if that even exists on x64 - since x64 is mostly tied to the flat address space model), and that one is only accessible from kernel mode. I think what you want to do is not possible from user mode (I might be wrong, though)
jn_
Just read on Wikipedia that Windows will bugcheck if you attempt to modify the GDT on x64: http://en.wikipedia.org/wiki/Global_Descriptor_Table
jn_
x64 has the alternative mechanism ("gs.base MSR"), but WRMSR is a Ring0 instruction. It would have been nice if the CONTEXT structure had GSBASE but that doesn't seem to be the case in the headers I have.
caffiend
+1  A: 

Why do you need to set the GS register? Windows sets it for you, to point to TLS space.

While I haven't coded for x64, I have built a compiler that generates 32-bit code that manages threads using FS. Under x64, GS replaces FS and everything else works pretty much the same. So GS points to the thread-local store. If you allocate a block of thread-local variables (on Win32, we allocate 32 of 64 at offset 0), your thread has direct access to 32 storage locations to use however it wishes. You don't need to allocate working thread-specific space; Windows has done it for you.

Of course, you might want to copy what you consider your specific thread data into this space you've set aside, in whatever scheduler you've set up to run your language specific threads.

Ira Baxter
I need GS to point to the app-thread specific data - There are multiple app-threads per O/S thread, so I can't rely on the OS.
caffiend
With multiple app-threads per OS thread, you must be scheduling your own app threads. In my experience, there is a small amount of data that needs to be accessible fast via GS or whatever. Have the scheduler copy that data to the TLS area. Also, have the scheduler copy a pointer to "the rest" of your app-thread data to one last TLS cell. Now everything is addressable via GS: the critical stuff is in TLS, and the less critical stuff is accessible via GS plus an extra load. You may not like this choice, but if you can't change GS, either you burn a GP register permanently or you do this.
Ira Baxter
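A rough, portable sketch of the copy-in scheme described in the comment above. C11 `_Thread_local` variables stand in for the OS-provided TLS cells, and `app_thread`, `HOT_CELLS`, and `schedule` are hypothetical names. Copying the hot cells back out on a switch is an added assumption, needed only if app threads modify them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HOT_CELLS 4 /* number of fast-access cells; an arbitrary choice */

/* Per-app-thread state kept by the user-level scheduler (hypothetical layout). */
typedef struct app_thread {
    long  hot[HOT_CELLS]; /* saved copy of the fast-access cells */
    void *rest;           /* everything else, reached through one pointer */
} app_thread;

/* Stand-ins for the TLS cells: the hot data plus one last pointer cell. */
static _Thread_local long        tls_hot[HOT_CELLS];
static _Thread_local void       *tls_rest;
static _Thread_local app_thread *running; /* app thread currently on this OS thread */

/* Switch this OS thread to run `next`. */
static void schedule(app_thread *next)
{
    assert(next != NULL);
    if (running != NULL)
        memcpy(running->hot, tls_hot, sizeof tls_hot); /* save outgoing hot cells */
    memcpy(tls_hot, next->hot, sizeof tls_hot);        /* load incoming hot cells */
    tls_rest = next->rest;                             /* pointer to "the rest" */
    running = next;
}
```

Generated code would then read `tls_hot[i]` directly and reach colder data through `tls_rest` at the cost of one extra load.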
+1  A: 

Since x86_64 has many more registers than x86, one option you may want to consider if you can't use GS is simply to use one of the general-purpose registers (e.g., RBP) as a base pointer, and make up for the difference with the new R8-R15 registers.

bdonlan
While this is a plausible workaround, x86-64 didn't add a new term to the addressing - the existing code makes heavy use of scaled-index + base addressing, with the GS term as a 3rd term. I think I can use 'lea' followed by a two-register form, but I also have to find cases like "mov eax, mem", which accept a prefix but need to be completely replaced to use register-based addressing.
caffiend
+1  A: 

What happens if you just move to OS threads? Is performance that bad?

You could use a single pointer-sized TLS slot to store the base of your lightweight thread's storage area. You'd just have to swap out one pointer during your context switch. Load one of the new temp registers from there whenever you need the value, and you don't have to worry about using one of the few preserved across function calls.
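A minimal sketch of the single-slot idea, written in portable C11 so it can be tried anywhere: the `_Thread_local` pointer stands in for a slot that would come from the Windows TLS API, and `app_thread`, `switch_to`, `load_local`, and `store_local` are hypothetical names rather than anything from the original code base.

```c
#include <assert.h>
#include <stddef.h>

/* Storage block owned by one lightweight thread (layout is hypothetical). */
typedef struct app_thread {
    int  id;
    long locals[32];
} app_thread;

/* The one pointer-sized slot per OS thread. */
static _Thread_local app_thread *current_app_thread = NULL;

/* Context switch: the only per-switch cost for this data is one pointer store. */
static void switch_to(app_thread *next)
{
    assert(next != NULL);
    current_app_thread = next;
}

/* What generated code would do: load the base into a temp, then index off it. */
static long load_local(size_t slot)
{
    app_thread *base = current_app_thread;
    assert(base != NULL && slot < 32);
    return base->locals[slot];
}

static void store_local(size_t slot, long value)
{
    app_thread *base = current_app_thread;
    assert(base != NULL && slot < 32);
    base->locals[slot] = value;
}
```

Each lightweight thread sees its own `locals` after `switch_to`, and no register has to be reserved across calls because the base can be reloaded from the slot whenever it is needed.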

Another supported solution would be to use the Fiber APIs to schedule your lightweight threads. You would then change the JIT to make proper calls to FlsGetValue/FlsSetValue.

Sorry, it sounds like the old code is written to rely on segment prefixes for addressing and now the LDT is just not available for that sort of thing. You're going to have to fix the code generation a little bit.

the existing code makes heavy use of scaled-index + base addressing, with the GS term as a 3rd term. I think I can use 'lea' followed by a two-register form

Sounds like a good plan.

cases like "mov eax, mem", which accept a prefix but need to be completely replaced to use register-based addressing

Perhaps you could move those to address + offset addressing. The offset register could be the register holding the base of your TLS block.
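To make the two rewrites concrete, here is a simplified model of the address arithmetic in C. It assumes plain 64-bit wraparound arithmetic and ignores any 32-bit truncation of the offset; `tls_reg` is a hypothetical register holding the base that GS used to supply, and the function names are invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Original form: mov eax, gs:[base + index*scale + disp] */
static uint64_t ea_segmented(uint64_t gs_base, uint64_t base, uint64_t index,
                             unsigned scale, int32_t disp)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return gs_base + (base + index * scale + (uint64_t)(int64_t)disp);
}

/* Rewritten form:
 *   lea tmp, [base + index*scale + disp]
 *   mov eax, [tls_reg + tmp]
 */
static uint64_t ea_rewritten(uint64_t tls_reg, uint64_t base, uint64_t index,
                             unsigned scale, int32_t disp)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    uint64_t tmp = base + index * scale + (uint64_t)(int64_t)disp; /* the lea */
    return tls_reg + tmp;                                          /* two-register form */
}

/* "mov eax, gs:[mem]" becomes "mov eax, [tls_reg + mem]": register plus offset. */
static uint64_t ea_absolute(uint64_t tls_reg, int32_t mem)
{
    return tls_reg + (uint64_t)(int64_t)mem;
}
```

Both forms produce the same address as long as `tls_reg` holds the same value the GS base would have held, which is the property the code generator has to preserve.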

Marsh Ray
+1  A: 

Why not use GetFiberData, or are you trying to avoid the two extra instructions?

atomice