Hi, I read on the OSDev wiki that the protected mode of the x86 architecture allows you to create separate segments for code and data, and that you cannot write into the code segment. It also said that Windows (yes, this is the platform) loads new code into the code segment, while data is created in the data segment. But if that is the case, how does a program know it must switch to the data segment? If I understand it correctly, all address instructions point into the segment you are running code from, unless you switch the descriptor. I also read that the so-called flat memory model lets you run code and data within one segment, but I read this only in connection with assembly. So, what is the case with C-compiled code on Windows? Thanks.
The word "segment" has two meanings in that explanation:
- an x86 hardware memory address segment (the kind introduced with the 8086 and selected by a segment register)
- an object module program section segment
The first is what an 80386+ segment register refers to. The register holds a selector, and the CPU loads the matching descriptor, which specifies the segment's base (linear) start address, its length (limit), the permitted read/write/execute access, and whether it grows from low to high addresses or vice versa (expand-down), plus further flags such as the privilege level and granularity.
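To make the descriptor fields concrete, here is a minimal Python model of how a segment:offset pair becomes a linear address. It is a simplified sketch, not the CPU's exact algorithm: the names SegmentDescriptor, SegmentFault, and to_linear are hypothetical, byte granularity is assumed, and privilege levels are ignored. It shows the three checks the descriptor drives: access rights, the limit (inverted for expand-down segments), and adding the base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentDescriptor:
    """Hypothetical, simplified descriptor: only the fields discussed above."""
    base: int            # linear start address of the segment
    limit: int           # highest valid offset (byte granularity assumed)
    readable: bool
    writable: bool
    executable: bool
    expand_down: bool = False  # data/stack segments that grow downward


class SegmentFault(Exception):
    """Stands in for the protection fault the CPU would raise."""


def to_linear(desc: SegmentDescriptor, offset: int, access: str) -> int:
    """Translate segment:offset to a linear address, checking rights and limit.

    access is one of "read", "write", "execute".
    """
    if access == "read" and not desc.readable:
        raise SegmentFault("segment not readable")
    if access == "write" and not desc.writable:
        raise SegmentFault("segment not writable")
    if access == "execute" and not desc.executable:
        raise SegmentFault("segment not executable")
    if desc.expand_down:
        # Expand-down (32-bit): valid offsets lie ABOVE the limit.
        in_bounds = desc.limit < offset <= 0xFFFFFFFF
    else:
        in_bounds = 0 <= offset <= desc.limit
    if not in_bounds:
        raise SegmentFault(f"offset {offset:#x} outside segment")
    # Linear address = base + offset, wrapped to 32 bits.
    return (desc.base + offset) & 0xFFFFFFFF
```

For example, a code descriptor with base 0x10000 turns offset 0x10 into linear address 0x10010 for an instruction fetch, but a write through that same descriptor faults.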
The second meaning is part of the object module language. Basically, there is a segment named code, a segment named data (which contains initialized data), and a segment for uninitialized data named bss (named after an old assembler pseudo-instruction meaning Block Started by Symbol). When the linker combines object modules, it places all the code segments together, all the data segments together elsewhere, and all the bss segments together as well.
When the loader maps the program into memory, it looks at the total code size, allocates memory of at least that size, and either maps the code into it (in a virtual memory system) or reads the code into the allocated memory, for which it has to temporarily mark that memory as writable data. The write protection is enforced by the CPU's paging mechanism as well as by the segment registers; it guards against attempts to write to code through, for example, an errant data address. The loader does similar setup for the two data segment groups (data and bss). Besides those, it also sets up and allocates a stack and maps shared images.
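The link-then-load steps above can be sketched as a toy model. This is only an illustration under stated assumptions: the functions link and load are hypothetical, section sizes are given directly in bytes, alignment between object modules is ignored, file headers are ignored, 4 KiB pages are assumed, and image_base=0x400000 is simply the usual default base of a 32-bit Windows executable. The point it shows is that each merged section group lands on its own pages, so each group can receive its own protection.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def link(modules):
    """Merge same-named sections from each object module (hypothetical model).

    modules: list of dicts such as {"code": 100, "data": 20, "bss": 64}
    giving section sizes in bytes. Returns the total size of each group.
    """
    totals = {"code": 0, "data": 0, "bss": 0}
    for module in modules:
        for name, size in module.items():
            totals[name] += size
    return totals


def load(totals, image_base=0x400000):
    """Place each section group on its own pages with its own protection.

    Returns a list of (name, start_address, size_in_bytes, protection).
    """
    protections = {"code": "r-x", "data": "rw-", "bss": "rw-"}
    layout = []
    address = image_base
    for name in ("code", "data", "bss"):
        size = align_up(totals[name], PAGE_SIZE)  # whole pages only
        if size:
            layout.append((name, address, size, protections[name]))
            address += size
    return layout
```

With two modules whose code totals 5100 bytes, the code group occupies two read/execute pages starting at the image base, and the data and bss groups each get their own read/write page after it.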
As for how x86 executes instructions, each memory operand has an associated segment register. Sometimes it is explicit (an override prefix) and sometimes implicit. Code is implicitly fetched through CS; stack accesses go through SS, which is implied whenever ESP or EBP is the base register; and DS is implied for most other operands. ES, FS, and GS must be specified as an override in all other cases, except that some string instructions, such as movs and cmps, implicitly use ES for the operand addressed by EDI. In the flat model, the code, data, and stack segment registers are loaded with descriptors whose base is 0 and whose limit covers the whole 4 GiB space, so they all map the same address space, though the code segment still doesn't allow writing through CS.
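The default-segment rules above can be written as a small decision function. This is a simplified sketch for 32-bit addressing only; the function name default_segment and its parameters are hypothetical. It encodes: instruction fetches use CS, the EDI operand of string instructions always uses ES, an explicit override prefix wins otherwise, ESP or EBP as the base register implies SS, and everything else defaults to DS. So `mov eax, [ebp-8]` reads through SS, `mov eax, [ebx]` through DS, and `mov eax, fs:[ebx]` through FS.

```python
def default_segment(base_reg=None, string_dest=False, override=None, fetch=False):
    """Return the segment register a 32-bit memory access goes through.

    base_reg:    base register of the address, e.g. "eax" or "ebp" (or None)
    string_dest: True for the EDI operand of movs/cmps/stos/scas
    override:    segment override prefix, e.g. "fs" (or None)
    fetch:       True for an instruction fetch
    """
    if fetch:
        return "cs"      # instruction fetches always use CS
    if string_dest:
        return "es"      # ES:EDI string operand; a prefix cannot change it
    if override is not None:
        return override  # explicit prefix beats the implicit default
    if base_reg in ("esp", "ebp"):
        return "ss"      # stack-relative addressing implies SS
    return "ds"          # everything else defaults to DS
```

In the flat model the choice still happens on every access, but since CS, DS, ES, and SS all have base 0, the resulting linear address is the same whichever of them is picked.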
So, to answer your last question: the program does not need to switch segments. The CPU has six segment registers (CS, DS, ES, FS, GS, and SS), all loaded at once, and in the flat model they give access to the same flat virtual memory space of the process. Each operand access is checked for being appropriate to the instruction (for example, no writes through CS) and is also checked by the paging unit against the page protection bits.
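The two-stage check can be illustrated with a flat-model sketch. It is a simplified model: the names SEGMENT_RIGHTS, check_access, and the page_rights table are hypothetical, and the page-level "execute" right stands in for the NX/execute-disable bit, which 32-bit x86 paging only provides with PAE. It shows why paging does the real work in a flat model: a write to a code page through DS passes the segment check (DS is writable and covers everything) and is stopped only by the page protection.

```python
FLAT_LIMIT = 0xFFFFFFFF  # flat model: each segment spans the full 4 GiB
PAGE_SIZE = 4096         # assumption: 4 KiB pages

# Flat-model segment rights: CS is execute/read, data and stack are read/write.
SEGMENT_RIGHTS = {
    "cs": {"read", "execute"},
    "ds": {"read", "write"},
    "es": {"read", "write"},
    "ss": {"read", "write"},
}


def check_access(segment, offset, access, page_rights):
    """Return True if the access passes both segmentation and paging.

    page_rights: hypothetical page table mapping page number -> set of rights.
    """
    # Stage 1: segmentation. Base is 0, so the linear address equals the offset.
    if access not in SEGMENT_RIGHTS[segment] or offset > FLAT_LIMIT:
        return False
    linear = 0 + offset
    # Stage 2: paging. The page holding the address must permit the access.
    return access in page_rights.get(linear // PAGE_SIZE, set())
```

With a code page at 0x401000 marked read/execute and a data page at 0x402000 marked read/write, executing code at 0x401000 through CS is allowed, writing to 0x401000 through DS is refused by paging, and writing to 0x402010 through CS is refused by segmentation.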
The info you read is outdated. Windows versions since ~1993 (starting with Windows NT 3.1) use a flat 32-bit virtual memory space. The values of the CS and DS segment registers no longer matter and cannot be changed. There is still a notion of code vs. data, now implemented through memory page attributes. Review the values allowed for the flNewProtect argument of the VirtualProtectEx() API function.
You will very rarely use this API yourself; the attributes are set by the executable image loader and the heap manager.
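As a rough picture of what those page attributes encode, here is a small Python table of the basic memory-protection constants as defined in the Windows SDK header winnt.h, with the read/write/execute rights each one grants, plus a hypothetical helper, protection_for, that picks a constant for a section's attributes the way a loader conceptually does. This is a simplified sketch: it ignores modifier flags such as PAGE_GUARD, and real image loaders typically map writable image sections as PAGE_WRITECOPY rather than PAGE_READWRITE.

```python
# Basic memory-protection constants (values as defined in winnt.h).
PAGE_NOACCESS = 0x01
PAGE_READONLY = 0x02
PAGE_READWRITE = 0x04
PAGE_WRITECOPY = 0x08
PAGE_EXECUTE = 0x10
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40
PAGE_EXECUTE_WRITECOPY = 0x80

# (readable, writable, executable) for each constant. WRITECOPY counts as
# writable: a write succeeds by giving the process its own private copy.
RIGHTS = {
    PAGE_NOACCESS: (False, False, False),
    PAGE_READONLY: (True, False, False),
    PAGE_READWRITE: (True, True, False),
    PAGE_WRITECOPY: (True, True, False),
    PAGE_EXECUTE: (False, False, True),
    PAGE_EXECUTE_READ: (True, False, True),
    PAGE_EXECUTE_READWRITE: (True, True, True),
    PAGE_EXECUTE_WRITECOPY: (True, True, True),
}


def protection_for(readable, writable, executable):
    """Pick a plain (non-copy-on-write) constant for the given rights.

    Hypothetical helper; raises ValueError for combinations with no constant.
    """
    for constant in (PAGE_NOACCESS, PAGE_READONLY, PAGE_READWRITE,
                     PAGE_EXECUTE, PAGE_EXECUTE_READ, PAGE_EXECUTE_READWRITE):
        if RIGHTS[constant] == (readable, writable, executable):
            return constant
    raise ValueError("no plain protection constant for that combination")
```

Under this model, a compiled C program's code section ends up as PAGE_EXECUTE_READ and its initialized data as PAGE_READWRITE, which is the code/data split the question asks about, enforced per page rather than per segment.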