mmap returns a void*, but not a volatile void*. If I'm using mmap to map shared memory, then another process could be writing to that memory, which means two successive reads from the same memory location can yield different values -- the exact situation volatile is meant for. So why doesn't it return a volatile void*?

My best guess is that if you have a process that's exclusively writing to the shared memory segment, it doesn't need to look at the shared memory through volatile pointers because it will always have the right understanding of what's present; any optimizations the compiler does to prevent redundant reads won't matter since there is nothing else writing and changing the values under its feet. Or is there some other historical reason? I'm inclined to say returning volatile void* would be a safer default, and those wanting this optimization could then manually cast to void*.
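
To make the concern concrete, here is a minimal sketch of the two ways a reader might poll a flag that lives in a mapped region. The function names `poll_flag` and `poll_flag_plain` are made up for illustration. With the plain pointer the compiler is allowed (not required) to load `*flag` once and reuse the value; with the volatile-qualified pointer it has to perform a load on every iteration.

```c
#include <stddef.h>

/* Polls *flag up to max_polls times through a pointer to volatile.
 * Each *flag below is a separate load that the compiler must emit.
 * Returns the index of the first poll that saw a nonzero flag, or -1. */
int poll_flag(const volatile int *flag, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (*flag != 0)
            return i;
    }
    return -1;
}

/* Same loop through a plain pointer. Nothing in the loop writes *flag,
 * so an optimizer may hoist the load out of the loop and never notice
 * a write made by another process. */
int poll_flag_plain(const int *flag, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (*flag != 0)
            return i;
    }
    return -1;
}
```

In a single process with nobody else writing, both versions return the same result; the difference only shows up when something outside the compiler's view changes the flag.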

POSIX mmap description: http://opengroup.org/onlinepubs/007908775/xsh/mmap.html

+3  A: 

I don't think volatile does what you think it does.

Basically, it just tells the compiler not to optimize the variable by storing its value in a register. This forces it to retrieve the value each time you reference it, which is a good idea if another thread (or whatever) could have updated it in the interim.

The function returns a void*, but it's not going to be updated, so calling it volatile is meaningless. Even if you assigned the value to a local volatile void*, nothing would be gained.
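
Since much of the discussion turns on where the qualifier sits, here is a small sketch of the two spellings. The function names are hypothetical. `volatile void *` (equivalently `void volatile *`) is a pointer to volatile data; `void *volatile` is a volatile pointer variable that points to ordinary data.

```c
#include <stddef.h>

/* Pointer to volatile data: the qualifier belongs to the pointee.
 * Converting to volatile int * needs no cast in C because no
 * qualifier is dropped. */
int load_twice(volatile void *mem)
{
    volatile int *p = mem;
    int first = *p;    /* one load from the pointed-to memory */
    int second = *p;   /* a second, separate load */
    return first + second;
}

/* Volatile pointer to ordinary data: only the pointer variable vp is
 * volatile, so vp is re-read on each use, but the int it points to
 * gets no special treatment. */
int load_via_volatile_pointer(int *target)
{
    int *volatile vp = target;
    int first = *vp;
    int second = *vp;
    return first + second;
}
```

A function's return value is a copy, so `void *volatile` as a return type would have no effect; `volatile void *` as a return type would carry over to every pointer the caller derives from it.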

Steven Sudit
Feel free to explain the downvote.
Steven Sudit
Downvoted; you need to expound for your answer to be useful. I think you're assuming that I believe volatile acts as a memory barrier or has something to do with threading. I'm fully aware that it doesn't.
Joseph Garvin
@Joseph, `volatile` would protect you against multithreaded changes to the value of the `void*` itself. It doesn't protect you against changes to the data pointed to.
JSBangs
Ah, you've edited your answer. "but it's not going to be updated" <-- what's not going to be updated? The pointed to memory most definitely is going to be updated, and "volatile void*" means "pointer to memory that may change".
Joseph Garvin
@JS Bangs: No. You are misreading the type. That would be the case for a "void* volatile" not a "volatile void *".
Joseph Garvin
@Joseph: As JS pointed out, what's not going to be updated is the value returned to you. It's just a copy of the value stored in the data structure.
Steven Sudit
@Steven/JS: I've put code tags around the "volatile void*" to try to indicate that I literally mean that type, that is, a pointer to something that is volatile, *not* a volatile pointer to something.
Joseph Garvin
@Joseph: Uhm, if you're going to dereference the void*, you'll need to downcast to a specific type, in which case you would apply the volatility there. Even so, for the reasons given by JS and others, I don't believe this would work the way you imagine.
Steven Sudit
@Steven: You will get a warning if you try to cast away volatility with most compilers, but you won't get a warning casting from a volatile void* to volatile some_specific_type*. If you strictly keep to C++ style casts then you can't do it at all without explicitly using const_cast.
Joseph Garvin
Joseph, why on Earth should he expand his answer instead of you expanding your question? Why on Earth should we guess what you are aware of and what not? Your understanding of `volatile` is wrong. Man invests his time into explaining it to you, and instead of trying to work through his explanation or asking for clarification you just silently downvote him.
Roman Cheplyaka
@Steven He's not trying to make the returned pointer volatile, but he wants the memory region pointed to to be volatile, so that if another thread/process writes to it, the new values will be reread.
Mark B
@Roman: Look at the edit history of his post. The original answer had no explanation at all, it was just "I don't think volatile does what you think it does."
Joseph Garvin
@Mark: And that's just not going to work. There's no reason at all to believe that changes to the data structure will cause the returned pointer value to reference a new target. It could just as easily allocate the new pointer to the new value in a new entry in its hash table or tree.
Steven Sudit
@Roman: I initially posted my first sentence alone before explaining further. This probably explains why he initially downvoted it. However, it doesn't explain the three downvotes currently logged against this answer.
Steven Sudit
@Roman Cheplyaka: his understanding of `volatile void*` is correct. `volatile void*` is the same as `void volatile*` and not `void* volatile`. And `void* volatile` as a return value of a function would indeed not make much sense. `void volatile*` does make perfect sense, and why this is not chosen for `mmap` can be seen from nos answer.
Jens Gustedt
@Jens Gustedt: `volatile void*` makes no sense. It means that whatever the pointer points to is volatile, but what does a void* point to?
JeremyP
@Jens Gustedt: Generic data pointers which cannot be dereferenced. Since, a `void *` points to `unspecific untyped data` (your words), why does a type qualifier to `unspecific untyped data` make sense?
terminus
@terminus: A `volatile void*` would have to be downcast to, say, a `volatile int*` before dereferencing. A normal cast can add `volatile` but not remove it, so having `volatile` in the function's return *would* have some effect. See elsewhere for a healthy debate on whether that effect is the desired one.
Steven Sudit
@Steven Sudit Can you point to some C spec bits which state that a volatile cast cannot be removed? This trivial C program when compiled with `gcc -Wall -Wextra volatile.c` does not complain: `volatile char *x; int main() { unsigned char *y = (unsigned char *) x; printf("%c\n", *y); return 0; }`
terminus
@terminus: In C++, you can only remove a `const` or `volatile` using `const_cast`, unless you use a dangerous C-style cast, as in your examples. I do realize that the question mentions C, but all of my C programming these days is technically C++, allowing me to take advantage of language improvements even if I'm not using the OOPL features or libraries.
Steven Sudit
+3  A: 

The deeply-held assumption running through many software systems is that most programmers are sequential programmers. This has only recently started to change.

mmap has dozens of uses not related to shared memory. In the event that a programmer is writing a multithreaded program, they must take their own steps to ensure safety. Protecting each variable with a mutex is not the default. Likewise, mmap does not assume that another thread will make contentious accesses to the same shared-memory segment, or even that a segment so mapped will be accessible by another thread.

I'm also unconvinced that marking the return of mmap as volatile will have an effect on this. A programmer would still have to ensure safety in access to the mapped region, no?
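
As an illustration of a use that has nothing to do with shared memory, here is a rough sketch that reads a file through a read-only private mapping. The function name `count_newlines` is made up, and the sketch assumes nobody truncates the file while it is mapped. Nothing here needs volatile.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Counts '\n' bytes in the file at path using a read-only private
 * mapping. Returns the count, or -1 on error. */
long count_newlines(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap rejects a zero length */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping outlives the descriptor */
    if (data == MAP_FAILED)
        return -1;

    long count = 0;
    for (size_t i = 0; i < len; i++) {
        if (data[i] == '\n')
            count++;
    }

    munmap((void *)data, len);      /* cast drops const for munmap */
    return count;
}
```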

Borealid
A memory access would only "not be contentious" in this sense if two processes never accessed the same byte in the segment, which would defeat the point of using shared memory. You still have to ensure the safety of access to the shared region, but that's a different issue.
Joseph Garvin
@Joseph Garvin: actually, there are valid write-once-read-many situations where accesses do not conflict and shared memory is still a useful paradigm. For instance, the memory region could be created and written to, and *then* other processes created to read it. Since they exist after the write in program order, they cannot see the old value.
Borealid
@Borealid: True, if the writer process finishes its writing before the readers are ever created, then the readers will only see the new value. I hadn't considered that case. I still think it's the wrong default though ;)
Joseph Garvin
+3  A: 

Being volatile would only cover a single read (which, depending on the architecture, might be 32 bits or something else), and thus be quite limiting. Often you'll need to write more than one machine word, and you'll have to introduce some sort of locking anyway.

Even if it were volatile, you could easily have two processes reading different values from the same memory: all it takes is a third process writing to the memory in the nanosecond between the first process's read and the second process's read (unless you can guarantee that the two processes read the same memory within almost exactly the same clock cycles).

Thus it's pretty useless for mmap() to try to deal with these things; it is better left up to the programmer to decide how to deal with access to the memory and to mark the pointer as volatile where needed. If the memory is shared, all parties involved need to be cooperative and aware of how they can update the memory in relation to each other. That is out of scope for mmap, and something volatile will not solve.
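
One possible shape for that cooperation is sketched below. It uses C11 atomics rather than volatile or a lock, which is my substitution and not something named above. The names `struct mailbox` and `handoff` are hypothetical. The sketch assumes `MAP_ANONYMOUS` is available and that `atomic_int` is lock-free on the target, which is what makes it usable across processes on mainstream platforms.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* assumption: exposes MAP_ANONYMOUS on glibc */
#endif
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Lives in the shared mapping. payload may be wider than a machine
 * word, so it is published through the ready flag instead of being
 * read while it might be half written. */
struct mailbox {
    atomic_int ready;
    long long payload;
};

/* Forks a child that writes value into shared memory; the parent waits
 * for the ready flag and copies the payload into *out.
 * Returns 0 on success, -1 if mmap or fork fails.
 * Simplification: the parent spins forever if the child dies early. */
int handoff(long long value, long long *out)
{
    struct mailbox *box = mmap(NULL, sizeof *box, PROT_READ | PROT_WRITE,
                               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (box == MAP_FAILED)
        return -1;
    atomic_init(&box->ready, 0);

    pid_t pid = fork();
    if (pid < 0) {
        munmap(box, sizeof *box);
        return -1;
    }
    if (pid == 0) {
        /* Child: write the data, then publish it with a release store. */
        box->payload = value;
        atomic_store_explicit(&box->ready, 1, memory_order_release);
        _exit(0);
    }

    /* Parent: the acquire load pairs with the child's release store, so
     * once ready reads as 1 the whole payload is visible. */
    while (atomic_load_explicit(&box->ready, memory_order_acquire) == 0)
        ;
    *out = box->payload;

    waitpid(pid, NULL, 0);
    munmap(box, sizeof *box);
    return 0;
}
```

The atomic load already forces a fresh read on each iteration, so the pointer does not need to be volatile-qualified here.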

nos
`volatile void*` and `void volatile*` are exactly the same. The different version is `void* volatile`.
Mark B
"Being volatile would only cover a single read" <-- I'm not sure what you mean. Being volatile means that every use of the variable will force a new read. How would that only cover a single read?
Joseph Garvin
It just means that, by adding volatile, you could read a 32-bit value and always get a consistent result (on most 32-bit machines anyhow) even if someone else updates that 32-bit value, whilst reading a 64-bit value you could get an inconsistent result - volatile won't "protect" you, or guarantee that you see the most recent value in that case. And that's just the nice case; other processes could be manipulating arbitrary chunks of bytes.
nos
@nos: That's not my understanding of volatile. If I make a 100 byte struct, and declare it volatile, reads from any part of it will not be cached. I don't think this is architecture specific. Volatile shares the same propagating semantics as const.
Joseph Garvin
@nos: Right, the read has to be no larger than the bus width, which usually means 64 bits these days. Add some complications for alignment and the answer is otherwise correct.
Steven Sudit
@Joseph: I've seen volatile used on things like integers, not pointers. I have no reason to believe the compiler will honor it the way you imagine.
Steven Sudit
@Steven: The C and C++ standards both allow for volatile to be used on arbitrary types. volatile is implicitly applied to members of structs and classes just like const is.
Joseph Garvin
@Joseph Garvin You're right that it will not be cached (although on a multicore x86 machine it could be cached in L1/L2 cache unless the write is prefixed with the LOCK opcode, which adding volatile will not do), but nothing stops someone else from changing the second byte in one of your integers in that struct while you read from it. The point is, having mmap return a volatile pointer solves almost nothing, so it's up to the programmer to make that pointer volatile if it's needed, and it's up to the programmer to deal with the (much larger) issue of locking/cooperative access.
nos
@Joseph: I think nos answered for me.
Steven Sudit
@nos The LOCK prefix does not have anything to do with caching. It only serves to instruct the processor that the next opcode requires exclusive access to the memory location.
terminus
@terminus it will serve as a memory barrier.
nos
@nos yes it would serve as a barrier. Why should that matter to the caching in L1/L2?
terminus
@terminus So you'll be assured that another process altering the memory makes it visible to everyone else.
nos
@Steven/nos: A memory barrier will not solve the issue unless combined with volatile, so saying that it solves nothing is mistaken. If you're only using volatile, it will just guarantee it's not cached in a register by the compiler, so that subsequent uses of the variable will really go to memory/cache. If you're using a memory barrier, changes in memory/cache will get reflected across cores but that /won't matter/ if the value has been stored in a register for later use and the compiler has generated code that just reuses that register.
Joseph Garvin
@Steven/nos: Also presumably when mmap was created single core was the norm, where the volatile issue was present but the cache coherency issue was not.
Joseph Garvin
@Joseph: `volatile` is, in practice, a way to disable an unwanted optimization. Namely, once a variable is referenced, the compiler is free to assume there are no asynchronous aliases and therefore cache it in a register. If the variable is a pointer to a byte, making it a pointer to a volatile byte just means the compiler ought not cache the results of dereferencing it. But would it have done so in the first place? That's not a rhetorical question, but a practical one. From working with optimized C++, I suspect that most compilers have no such optimization to disable.
Steven Sudit
@Steven: Whaaaat? Most compilers definitely will cache loaded values in registers. Register allocation is one of the most important and researched aspects of a compiler.
Joseph Garvin
@Joseph: I don't believe you understood what I said about caching. First of all, it has nothing to do with whether the value is kept in a register or in memory. The issue is whether calling `a = p[x];`, where `x` is a variable, will be cached in a way that avoids the dereferencing `p` the next time the expression is evaluated.
Steven Sudit
A: 

It's probably done that way for performance reasons, providing nothing extra by default. If you know that on your particular architecture writes/reads won't be reordered by the processor, you may not need volatile at all (possibly in conjunction with other synchronization). EDIT: this was just an example - there may be a variety of other cases where you know that you don't need to force a reread every time the memory is accessed.

If you need to ensure that all the addresses are read from memory each time they're accessed, const_cast (or C-style cast) volatile onto the return value yourself.

Mark B
Are you sure that'll have any effect?
Steven Sudit
A: 

The type volatile void * or void * volatile is nonsensical: you cannot dereference a void *, so it doesn't make sense to specify type qualifiers to it.

And, since you need a cast to char * or whatever your data type is anyway, perhaps that is the right place to specify volatility. Thus, the API as defined nicely side-steps the responsibility of marking the memory changeable-under-your-feet/volatile.

That said, from a big picture POV, I agree with you: mmap should have a return type stating that the compiler should not cache this range.

terminus
Could the down-voters comment on why this is being downvoted?
terminus
I didn't downvote you, but I suspect it's because your answer overlooks the issues that caf covered and that came up in a comment you made to me.
Steven Sudit
+2  A: 

Implementing shared memory is only one small subset of the uses of mmap(). In fact the most common uses are creating private mappings, both anonymous and file-backed. This means that, even if we accepted your contention about requiring a volatile-qualified pointer for shared memory access, such a qualifier would be superfluous in the general case.

Remember that you can always add qualifiers to the pointed-to type of a pointer without casting, but you can't remove them. So, with the current mmap() declaration, you can do both this:

volatile char *foo = mmap();  /* I need volatile */

and this:

char *bar = mmap();  /* But _I_ do not */

With your suggestion, the users in the common case would have to cast the volatile away.
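
Spelled out with real arguments, the two declarations might look like the sketch below. The helper `map_private` and the function `qualifier_demo` are hypothetical names, and `MAP_ANONYMOUS` is assumed to be available.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* assumption: exposes MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: anonymous private mapping of len bytes,
 * or NULL on failure. */
static void *map_private(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Returns 0 if both mappings could be created and used, -1 otherwise. */
int qualifier_demo(void)
{
    const size_t len = 4096;
    volatile char *foo = map_private(len);  /* adds volatile, no cast */
    char *bar = map_private(len);           /* stays unqualified, no cast */
    int ok = 0;

    if (foo != NULL && bar != NULL) {
        foo[0] = 'x';
        bar[0] = 'y';
        ok = (foo[0] == 'x' && bar[0] == 'y');
    }

    /* Going the other way needs a cast: munmap takes a plain void *. */
    if (foo != NULL)
        munmap((void *)foo, len);
    if (bar != NULL)
        munmap(bar, len);
    return ok ? 0 : -1;
}
```

The cast in the first `munmap` call is the kind of cast every caller would need in the common case if mmap itself returned a volatile-qualified pointer.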

caf