Basically to achieve accurate marking you need meta-data indicating which words are used as pointers and which are not.
This meta-data could be stored per reference, as emacs does. If for your language/implementation you don't care much about memory use, you could even make references bigger than words (perhaps twice as big), so that every reference can carry type information as well as its one-word data. That way you could have a fixnum the full size of a 32 bit pointer, at the cost of references all being 64 bit.
Alternatively, the meta-data could be stored along with other type information. So for example a class could contain, as well as the usual function pointer table, one bit per word of the data layout indicating whether or not the word contains a reference that should be followed by the garbage collector. If your language has virtual calls then you must already have a means of working out from an object what function addresses to use, so the same mechanism will allow you to work out what marking data to use - typically you add an extra, secret pointer at the start of every single object, pointing to the class which constitutes its runtime type. Obviously with certain dynamic languages the type data pointed to would need to be copy-on-write, since it is modifiable.
The stack can do similar - store the accurate marking information in data sections of the code itself, and have the garbage collector examine the stored program counter, and/or link pointers on the stack, and/or other information placed on the stack by the code for the purpose, to determine which code each bit of stack relates to and hence which words are pointers. Lightweight exception mechanisms tend to do a similar thing to store information about where try/catch occurs in the code, and of course debuggers need to be able to interpret the stack too, so this can quite possibly be folded in with a bunch of other stuff you'd already be doing to implement any language, including ones with built-in garbage collection.
Note that garbage collection doesn't necessarily need accurate marking. You could treat every word as a pointer, regardless of whether it really is or not, look it up in your garbage collector's "big list of everything" to decide whether it plausibly could refer to an object that has not yet been marked, and if so treat it as a reference to that object. This is simple, but the cost of course is that it's somewhere between "quite slow" and "very slow", depending on what data structures your gc uses for the lookup. Furthermore, sometimes an integer just so happens to have the same value as the address of an unreferenced object, and causes you to keep a whole bunch of objects which should have been collected. So such a garbage collector cannot offer strong guarantees about unreferenced objects ever being collected. This might be fine for a toy implementation or first working version, but is unlikely to be popular with users.
A mixed approach might, say, do accurate marking of objects, but not of regions of the stack where things get particularly hairy. For example if you write a JIT which can create code where a referenced object address appears only in registers, not in your usual stack slots, then you might need to non-accurately follow the region of the stack where the OS stored the registers when it descheduled the thread in question to run the garbage collector. Which is probably quite fiddly, so a reasonable approach (potentially resulting in slower code) would be to require the JIT to always keep a copy of all pointer values it's using on the accurately marked stack.