For modulo, I find the following simplest. It doesn't matter what the implementation's sign convention is; we just coerce the result to the sign we want:
r = n % a;
if (r < 0) r += a;
Obviously that's for positive a. For negative a you need:
r = n % a;
if (r > 0) r += a;
Which (perhaps a little confusingly) combines to give the following in C++ (in C, do the same thing with int, and then tediously write a duplicate for long long):
template<typename T> T sign(T t) { return t > T(0) ? T(1) : T(-1); }
template<typename T> T py_mod(T n, T a) {
    T r = n % a;
    if (r * sign(a) < T(0)) r += a;
    return r;
}
We can use a cheapskate two-valued "sign" function because we already know a!=0, or the % would be undefined.
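As a sanity check, here is the template again with a few spot checks against the values Python's % operator gives. This is only a sketch: py_mod_checks is a hypothetical helper name, and the checks cover small values only.

```cpp
#include <cassert>

// Two-valued sign: only meaningful for t != 0, which is all py_mod needs.
template<typename T> T sign(T t) { return t > T(0) ? T(1) : T(-1); }

// Remainder carrying the sign of the divisor a, as Python's % does.
// Requires a != 0.
template<typename T> T py_mod(T n, T a) {
    T r = n % a;
    if (r * sign(a) < T(0)) r += a;
    return r;
}

// Hypothetical helper: spot checks for all four sign combinations,
// plus exact divisions where no adjustment should happen.
inline void py_mod_checks() {
    assert(py_mod(7, 3) == 1);
    assert(py_mod(-7, 3) == 2);
    assert(py_mod(7, -3) == -2);
    assert(py_mod(-7, -3) == -1);
    assert(py_mod(6, 3) == 0);
    assert(py_mod(-6, 3) == 0);
}
```

The same template works for long long without a duplicate, e.g. py_mod(-7LL, 3LL).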
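Applying the same principle to division: under truncation, q needs decrementing exactly when the division is inexact and the true quotient is negative. Testing q < 0 isn't enough for that, because a quotient that truncates to 0 has lost its sign (-1 / 2 gives 0, but should floor to -1), so compare the signs of the operands instead:
q = n / a;
// assuming round-toward-zero (guaranteed since C99 and C++11)
if ((q * a != n) && ((n < 0) != (a < 0))) --q;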
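A self-contained sketch of floor division along these lines, assuming the implementation truncates toward zero. The name py_div is hypothetical. It decides from the operand signs rather than from q, so that a quotient which truncates to 0 (as in -1 / 2) is still rounded down to -1 the way Python's // does:

```cpp
// Floor division, as Python's // gives. Requires a != 0.
// Assumes n / a truncates toward zero.
template<typename T> T py_div(T n, T a) {
    T q = n / a;
    // Truncation rounded up only if the division was inexact
    // and the operands have opposite signs.
    if ((q * a != n) && ((n < T(0)) != (a < T(0)))) --q;
    return q;
}
```

For example, py_div(-7, 2) and py_div(7, -2) both give -4, while py_div(-7, -2) gives 3.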
The multiplications arguably could be more expensive than necessary, but can be micro-optimised later on a per-architecture basis if need be. For instance if you have a division op that gives you quotient and remainder, then you're sorted for division.
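As one way to get quotient and remainder together, the standard library's std::div from <cstdlib> returns both in a std::div_t. The sketch below adjusts that pair to Python's convention; py_divmod is a hypothetical name, and whether the compiler actually turns std::div into a single instruction depends on the platform.

```cpp
#include <cstdlib>

// Quotient and remainder in Python's convention (like divmod).
// Requires a != 0. std::div truncates toward zero, so the remainder
// takes the sign of n and may need moving to the sign of a.
inline std::div_t py_divmod(int n, int a) {
    std::div_t d = std::div(n, a);
    if (d.rem != 0 && ((d.rem < 0) != (a < 0))) {
        --d.quot;    // round the quotient down instead of toward zero
        d.rem += a;  // keep quot * a + rem == n
    }
    return d;
}
```

With this, py_divmod(-7, 3) yields quot == -3 and rem == 2, matching Python's divmod(-7, 3).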
[Edit: there might be some edge cases where this goes wrong, for instance if the quotient or the remainder is INT_MAX or INT_MIN (n = INT_MIN with a = -1 already overflows in the division itself). But emulating Python maths for large values is a whole other question anyway ;-)]
[Another edit: isn't the standard Python implementation written in C? You could trawl the source to see what they do.]