Assume you have a machine instruction udive that performs a special-case 64-by-32 unsigned division, computing (32-bit dividend << 32) / 32-bit divisor. A full 64-by-32 unsigned division can then be built from it as follows:
// assume: a / b guaranteed not to overflow
a = 64bit dividend, a.h & a.l are hi & lo 32bits respectively
b = 32bit divisor
q1 = udive(a.h, b) // (a.h << 32) / b
r1 = -(q1 * b) // remainder of the above, mod 2^32; shortcut since (a.h << 32) & 0xffffffff == 0
q2 = a.l / b // a.l / b using regular unsigned division
r2 = a.l - (q2 * b) // remainder of the above
q = q1 + q2
r = r1 + r2
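// r < r2: the 32-bit sum r1 + r2 wrapped, so the true r needs more than 32 bits, which implies r > b since b is 32 bits
// r >= b: quotient too small by 1, adjust
// (r1 < b and r2 < b, so the true r < 2*b and a single adjustment is enough)
if (r < r2) or (r >= b)
    q = q + 1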
return q
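For anyone who wants to experiment, here is a minimal C sketch of the unsigned routine above. The software udive is only a hypothetical stand-in for the real instruction, and the name udiv64by32 is made up for illustration. It assumes a.h < b (so the quotient fits in 32 bits) and relies on uint32_t arithmetic wrapping mod 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software stand-in for the udive instruction:
   computes ((uint64_t)hi << 32) / b.
   Assumption: hi < b, so the quotient fits in 32 bits. */
static uint32_t udive(uint32_t hi, uint32_t b) {
    assert(b != 0 && hi < b);
    return (uint32_t)(((uint64_t)hi << 32) / b);
}

/* Full 64-by-32 unsigned division built from udive plus a regular
   32-by-32 division, following the pseudocode step by step. */
static uint32_t udiv64by32(uint64_t a, uint32_t b) {
    uint32_t ah = (uint32_t)(a >> 32);     /* a.h */
    uint32_t al = (uint32_t)a;             /* a.l */

    uint32_t q1 = udive(ah, b);            /* (a.h << 32) / b */
    uint32_t r1 = (uint32_t)(0u - q1 * b); /* remainder, taken mod 2^32 */
    uint32_t q2 = al / b;                  /* a.l / b */
    uint32_t r2 = al - q2 * b;             /* remainder of a.l / b */

    uint32_t q = q1 + q2;
    uint32_t r = r1 + r2;                  /* may wrap past 32 bits */

    /* r < r2 means the sum wrapped (true r > b); r >= b means one more
       b fits. Either way the quotient is too small by exactly 1. */
    if (r < r2 || r >= b) {
        q += 1;
    }
    return q;
}
```

Comparing udiv64by32(a, b) against a plain (uint32_t)(a / b) over a range of inputs is a quick way to check both the wrap case and the r >= b case.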
However, the signed case is giving me problems. Assuming an equivalent sdive instruction that does the signed version of udive, I can't quite work out how to deal with the signs of the two partial remainders and the final quotient adjustment.