This is somewhat of a special case, especially since you didn't specify which platform(s) you're targeting, but with GCC you can use the mode(TI) attribute to get (compiler-synthesized) 128-bit operations, for instance:
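#include <inttypes.h>
#include <stdio.h>

typedef unsigned int uint128_t __attribute__((mode(TI)));

int main(void) {
    uint64_t x = 0xABCDEF01234568;
    uint64_t y = ~x;
    uint128_t result = (uint128_t) x * y;  // widen x first so the multiply is done in 128 bits
    printf("%016" PRIX64 " * %016" PRIX64 " -> ", x, y);
    uint64_t r1 = (uint64_t) (result >> 64);  // high 64 bits of the product
    uint64_t r2 = (uint64_t) result;          // low 64 bits of the product
    printf("%016" PRIX64 " %016" PRIX64 "\n", r1, r2);
    return 0;
}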
This only works when compiling for a 64-bit target, though.
One way or another, you're looking at multiple precision arithmetic to solve this. mode(TI) will cause the compiler to generate the operations for you; otherwise they have to be written explicitly.
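To give an idea of what "written explicitly" means, here is a rough sketch of a full 64x64 -> 128-bit multiply using only 64-bit arithmetic. It is plain schoolbook multiplication on 32-bit halves; the name mul_64x64 and its out-parameters are my own invention for illustration, not something from a library.

```c
#include <stdint.h>

/* Hypothetical helper (name and signature are illustrative only):
   computes the full 128-bit product of a and b, returned as two
   64-bit halves through *hi and *lo. */
static void mul_64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    /* Split each operand into 32-bit digits. */
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

    /* Four partial products; each fits in 64 bits. */
    uint64_t p0 = a_lo * b_lo;   /* weight 2^0  */
    uint64_t p1 = a_lo * b_hi;   /* weight 2^32 */
    uint64_t p2 = a_hi * b_lo;   /* weight 2^32 */
    uint64_t p3 = a_hi * b_hi;   /* weight 2^64 */

    /* Sum everything that lands in bits 32..63. The sum is at most
       3 * (2^32 - 1), so it cannot overflow 64 bits. */
    uint64_t mid = (p0 >> 32) + (p1 & 0xFFFFFFFFu) + (p2 & 0xFFFFFFFFu);

    *lo = (mid << 32) | (p0 & 0xFFFFFFFFu);
    /* The upper bits of mid are the carry into the high word. */
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}
```

Calling mul_64x64(x, y, &r1, &r2) should give the same two halves as the mode(TI) version, at the cost of four multiplies instead of whatever the compiler picks for the target.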
You can use a general bigint package; ones in C++ I know of include the number theory packages LiDIA and NTL, and the bigint packages used for cryptographic code in Crypto++ and Botan. Plus of course there is GNU MP (GMP), which is the canonical C MPI library (and it does have a C++ wrapper as well, though it seemed poorly documented last time I looked at it). All of these are designed to be fast, but are also probably tuned for larger (1000+ bit) numbers, so at 128 bits you may be dealing with a lot of overhead. (On the other hand, you don't say whether that matters.) And all of them are either BSD or LGPL licensed, unlike the bigint-cpp package, which is GPL. I'm not sure if that matters to you, but it might matter a lot.
You could also write a custom uint128_t kind of type; typically such a class would implement much the same algorithms as a regular MPI class, just hardcoded to have only 2 or 4 elements. If you are curious how to implement such algorithms, a good reference is Chapter 14 of the Handbook of Applied Cryptography.
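As a minimal sketch of what such a hardcoded two-element type might look like, here are addition, subtraction and comparison on a pair of 64-bit words. The u128 struct and the u128_* function names are hypothetical, and I've written it as plain C rather than a class with overloaded operators; multiplication and division are deliberately left out.

```c
#include <stdint.h>

/* Hypothetical two-word 128-bit unsigned integer (illustrative only). */
typedef struct {
    uint64_t lo;  /* least significant 64 bits */
    uint64_t hi;  /* most significant 64 bits  */
} u128;

/* Addition modulo 2^128. */
static u128 u128_add(u128 a, u128 b)
{
    u128 r;
    r.lo = a.lo + b.lo;
    /* If the low word wrapped, it is now smaller than a.lo: carry 1. */
    r.hi = a.hi + b.hi + (r.lo < a.lo);
    return r;
}

/* Subtraction modulo 2^128. */
static u128 u128_sub(u128 a, u128 b)
{
    u128 r;
    r.lo = a.lo - b.lo;
    /* Borrow from the high word when the low subtraction underflows. */
    r.hi = a.hi - b.hi - (a.lo < b.lo);
    return r;
}

/* Three-way compare: returns -1, 0 or 1. */
static int u128_cmp(u128 a, u128 b)
{
    if (a.hi != b.hi) return a.hi < b.hi ? -1 : 1;
    if (a.lo != b.lo) return a.lo < b.lo ? -1 : 1;
    return 0;
}
```

In C++ you would presumably wrap these in operator+, operator- and operator< so the type reads like a built-in integer, but the carry and borrow logic stays the same.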
Of course doing this by hand is easier if you don't actually need all the arithmetic operations (division and modulo, in particular, are rather tricky). For instance, if you just need to keep track of a counter which might hypothetically overflow 64 bits, you could just represent it as a pair of 64-bit unsigned long longs and do the carry by hand:
unsigned long long ctrs[2] = { 0 };  // ctrs[0] is the low word, ctrs[1] the high word
void increment() {
    ++ctrs[0];
    if(!ctrs[0]) // low word wrapped around to zero: overflow
        ++ctrs[1];
}
This is of course going to be a lot simpler to deal with than a general MPI package or a custom uint128_t class.
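If the counter needs to advance by more than one at a time, the same trick still works. A small sketch, assuming the same layout as above (element 0 is the low word); counter_add is a made-up name:

```c
/* Hypothetical helper: add n to a two-word counter whose low word is
   ctr[0] and high word is ctr[1]. */
static void counter_add(unsigned long long ctr[2], unsigned long long n)
{
    ctr[0] += n;
    if (ctr[0] < n)   /* the sum wrapped around, so carry into the high word */
        ++ctr[1];
}
```

The comparison works because unsigned addition wraps modulo 2^N, so the result can only end up smaller than n if a wraparound happened.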