People play a bit fast and loose with big-O notation. In the case of GCD, they generally do it in 2 ways:
1) You're right: algorithmic complexity, and hence big-O notation, should be stated in terms of the size in bits of the input, not in terms of the values of the input. That's how P, NP, and so on are defined. Assuming binary input and arbitrarily-large numbers (like a BigNum representation), with N the number of bits of input, your GCD requires at most 2^N subtractions, each of which takes O(N) time to run over the digits of the numbers being subtracted. So it's O(N * 2^N). GCD can of course be done much faster if you use division rather than subtraction: O(N^2).
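Here's a rough sketch in Python of the two variants I mean (function names are just illustrative, and both assume non-negative integer inputs, positive for the subtraction version):

```
def gcd_subtraction(a, b):
    # Original Euclid: repeated subtraction. The worst case (e.g. gcd(a, 1))
    # needs about max(a, b) subtractions, i.e. ~2^N of them for N-bit inputs,
    # each costing O(N) on bignums. Assumes a, b > 0.
    while a != b:
        if a > b:
            a -= b
        else:
            b -= a
    return a

def gcd_division(a, b):
    # Euclid with remainders: the division steps add up to O(N^2) bit
    # operations overall for N-bit inputs. Assumes a, b >= 0.
    while b != 0:
        a, b = b, a % b
    return a
```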
So, when we say that primality testing was proved in 2002 to be doable in polynomial time, we're using that technical definition of complexity: polynomial in the number of digits (which is the tricky part), not polynomial in the input number itself (which is trivially easy to do in "sub-linear time" using trial division).
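To make that concrete, trial division looks something like this (just a sketch):

```
def is_prime_trial_division(n):
    # About sqrt(n) iterations in the worst case: "sub-linear" in the value n,
    # but exponential in the number of digits of n, so it does not count as
    # polynomial time in the technical sense.
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True
```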
But in practice, for algorithms which take a fixed number of integer inputs, it's more convenient to talk about complexity as though N were the input itself, not the size of the input. It's not harmful as long as you're clear what you mean in cases that might be ambiguous.
2) In practice, integer inputs are often fixed-size, 32-bit or whatever, and operations on them such as addition, multiplication and division are O(1) time. We use these facts selectively in our order analysis. Technically, if your GCD program only accepts inputs up to 2^32 - 1, then it is O(1): it has a fixed upper bound on its running time. End of analysis.
Although technically correct, that's not a very useful analysis. Almost anything you do on a real computer is O(1) on the same basis: the size of the problem is constrained by the hardware.
It's usually more convenient to accept that addition is O(1) because the numbers are fixed-size, but ignore that GCD is also O(1), pretend that its behaviour in the range [1, 2^32) extends to infinity, and analyse it on that basis. Then with N the max of the two inputs, it comes out O(N): O(N) subtractions, each taking constant time.
Again, this is not ambiguous once you know what the terms of reference are, but beware of incorrectly comparing the first analysis I gave of Euclid's algorithm with division, O(N^2), against this analysis of the algorithm with subtraction, O(N). N does not mean the same thing in the two analyses, and subtraction is not faster ;-)
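If it helps to see the pitfall concretely, a quick step count along the lines of the sketches above (illustrative only, assumes positive inputs):

```
def count_steps(a, b):
    # Count raw loop iterations of the subtraction variant and the
    # division variant on the same inputs. Assumes a, b > 0.
    sub_steps, x, y = 0, a, b
    while x != y:
        if x > y:
            x -= y
        else:
            y -= x
        sub_steps += 1

    div_steps, x, y = 0, a, b
    while y != 0:
        x, y = y, x % y
        div_steps += 1

    return sub_steps, div_steps

print(count_steps(10**6, 1))   # (999999, 1): ~a million subtractions vs one division
```

The "O(N)" subtraction count is linear in the value of the input, which is exponential in its length in bits, so it loses badly to the division version's O(N^2)-in-bits bound.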