To answer your first question, Numpy.correlate(a, v, mode) is performing the convolution of a with the reverse of v and giving the results clipped by the specified mode. Because of the definition of convolution, the correlation C(t) = Sum for -inf < i < inf of (a[i] * v[t + i]), where -inf < t < inf. Even though this definition of the correlation would allow for results from -infinity to infinity, you obviously can't store an infinitely long array. So it has to be clipped, and that is where the mode comes in. There are 3 different modes: full, same, & valid. 'full' mode returns results for every t where both a and v have some overlap. 'same' mode returns a result with the same length as the shortest vector (a or v). 'valid' mode returns results only when a and v completely overlap each other. The documentation for Numpy.convolve gives more detail on the modes.
For your second question, I think Numpy.correlate is giving you the autocorrelation, it is just giving you a little more as well. The autocorrelation is used to find how similar a signal, or function, is to itself at a certain time difference. At a time difference of 0, the auto-correlation should be the highest because the signal is identical to itself, so you expected that the first element in the auto-correlation result array would be the greatest. However, the correlation is not starting at a time difference of 0. It starts at a negative time difference, closes to 0, and then goes positive. That is, you were expecting:
Autocorrelation(a) = Sum for -inf < i < inf (a[i] * v[t + i]), where 0 <= t < inf
But what you got was:
Autocorrelation(a) = Sum for -inf < i < inf (a[i] * v[t + i]), where -inf < t < inf
What you need to do is take the last half of your correlation result, and that should be the auto-correlation you are looking for. A simple python function to do that would be:
def autocorr(x):
result = numpy.correlate(x, x, mode='full')
return result[result.size/2:]
You will, of course, need error checking to make sure that x is actually a 1-d array. Also, this explanation probably isn't the most mathematically rigorous. I've been throwing around infinities because the definition of convolution uses them, but that doesn't necessarily apply for auto-correlation. So, the theoretical portion of this explanation may be slightly wonky, but hopefully the practical results are helpful. These pages on auto-correlation are pretty helpful, and can give you a much better theoretical background if you don't mind wading through the notation and heavy concepts.