You can get the mean and standard deviation, but not the median.
new_n = (n(0) + n(1) + ...)
new_mean = (mean(0)*n(0) + mean(1)*n(1) + ...) / new_n
new_var = ((var(0)+mean(0)**2)*n(0) + (var(1)+mean(1)**2)*n(1) + ...) / new_n - new_mean**2
where n(0) is the number of runs in the first data set, n(1) is the number of runs in the second, and so on, mean is the mean, and var is the variance (which is just standard deviation squared). n**2 means "n squared".
Getting the combined variance relies on the fact that the variance of a data set is equal to the mean of the square of the data set minus the square of the mean of the data set. In statistical language,
Var(X) = E(X^2) - E(X)^2
The var(n)+mean(n)**2 terms above give us the E(X^2) portion which we can then combine with other data sets, and then get the desired result.
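As a rough sketch of the formulas above in Python: the function name combine_stats and the sample data are made up for illustration, and var is assumed to be the population variance (dividing by n), since that is what the identity Var(X) = E(X^2) - E(X)^2 holds for. If your summaries report the sample variance (dividing by n - 1), you would need to convert it first.

```python
import statistics

def combine_stats(groups):
    """Combine (n, mean, var) triples into one overall (n, mean, var).

    Assumes var is the population variance (divide by n).
    """
    new_n = sum(n for n, _, _ in groups)
    new_mean = sum(mean * n for n, mean, _ in groups) / new_n
    # var + mean**2 recovers E(X^2) for each group; weight by n and average
    mean_of_squares = sum((var + mean ** 2) * n for n, mean, var in groups) / new_n
    new_var = mean_of_squares - new_mean ** 2
    return new_n, new_mean, new_var

# Hypothetical raw data, used only to check the result against a direct calculation
a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 12.0, 14.0]
groups = [(len(d), statistics.fmean(d), statistics.pvariance(d)) for d in (a, b)]

n, mean, var = combine_stats(groups)
direct_mean = statistics.fmean(a + b)
direct_var = statistics.pvariance(a + b)
print(n, mean, var)
print(direct_mean, direct_var)
```

The combined values should agree with the direct ones up to floating-point rounding. The standard deviation is then the square root of the combined variance.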
In terms of medians:
If you are combining exactly two data sets, then you can be certain that the combined median lies somewhere between the two medians (or is equal to one of them), but there is little more that you can say. Taking their average should be OK unless you need the combined median to be an actual data point.
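A small check of the two-data-set case, using made-up numbers. In practice you would not have the raw data, so the true combined median is computed here only to show how it compares with the estimate:

```python
import statistics

# Hypothetical data sets
a = [1, 2, 3, 4, 100]
b = [5, 6, 7, 8, 9, 10, 11]

med_a = statistics.median(a)
med_b = statistics.median(b)
lo, hi = min(med_a, med_b), max(med_a, med_b)

# True median of the pooled data (needs the raw values)
combined = statistics.median(a + b)

# Estimate available from the two medians alone
estimate = (med_a + med_b) / 2

print(lo, combined, hi)   # the combined median falls within [lo, hi]
print(estimate)           # an estimate only; it need not equal the true value
```

Here the true combined median lies between the two medians, while the average of the medians is close to it but not the same.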
If you are combining many data sets in one go, you can either take the median of the medians, or take their average. If there may be significant systematic differences between the data sets, then taking their average is probably better, as taking the median reduces the effect of outliers. But if you have systematic differences between runs, disregarding them is probably not a good thing to do.
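To illustrate the difference between the two options, here is a sketch with hypothetical per-data-set medians, one of which is noticeably off from the rest:

```python
import statistics

# Hypothetical medians from five data sets; the last one is systematically higher
medians = [10.2, 9.8, 10.1, 10.0, 14.5]

median_of_medians = statistics.median(medians)  # largely ignores the odd one out
mean_of_medians = statistics.fmean(medians)     # pulled toward the odd one out

print(median_of_medians)
print(mean_of_medians)
```

The median of the medians stays with the bulk of the values, whereas the average shifts toward the unusual data set, so the choice depends on whether that data set should count or be discounted.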