A: 

Express the distribution x_i as a linear combination of some independent basis distributions f_j: x_i = a_i1*f_1 + a_i2*f_2 + ... . Let us constrain the f_j to be independent variables, uniformly distributed either on 0..1 (continuous case) or on {0,1} (discrete case). Let us now express everything we know in matrix form:

Let X be the vector (x_1, x_2, ..., x_n)
Let A be the matrix (a_ij) of dimension (n,k) (n rows, k columns)
Let F be the vector (f_1, f_2, ..., f_k)
Let P be the vector (p_1, p_2, ..., p_n)
Let R be the matrix (E[x_i x_j]) for i,j = 1..n
Definition of the X distribution: X = A * F
Constraint on the means of the individual x_i (E[f_j] = 1/2 in both cases): 2P = A * (1 ..k times.. 1)
Correlation constraint: A*AT = 3R, or 2R in the discrete case, because E[x_i x_j] =
  E[(a_i1*f_1 + a_i2*f_2 + ...)*(a_j1*f_1 + a_j2*f_2 + ...)] =
  E[sum over p,q: a_ip*a_jq*f_p*f_q] =
  (for p /= q, independence gives E[f_p*f_q] = E[f_p]*E[f_q]; these cross terms vanish when the f_p are centered to mean zero)
  E[sum over p: a_ip*a_jp*f_p^2] =
  sum over p: a_ip*a_jp*E[f_p^2] = (since E[f_p^2] = 1/3 for uniform 0..1 and 1/2 in the discrete case)
  (1/3 or 1/2) * sum over p: a_ip*a_jp
And the matrix whose (i,j) entry is that sum over p of a_ip*a_jp is precisely A*AT.

Now you need to solve the two equations:

A*AT      = 3R (or 2R in the discrete case)
A*(1...1) = 2P

Solving the first equation amounts to finding a square root of the matrix 3R (or 2R). See for example http://en.wikipedia.org/wiki/Cholesky_factorization and, more generally, http://en.wikipedia.org/wiki/Square_root_of_a_matrix . Something also should be done about the second one :)
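
For illustration, here is a minimal NumPy sketch of that square-root step, assuming the target matrix 3R is symmetric positive definite (the R below is a made-up example, not from the question):

    import numpy as np

    # Made-up example of a second-moment matrix R; 3R must be symmetric
    # positive definite for the Cholesky route to apply.
    R = np.array([[0.50, 0.30, 0.20, 0.10],
                  [0.30, 0.50, 0.30, 0.20],
                  [0.20, 0.30, 0.50, 0.30],
                  [0.10, 0.20, 0.30, 0.50]])

    A = np.linalg.cholesky(3 * R)       # lower-triangular A with A @ A.T == 3*R
    assert np.allclose(A @ A.T, 3 * R)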

I ask mathematicians around to correct me, because I may very well have mixed AT*A with A*AT or done something even more wrong.

To generate a value of x_i as a linear mixture of the basis distributions, use a two-step process: 1) use a uniform random variable to choose one of the basis distributions, weighted by the corresponding probability; 2) generate the result using the chosen basis distribution.
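
A sketch of that two-step process in Python; the weights and the basis samplers here are placeholders of my own choosing, not part of the scheme above:

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_mixture(weights, samplers):
        # Step 1: pick a basis distribution with the corresponding probability.
        j = rng.choice(len(samplers), p=weights)
        # Step 2: draw the result from the chosen basis distribution.
        return samplers[j]()

    # Placeholder basis: a U(0,1) draw and a fair coin flip, mixed 70/30.
    weights = [0.7, 0.3]
    samplers = [lambda: rng.random(), lambda: rng.integers(0, 2)]
    value = sample_mixture(weights, samplers)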

jkff
Unfortunately, the continuous -> discrete transition is often the hardest part. For example, the problem of finding a Hadamard matrix gets a _lot_ easier if complex entries are allowed. I don't see any way to discretize your solution within the given framework.
Why should it be hard? The solution just depends on the resulting distribution being a linear mixture of basis distributions but I don't see how it depends on their continuity. Is it that discrete distributions can't easily be linearly mixed?
jkff
In this case it's the fact that your continuous distributions aren't convex combinations of Bernoulli trials.
I'm sorry, I don't quite understand. I know what a convex combination is and what Bernoulli trials are, but still: I've edited my post; doesn't the process described in the last paragraph give a correct result? If so, could you point me to some sources expanding on your point? (Anyway, I should probably just implement the stuff I described and see if it works.)
jkff
The problem is that A may have entries that are not between 0 and 1.
A: 

The brute force solution is to express the constraints of the problem as a linear program with 2^N variables pr(w), where w ranges over all binary strings of length N. First, the constraint that pr be a probability distribution:

for all w: 0 <= pr(w) <= 1
sum_w pr(w) = 1

Second, the constraint that the expectation of each variable be p:

for all i: sum_{w such that w[i] = 1} pr(w) = p

Third, the covariance constraints:

for all i < j: sum_{w such that w[i] = w[j] = 1} pr(w) = const * |j - i|^-alpha + p^2

(the left-hand side is E[x_i x_j], i.e. the target covariance plus p^2)

This is very slow, but a cursory literature search turned up nothing better. If you decide to implement it, here are some LP solvers with Python bindings: http://wiki.python.org/moin/NumericAndScientific/Libraries
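
As a toy-sized illustration, the LP above could be set up with scipy.optimize.linprog roughly like this (N, p, const, and alpha are placeholder values of mine):

    import itertools
    import numpy as np
    from scipy.optimize import linprog

    N, p = 3, 0.5                        # placeholder problem size and marginal
    const, alpha = 0.1, 1.0              # placeholder correlation parameters
    cov = lambda d: const * d ** -alpha  # target Cov(x_i, x_j) for |j - i| = d

    words = list(itertools.product([0, 1], repeat=N))  # all 2^N outcomes w
    A_eq, b_eq = [], []

    A_eq.append([1.0] * len(words)); b_eq.append(1.0)  # sum_w pr(w) = 1
    for i in range(N):                                 # E[x_i] = p
        A_eq.append([float(w[i]) for w in words]); b_eq.append(p)
    for i in range(N):                                 # E[x_i x_j] = cov + p^2
        for j in range(i + 1, N):
            A_eq.append([float(w[i] & w[j]) for w in words])
            b_eq.append(cov(j - i) + p * p)

    res = linprog(np.zeros(len(words)), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, 1)] * len(words))
    print(res.status, res.x)  # status 0 means a feasible pr(w) was found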

I don't know linear programming, but I do not see how this will work. Any configuration of the binary series will have a non-zero probability. Is it possible to calculate the probability of any configuration?
jonalm
Yes, if the problem is solvable, then the LP routine will give you the probability of each of the 2^N configurations.
+1  A: 

A quick search at RSeek reveals that R has packages to do this.

Dirk Eddelbuettel
That was my first idea too, but I doubt they could handle N=1000-10000 per jonalm's statement in the comments.
Aniko
I don't know R, but this seems like a reason to learn it :) Thanks.
jonalm
+2  A: 
Jason S
It's actually |i-j|^-alpha; the solution for alpha^|i-j| is in the literature.
hmmm... |i-j|^-alpha has no solution for i=j. Are we sure the OP did not mis-state?
Jason S
It can be Corr[x_i, x_j] = const * |i-j|^-alpha for i != j, or Corr[x_i, x_j] = (|i-j|+1)^-alpha (whichever is easiest). Either way, I'm not claiming that they are equal; I'm only interested in the tail behavior (|i-j| >> 1), so it should not matter.
jonalm
Thank you so much Jason. Although it was a different correlation, the solutions were really interesting.
jonalm
A: 

Here's an intuitive / experimental approach that seems to work.

If b is a binary r.v., m is the mean of the binary r.v., c is the correlation you want, rand() generates a U(0,1) r.v., and d is the correlated binary r.v. you want:

d = if(rand() < c, b, if(rand() < m, 1, 0))

That is, if a uniform r.v. is less than the desired correlation, d = b. Otherwise d is a fresh random binary number with the same mean m.
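
In NumPy the procedure might look like this (my sketch, not the original code):

    import numpy as np

    rng = np.random.default_rng(1)
    m, c, n = 0.5, 0.4, 2000            # mean, target correlation, column length

    b = (rng.random(n) < m).astype(float)       # base binary column
    fresh = (rng.random(n) < m).astype(float)   # independent binary column
    d = np.where(rng.random(n) < c, b, fresh)   # copy b w.p. c, else use fresh

    print(np.corrcoef(b, d)[0, 1])      # should come out near c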

I ran this 1000 times for a column of 2000 binary r.v.s with m = .5 and with c = .4 and c = .5. The mean of the resulting correlations was exactly as specified, and their distribution appeared to be normal. For a correlation of 0.4, the standard deviation of the correlation was 0.02.

Sorry - I can't prove that this works all the time, but you have to admit, it sure is easy.

Grembo
Reread the question: that's not the right correlation structure.
+2  A: 

Thanks for all your input. I found an answer to my question in the cute little article by Chul Gyu Park et al., so in case anyone runs into the same problem, look up:

"A simple method for Generating Correlated Binary Variates" (jstor.org.stable/2684925)

for a simple algorithm. The algorithm works if all the elements in the correlation matrix are positive and for a general marginal distribution Pr(x_i = 1) = p_i.

j

jonalm