Bayesian networks can take advantage of the order of variable elimination because of the conditional independence assumptions built into their structure.
Specifically, imagine having the joint distribution P(a,b,c,d) and wanting to know the marginal P(a). If you knew nothing about the conditional independence, you would calculate this by summing the joint over b, c, and d. If these variables have k-ary domains, that is O(k^3) operations for each value of a.
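The brute-force marginalization can be sketched as follows (a minimal illustration, not from the source; the domain size k and the random joint table are arbitrary assumptions):

```python
import numpy as np

# Assumed setup: k-ary variables a, b, c, d and an arbitrary
# normalized joint table P(a, b, c, d).
k = 4
rng = np.random.default_rng(0)
joint = rng.random((k, k, k, k))
joint /= joint.sum()

# Naive marginalization: for each value of a, sum over all k^3
# combinations of (b, c, d).
p_a = joint.sum(axis=(1, 2, 3))

print(p_a)           # a valid distribution over a
print(p_a.sum())     # sums to 1
```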
On the other hand, assume that you have a Bayes net where D is the root, C is a child of D, B is a child of C, and A is a child of B. Then you can rewrite the joint as P(a|b)P(b|c)P(c|d)P(d) and distribute your three summations as far to the right of the equation as possible: P(a) = sum_b P(a|b) sum_c P(b|c) sum_d P(c|d)P(d). When you actually want to compute P(a), you can precompute f(c) = sum_d P(c|d)P(d) and store this function of c. Likewise, you can precompute g(b) = sum_c P(b|c)f(c) and store it, leaving only P(a) = sum_b P(a|b)g(b).
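The elimination steps above can be sketched like this (a minimal illustration assuming the chain D → C → B → A from the text, k-ary domains, and arbitrary normalized conditional tables; each pushed-in sum becomes a k×k matrix-vector product):

```python
import numpy as np

k = 4
rng = np.random.default_rng(1)

def random_conditional(shape):
    """Random table normalized over its first axis (the child variable)."""
    t = rng.random(shape)
    return t / t.sum(axis=0, keepdims=True)

p_d = random_conditional((k,))      # P(d)
p_c_d = random_conditional((k, k))  # P(c|d), axis 0 indexes c
p_b_c = random_conditional((k, k))  # P(b|c)
p_a_b = random_conditional((k, k))  # P(a|b)

# Push each sum inward and cache the intermediate factor;
# every step is O(k^2) work and stores a table of size k.
f_c = p_c_d @ p_d    # f(c) = sum_d P(c|d) P(d)
g_b = p_b_c @ f_c    # g(b) = sum_c P(b|c) f(c)
p_a = p_a_b @ g_b    # P(a) = sum_b P(a|b) g(b)

# Sanity check against brute-force summation over the full joint.
joint = np.einsum('ab,bc,cd,d->abcd', p_a_b, p_b_c, p_c_d, p_d)
print(np.allclose(p_a, joint.sum(axis=(1, 2, 3))))  # True
```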
In this way, you end up doing O(k^(w*+1)) work per elimination step, where w* is the induced width of the elimination ordering, i.e. the largest number of variables appearing together in any intermediate factor you create. In this chain every intermediate factor involves at most two variables, so each step is O(k^2) work, which is also the size of the largest conditional probability table we must keep in memory. Note this is better than our original O(k^3) result, and the gap would be even larger with more variables.
In short, the conditional independence structure of a Bayesian network allows you to marginalize out variables more efficiently. Another explanation of this can be found at http://www.cs.uiuc.edu/class/sp08/cs440/notes/varElimLec.pdf.