The first goal should be to get working code. You are there. Then try some simple optimizations. For example:
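retVal <- matrix(NA, ni, nj) # preallocate once; assumes your result is a scalar
for (i in seq_len(ni)) {
  for (j in seq_len(nj)) {
    # matrices are indexed as [row, column], not [row][column]
    retVal[i, j] <- *some function of yours*
  }
}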
This will already run much faster than growing the result inside the loop, because memory is not reallocated for each i,j combination.
As for the looping, you can start by replacing the inner loop with something from the apply family (sapply, vapply, and so on).
I am not aware of anything fully general that answers your question: it depends on what arguments your function takes and what type of object it returns.
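As a rough sketch of that step, assuming a function that takes the two indices and returns a single number, it could look like the following. The function f here is a made-up stand-in for yours, and ni and nj are arbitrary small values chosen for illustration.

```r
# Hypothetical stand-in for your own scalar-valued function.
f <- function(i, j) i * 10 + j

ni <- 3
nj <- 4

# Baseline: preallocated matrix filled by a double loop.
loopVal <- matrix(NA_real_, ni, nj)
for (i in seq_len(ni)) {
  for (j in seq_len(nj)) {
    loopVal[i, j] <- f(i, j)
  }
}

# Inner loop replaced by vapply(): each call fills one whole row.
applyVal <- matrix(NA_real_, ni, nj)
for (i in seq_len(ni)) {
  applyVal[i, ] <- vapply(seq_len(nj), function(j) f(i, j), numeric(1))
}

# Both loops replaced by outer(); Vectorize() wraps f so that it
# accepts the index vectors outer() passes in.
outerVal <- outer(seq_len(ni), seq_len(nj), Vectorize(f))
```

All three matrices should hold the same values. If your function already works element-wise on vectors, the Vectorize() wrapper is unnecessary, and outer() alone is likely to be the fastest of the three. If it returns something other than a scalar, a list-based variant such as lapply is probably a better fit than a matrix.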