Given the facts you have supplied, you can substantially subset your Question to one solely focused on the hidden layer size and number. You have the information available to determine the size and number of your input and output layers.
In addition, there is a general class of techniques ('pruning' algorithms) that can be applied during network training to substantially optimize network configuration by removing redundant nodes (more on those below).
So, from your Question, the parameters you need:
Every NN has exactly one input layer and exactly one output layer--no more, no less. What is the size (number of neurons) of each of these layers for your problem? Well, you always know the size of the input layer--it's simply the dimension of your input vector (or one larger, if you add a bias node).
Likewise, the size of the output layer, while not strictly determined by your data, is determined completely by you--it depends solely on what output you want from the NN. For instance, if you want it to assign one of three class labels to each instance (data point) of your data, then that's the size of your output layer--three neurons.
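To make that concrete, here's a tiny sketch (the data and label names are my own toy example, not anything from your Question) of reading both layer sizes straight off the data and the task:

```python
# Toy illustration: the input-layer size comes from the feature vector,
# the output-layer size from the number of class labels you want assigned.

data = [
    [5.1, 3.5, 1.4, 0.2],   # each instance is a 4-dimensional feature vector
    [6.2, 2.9, 4.3, 1.3],
]
class_labels = ["setosa", "versicolor", "virginica"]

input_size = len(data[0]) + 1    # +1 only if you add a bias node
output_size = len(class_labels)  # one neuron per class label

print(input_size, output_size)   # 5 3
```

Neither number is a tuning decision--both are fixed before you write a line of training code.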
So that leaves the hidden layers. How many hidden layers? Well if your data is linearly separable (which you often know by the time you begin coding a NN) then you don't need any hidden layers at all.
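If you don't already know whether your data is linearly separable, one quick check (my own sketch, not a formal test) follows from the perceptron convergence theorem: a plain perceptron reaches zero training errors in finitely many passes if and only if the data are linearly separable.

```python
# Train a bare perceptron; if it ever completes a full pass with zero
# misclassifications, the data are linearly separable.

def perceptron_separable(X, y, epochs=100, lr=0.1):
    """Return True if a perceptron fits the data perfectly (labels in {-1, +1})."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            activation = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * activation <= 0:  # misclassified (or on the boundary)
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
                errors += 1
        if errors == 0:
            return True
    return False

# AND is linearly separable; XOR famously is not.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_and = [-1, -1, -1, 1]
y_xor = [-1, 1, 1, -1]
print(perceptron_separable(X, y_and))  # True
print(perceptron_separable(X, y_xor))  # False
```

A `False` here is only conclusive up to the epoch budget, of course--on separable data the perceptron is guaranteed to converge, but "didn't converge in 100 epochs" is merely strong evidence of non-separability.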
Beyond that, as you probably know, there's a mountain of commentary on the question of hidden-layer configuration in NNs (see the famous NN FAQ for an excellent summary of that commentary). One issue within this subject on which there is a consensus is the performance difference from adding additional hidden layers: the situations in which performance improves with a second (or third, etc.) hidden layer are very few. One hidden layer is sufficient for the large majority of problems.
So what about the size of the hidden layer(s)? There are some empirically derived rules of thumb; of these, the most commonly relied on is 'the size of the hidden layer is between the size of the input and output layers'. Jeff Heaton, author of "Introduction to Neural Networks in Java", offers a few more, which are recited on the page I just linked to.
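As a starting point, those rules of thumb reduce to trivial arithmetic. A hedged sketch (the formulations are the commonly cited ones attributed to Heaton; the function and key names here are my own):

```python
# Commonly cited starting points for the size of a single hidden layer.
# These are heuristics for a first guess, not guarantees.

def hidden_size_candidates(n_in, n_out):
    """Rule-of-thumb hidden-layer sizes for n_in inputs and n_out outputs."""
    return {
        "mean of input and output": (n_in + n_out) // 2,  # 'between' the two layers
        "2/3 input plus output": (2 * n_in) // 3 + n_out,  # one of Heaton's suggestions
        "upper bound (< 2x input)": 2 * n_in - 1,          # stay under twice the input size
    }

# e.g. a 10-feature input and 3 class labels:
print(hidden_size_candidates(10, 3))
# {'mean of input and output': 6, '2/3 input plus output': 9, 'upper bound (< 2x input)': 19}
```

Any of these is a reasonable initial configuration; which one survives depends on your data, which is exactly where pruning (below) earns its keep.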
In your Question, you mentioned that for whatever reason, you cannot find the optimum network architecture by trial and error. Another way to tune your NN configuration (without using trial and error) is 'pruning'. The gist of this technique is removing nodes from the network during training by identifying those nodes which, if removed from the network, would not noticeably affect network performance (i.e., resolution of the data). (Even without using a formal pruning technique, you can get a rough idea of which nodes are not important by looking at your weight matrix after training; look for weights very close to zero--it's the nodes on either end of those weights that are often removed during pruning.) Obviously, if you use a pruning algorithm during training then you should begin with a network configuration that is more likely to have excess (i.e., 'prunable') nodes--in other words, when deciding on a network architecture, err on the side of more neurons if you add a pruning step.
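The informal weight-matrix inspection just described can be sketched in a few lines (my own rough version, not a formal pruning algorithm--real pruning methods also retrain and verify that performance is unaffected):

```python
# After training, scan the weight matrix and flag hidden nodes whose
# incoming weights are all near zero -- the usual pruning candidates.

def prunable_nodes(weight_matrix, tol=1e-3):
    """weight_matrix[i][j] = weight from input i to hidden node j.
    Return indices of hidden nodes whose incoming weights are all ~0."""
    n_hidden = len(weight_matrix[0])
    candidates = []
    for j in range(n_hidden):
        if all(abs(row[j]) < tol for row in weight_matrix):
            candidates.append(j)
    return candidates

# Trained weights for 3 inputs x 4 hidden nodes; node 2's column is ~0,
# so removing it would barely change the network's output.
W = [
    [ 0.80, -1.20,  0.0002,  0.45],
    [-0.30,  0.95, -0.0007,  1.10],
    [ 0.66,  0.12,  0.0004, -0.85],
]
print(prunable_nodes(W))  # [2]
```

The tolerance is arbitrary here; in practice you would pick it relative to the magnitude of the other weights, and confirm on a validation set that removing the flagged nodes really doesn't hurt performance.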
Put another way, by applying a pruning algorithm to your network during training, you can get much closer to an optimized network configuration than any a priori theory is ever likely to give you.