Some informal answers to give you the ideas, for detailed proofs read a good book on Automata, for example this one or the ones mentioned in the other answers. And I am pretty sure there are online materials you could find answering all of your questions.
- How do you find the minimal Deterministic FSM?
The procedure is to eliminate duplicated states (or to merge equivalent states). You know that state and transitions are the keys to generate strings. Basically, duplicated states do not contribute in making the language generated any larger or less. The algorithm starts from final states that always have the ability of generating the lamda (empty string), and recursively update a table which indicate the generating ability of a state, and finally merge those states making no difference.
- Is there a way to normalize non-deterministic FSMs?
The normalized DFA for the NFA is using different collections of the states of NFA as DFA's states, for example, {state0} -(1)-> {state1, state2} to remove the non-deterministic part, there is no way to avoid the state explosion as DFA has to do that to represent the language.
- Is there a linear time bound algorithm to find the minimal FSM for a given machine?
I remember the best one is O(NLogN) by doing some tricks of reusing information in some paper by a Western Ontario University Professor, and doubt there exists better ones. I believe the classical one is O(N^2).
- Is there a way to see if two FSMs are equivalent?
Yes. Get the minimal one, code the state by their accessing string (a string that could reach the state from the start state, this is pretty much the real "name" of a state there), and verify the transition map. There might be better ways though, but there would be no big difference on the bigO.