An answer is that functional programming is to program using functions, as they are defined in mathematics (in short, side-effect free things that map values from the domain to the codomain). To actually translate that into "how to think" is the hand-waving part that is difficult to be exhaustive about, but I'll sample some of my thoughts:
- The definition is more important than the efficiency. That is, an obviously correct implementation of a function that one can understand all of the behaviour of is better than a complex optimized one that is hard to reason about. (And should be preferred as long as possible; until there is evidence one must break this nice property.)
- A mathematical function has no side-effects. A useful program must have side-effects. A functional programmer is aware of side effects, as a very dangerous and complicating thing, and designs the program as a bunch of functions that take output values from one side-effect and creates input values to the next side-effect.
Number one is associated with the vague: "elegant code". List comprehensions can present very succinct and mathematical-equations like definitions of functions. Just look at the quick-sort implemented with LCs. This is how I define elegance, succinct and makes all behaviours clear. Not that perl code-golf where you are most often terse and cryptic.
Number two is something that I use day to day in all programming. Divide code into functions (methods, routines, etc...) of current state that are side-effect free computations giving inputs to the next action to take (even which the next action to take is). When the value is returned, give it to a routine that performs the action that is described, then start over.
In my head I diagram an Erlang process as a state machine graph, where each vertex is a side-effect and a function whose output is which edge to chose out of the vertex. The high regard of side-effects is something the functional programming paradigm taught me. Especially in Erlang, since side-effects really matter in concurrency, and Erlang makes concurrency very available.
The same way some isolated tribes have only one word for numbers above 3, or no words for "mine"/"yours". It feels like popular languages do not have words for "this will cause a side-effect", but Functional Programming has it. It is forcing you to be aware of it all the time, and that is a good thing.