I've been studying hierachial reinforcement learning problems, and while a lot of papers propose interesting ways for learning a policy, they all seem to assume they know in advance a graph structure describing the actions in the domain. For example, The MAXQ Method for Hierarchial Reinforcement Learning by Dietterich describes a complex graph of actions and sub-tasks for a simple Taxi domain, but not how this graph was discovered. How would you learn the hierarchy of this graph, and not just the policy?
Say there is this agent out there moving about doing things. You don't know its internal goals (task graph). How do you infer its goals?
In way way, this is impossible. Just as it is impossible for me to know what goal you had mind when you put that box down: maybe you were tired, maybe you saw a killer bee, maybe you had to pee....
You are trying to model an agent's internal goal structure. In order to do that you need some sort of guidance as to what are the set of possible goals and how these are represented by actions. In the research literature this problem has been studied under the terms "plan recognition" and also with the use of POMDP (partially observable markov decision process), but both of these techniques assume you do know something about the other agent's goals.
If you don't know anything about its goals, all you can do is either infer one of the above models (This is what we humans do. I assume others have the same goals I do. I never think, "Oh, he dropped his laptop, he must be ready to lay an egg" cse, he's a human.) or model it as a black box: a simple state-to-actions function then add internal states as needed (hmmmm, someone must have written a paper on this, but I don't know who).
It is difficult to understand what exactly you mean, but a causal-graph structure can be learnt using Bayesian Network techniques, for example.
In Dietterich's MAXQ, the graph is constructed manually. It's considered to be a task for the system designer, in the same way that coming up with a representation space and reward functions are.
Depending on what you're trying to achieve, you might want to automatically decompose the state space, learn relevant features, or transfer experience from simple tasks to more complex ones.
I'd suggest you just start reading papers that refer to the MAXQ one you linked to. Without knowing what exactly what you want to achieve, I can't be very prescriptive (and I'm not really on top of all the current RL research), but you might find relevant ideas in the work of Luo, Bell & McCollum or the papers by Madden & Howley.
This paper describes one approach that is a good starting point:
N. Mehta, S. Ray, P. Tadepalli, and T. Dietterich. Automatic Discovery and Transfer of MAXQ Hierarchies. In International Conference on Machine Learning, 2008.