Once you have a token stream, you can transform it into a parse tree by using a parser generator and specifying the grammar of your language.
Depending on the programming language you'd like to use, you might want to look into the following parser generators:
C/C++ - Yacc
Java - ANTLR (also JavaCC, SableCC)
Python - PLY (Python Lex / Yacc)
OCaml - ocamlyacc
If you don't know about grammars, the documentation for the parser generator you choose should give you enough to get you going.
When your parser is done, it will take the token stream and construct a tree using an intermediate representation - types that you define to represent the various pieces of your language (like the text and special identifiers that you mentioned). You can then manipulate the tree as you like.
Edit: in response to your comment - I'm not quite sure what level of answer to give you, as I can't tell exactly what problem you are having. First, are you familiar with tree data structures? If so, would you know how to write a simple recursive algorithm to find the height of a tree, or run a depth-first search? Remember that a tree is just a way to organize information - it is entirely up to you what you do with that information.
A common design pattern for applying an algorithm to a heterogenous tree (ie a tree whose nodes are of different types) is the Visitor pattern. If you are already familiar with trees, you can look up an example of the Visitor pattern in your favorite language; however, if the concept is new to you I would strongly recommend starting off with the simpler algorithms for practice.