If you are looking at real-world examples where unification is used and useful, take a look at Unification-based grammars which are used in computational linguistics, for example HPSG and LFG. On the surface, this look like another flavour of unification, but they are really the same.
Unification-based grammar can be thought of as a CFG where the productions is extended with unification. Every term in the CGF gets an AVM, attribute value matrix, which is a directed acyclic graph of features and values. The idea here is somewhat akin to attribute grammars used in compilers.
Imagine this toy grammar:
S -> NP VP
NP -> Kim
NP -> The cats
VP -> V NP
V -> see
V -> sees
We have some slight overgeneration here in the agreement:
*The cats sees Kim
[S [NP The cats] [VP [V sees] [NP Kim]]]
In order to fix this we could refine the CFG to include the notion of agreement:
S -> NP_sg VP_sg
S -> NP_sg VP_pl
NP_sg -> Kim
NP_pl -> The cats
VP_sg -> V_sg NP_sg
VP_sg -> V_sg NP_pl
V_sg -> sees
V_pl -> see
VP_pl -> V_pl NP_pl
VP_pl -> V_pl NP_sg
Here we will reject the overgeneration from before. But this leads to combinatorial explotion very quickly. We could however augment each term with an AVM and unify these together when we parse:
S -> NP VP , C = A unified with B.
NP -> kim /[ AGR sg ]. We mark Kim as being singular
NP -> The cats / [ AGR pl ]
VP[ AGR #1 ] -> V [ AGR #1 ] NP
The #1-notation are reentrancies, which means that the value of this feature must be the same, in fact they will point to the same node in the graph after unification, iff it succedes. Here in practice we say that the agreement feature of a verb phrase is the same as the agreement of the verb in the phrase.
V -> See / [ AGR pl ]
V -> Sees / [ AGR sg ]
With our augmented toy grammar "Kim see the cats" is rejected because the NP and the VP will not unify, having a different value for its AGR feature.
When we are parsing we unifiy the AVMs together, and therefore gain very much in expressiveness, making it easy for grammar-engineers to write grammars. Typically a broad-coverage UBG has in the order of a hundred rules, while the equvivalent CFG , which may not exist, CFGs with unifaction are Turing complete, will have rules in the number of thousands or more.
For more details see
HPSG and
LFG.