Here's one approach (my types are a bit more general and I'm not using XPath):
{-# LANGUAGE Arrows #-}
module Main where
import qualified Data.Map as M
import Text.XML.HXT.Arrow
classes :: (ArrowXml a) => a XmlTree (M.Map String String)
classes = listA (divs >>> divs >>> pairs) >>> arr M.fromList
where
divs = getChildren >>> hasName "div"
pairs = proc div -> do
cls <- getAttrValue "class" -< div
val <- deep getText -< div
returnA -< (cls, val)
getValues :: (ArrowXml a) => [String] -> a XmlTree [(String, Maybe String)]
getValues cs = classes >>> arr (zip cs . lookupValues cs)
where lookupValues cs m = map (flip M.lookup m) cs
main = do
let xml = "<div><div class='c1'>a</div><div class='c2'>b</div>\
\<div class='c3'>123</div><div class='c4'>234</div></div>"
print =<< runX (readString [] xml >>> getValues ["c1", "c2", "c3", "c4"])
I would probably run an arrow to get the map and then do the lookups, but this way works as well.
To answer your question about listA
: divs >>> divs >>> pairs
is a list arrow with type a XmlTree (String, String)
—i.e., it's a non-deterministic computation that takes an XML tree and returns string pairs.
arr M.fromList
has type a [(String, String)] (M.Map String String)
. This means we can't just compose it with divs >>> divs >>> pairs
, since the types don't match up.
listA
solves this problem: it collapses divs >>> divs >>> pairs
into a deterministic version with type a XmlTree [(String, String)]
, which is exactly what we need.