views:

155

answers:

3

For an assignment, we had to implement something like a very basic sexp parser, such that for input like:

"((a b) ((c d) e) f)"

It would return:

[["a", "b"], [["c", "d"], "e"], "f"]

Since this was part of a larger assignment, the parser is only given valid input (matching parens &c). I came up with the following solution in Ruby:

def parse s, start, stop
  tokens = s.scan(/#{Regexp.escape(start)}|#{Regexp.escape(stop)}|\w+/)

  stack = [[]]

  tokens.each do |tok|
    case tok
    when start
      stack << []
    when stop
      stack[-2] << stack.pop
    else
      stack[-1] << tok
    end
  end

  return stack[-1][-1]
end

Which may not be the best solution, but it does the job.

Now, I'm interested in an idiomatic Haskell solution for the core functionality (i.e. I don't care about the lexing or choice of delimiters, taking already lexed input would be fine), if possible using only "core" haskell, without extensions or libs like parsec.

Note that this is NOT part of the assignment, I'm just interested in the Haskell way of doing things.

+3  A: 

The idiomatic way in Haskell would be to use parsec, for combinator parsing.

There are lots of examples online, including,

Don Stewart
Thanks, Don, for the fast reply, I should have added that I'm interested in a solution which does not involve libs like parsec. I'll edit the question accordingly, and have a look at the linked answers.
danlei
+4  A: 
[["a", "b"], [["c", "d"], "e"], "f"]

Does not have a valid type in haskell (because all elements of a list need to be of the same type in haskell), so you'll need to define your own datastructure for nested lists like this:

data NestedList = Value String | Nesting [NestedList]

Now if you have a list of Tokens where Token is defined as data Token = LPar | RPar | Symbol String, you can parse that into a NestedList like this:

parse = fst . parse'

parse' (LPar : tokens) =
    let (inner, rest) = parse' tokens
        (next, outer) = parse' rest
    in
      (Nesting inner : next, outer)
parse' (RPar : tokens) = ([], tokens)
parse' ((Symbol str) : tokens) =
    let (next, outer) = parse' tokens in
    (Value str : next, outer)
parse' [] = ([],[])
sepp2k
Thanks, that's exactly the kind of example I was looking for.
danlei
+2  A: 

While fancier parsers like Parsec are nice, you don't really need all that power for this simple case. The classic way to parse is using the ReadS type from the Prelude. That is also the way you would give your Sexp type a Read instance.

It's good to be at least a little familiar with this style of parsing, because there are quite a few examples of it in the standard libraries.

Here's one simple solution, in the classic style:

import Data.Char (isSpace)

data Sexp = Atom String | List [Sexp]
  deriving (Eq, Ord)

instance Show Sexp where
  show (Atom a ) = a
  show (List es) = '(' : unwords (map show es) ++ ")"

instance Read Sexp where
  readsPrec n (c:cs) | isSpace c = readsPrec n cs
  readsPrec n ('(':cs)           = [(List es, cs') |
                                      (es, cs') <- readMany n cs]
  readsPrec _ (')':_)            = error "Sexp: unmatched parens"
  readsPrec _ cs                 = let (a, cs') = span isAtomChar cs
                                   in [(Atom a, cs')]

readMany :: Int -> ReadS [Sexp]
readMany _ (')':cs) = [([], cs)]
readMany n cs       = [(e : es, cs'') | (e, cs') <- readsPrec n cs,
                                        (es, cs'') <- readMany n cs']

isAtomChar :: Char -> Bool
isAtomChar '(' = False
isAtomChar ')' = False
isAtomChar c   = not $ isSpace c

Note that the Int parameter to readsPrec, which usually indicates operator precedence, is not used here.

Yitz