views:

495

answers:

1

As part of a project I have assigned myself as a way of improving my knowledge of F# and functional programming in general, I am attempting to write a string pattern-matching algorithm from scratch without using any loops or variables (or regular expressions, or String.Replace and friends). As this is purely a learning project, I'm not interested in the best possible way to do it, just the best functional way to do it.

I'm trying to write a function that accepts a wildcard character, a pattern string, and an input string as parameters. If the pattern does not match the input, the function returns None. If the pattern does match the input, the function returns Some(str) where str is whatever part of the input string matched any wildcards that might have been present in the pattern string.

I have this mostly working, and I'll include the code in a moment. I've written a generic pattern-matching function that works on any generic list of anything that supports equality, and then a helper function that takes strings and passes lists of characters to the generic function. This all works, except for one thing: the support for multiple wildcards in the pattern string isn't very good - it takes the matches for each wildcard and concatenates them together into a single string in the output.

For example:

> strMatch '*' "foo" "bar";;
val it : string option = None

> strMatch '*' "test" "test";;
val it : string option = Some ""

> strMatch '*' "functional programming is *" "functional programming is fun";;
val it : string option = Some "fun"

> strMatch '*' "* and *" "you and me";;
val it : string option = Some "youme"

It's the last one that I'm trying to fix. Ideally I'd like to return a list of strings rather than a single string, with each element in the list being the string that matched one wildcard. Failing that, I can probably make do with a version that returns only the match for the first wildcard - it's the concatenated values from both wildcards that I need to get rid of. I'm just not quite sure how to approach it.

So if anyone can suggest how I can group my return values by which wildcard they matched, I would be grateful. I'm also interested in any other improvements to my code that you might want to suggest.

let rec doMatch (wildcard:'a) (pat:'a list) (input:'a list) : 'a list option =
    let singleMatch p i =
        match (p, i) with
        | phd :: ptl, ihd :: itl ->
            if phd = wildcard then
                match doMatch wildcard ptl itl with
                | None -> None
                | Some x -> Some(ihd :: x)
            else None
        | _ -> None

    let longerMatch p i =
        match (p, i) with
        | phd :: ptl, ihd :: itl ->
            if phd = wildcard then
                match doMatch wildcard p itl with
                | None -> None
                | Some x -> Some(ihd :: x)
            else None
        | _ -> None

    match (pat, input) with
    | [], [] -> Some([])
    | [], _::_ -> None
    | _::_, [] -> None
    | phd :: ptl, ihd :: itl ->
        if phd <> wildcard then
            if phd = ihd then doMatch wildcard ptl itl
            else None
        else
            match singleMatch pat input with
            | Some x -> Some(x)
            | None -> longerMatch pat input

let strMatch (wildcard:char) (pat:string) (input:string) =
    match doMatch wildcard (List.ofSeq pat) (List.ofSeq input) with
    | None -> None
    | Some x -> Some(new string(Array.ofList x))

You've probably guessed, but this is part of an Eliza chat-bot implementation in F#.

+4  A: 

From a design point of view, I like the idea of returning an

'a list option

where e.g.

None              // it did not match
Some[]            // matched, input had 0 wildcards
Some["foo";"bar"] // matched, input has 2 wildcards, "foo" matched 1st, "bar" 2nd

That is, just guarantee that when 'Some' is returned, the length of the list equals the number of wildcards, and the elements of the list are the matches in order. This seems to me to be straightforward to implement as well as reasonable for client code to use/consume.

(I am unclear if there is any deeper question in your long post.)

Looks like fun stuff!

EDIT

Here's some updated code. My gut tells me it's not all correct, but it at least works on your examples. The key is to use

'a list list option

since 'a is a character, an 'a list is like a string, and we want a list of strings. singleMatch starts a new list of strings, whereas longerMatch is consing onto the front of the current string.

let rec doMatch (wildcard:'a) (pat:'a list) (input:'a list) 
           : 'a list list option =
    let singleMatch p i =
        match (p, i) with
        | phd :: ptl, ihd :: itl ->
            if phd = wildcard then
                match doMatch wildcard ptl itl with
                | None -> None
                | Some xs -> Some([ihd]::xs)
            else None
        | _ -> None

    let longerMatch p i =
        match (p, i) with
        | phd :: ptl, ihd :: itl ->
            if phd = wildcard then
                match doMatch wildcard p itl with
                | None -> None
                | Some ([]) -> Some([[ihd]])
                | Some (x::xs) -> Some((ihd :: x)::xs)
            else None
        | _ -> None

    match (pat, input) with
    | [], [] -> Some([])
    | [], _::_ -> None
    | _::_, [] -> None
    | phd :: ptl, ihd :: itl ->
        if phd <> wildcard then
            if phd = ihd then doMatch wildcard ptl itl
            else None
        else
            match singleMatch pat input with
            | Some x -> Some(x)
            | None -> longerMatch pat input

let strMatch (wildcard:char) (pat:string) (input:string) =
    match doMatch wildcard (List.ofSeq pat) (List.ofSeq input) with
    | None -> None
    | Some x -> Some(x|>List.map (fun chList -> new string(Array.ofList chList)))

printfn "%A" (strMatch '*' "foo" "bar")
printfn "%A" (strMatch '*' "test" "test")
printfn "%A" (strMatch '*' "functional programming is *" 
                           "functional programming is fun")
printfn "%A" (strMatch '*' "* and *" "you and me")
Brian
I agree, that's pretty much exactly what I want to end up with. I thought that's what I was asking for, but maybe I wasn't clear. My deeper question is simply: "How do I get there from here?" I'm pretty new to F#, and I keep getting lost in all the recursion and list consing and whatnot every time I try to implement it as you suggest.
Joel Mueller
Also, since my generic function doMatch is already returning an `'a list option` wouldn't I have to instead return an `'a list list option`? My trouble is that it's not clear to me how to tell when I should be adding the current matched character to an existing list of characters versus starting a new list of characters.
Joel Mueller
After the edit: Thanks very much. `Some (x :: xs) -> Some((ihd :: x) :: xs)` was the bit of syntax I just wasn't getting right. I learned something today!
Joel Mueller