I'm trying to write a string processing function in F#, which looks like this:

let rec Process html =
  match html with
  | '-' :: '-' :: '>' :: tail -> ("&rarr;" |> List.ofSeq) @ Process tail
  | head :: tail -> head :: Process tail
  | [] -> []

My pattern matching expression against several elements is a bit ugly (the whole '-' :: '-' :: '>' thing). Is there any way to make it better? Also, is what I'm doing efficient if I were to process large texts? Or is there another way?

Clarification: what I mean is, e.g., being able to write something like this:

match html with
| "-->" :: tail ->
+1  A: 

I think you should avoid list<char> and use strings instead, with methods such as String.Replace, String.Contains, etc. System.String and System.Text.StringBuilder are much better suited to manipulating text than list<char>.
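As a rough illustration of the String/StringBuilder route, here is a minimal sketch that makes one pass over the input and appends to a System.Text.StringBuilder. For the single "-->" case, s.Replace("-->", "&rarr;") already does the job; a hand-written loop like this only pays off when you want several replacements applied in one pass. The name replaceAll and the extra "<--" rule in the usage example are illustrative assumptions, not part of the question.

```fsharp
open System
open System.Text

/// Single left-to-right pass over `s`. At each position the (pattern, replacement)
/// rules are tried in order; the first pattern found at that position is replaced,
/// otherwise the current character is copied unchanged.
let replaceAll (rules: (string * string) list) (s: string) : string =
    let sb = StringBuilder(s.Length)
    let mutable i = 0
    while i < s.Length do
        let pos = i // immutable copy: F# closures cannot capture a `let mutable`
        let hit =
            rules
            |> List.tryFind (fun (pat, _) ->
                pat.Length > 0
                && pos + pat.Length <= s.Length
                && String.CompareOrdinal(s, pos, pat, 0, pat.Length) = 0)
        match hit with
        | Some (pat, rep) ->
            sb.Append(rep) |> ignore
            i <- i + pat.Length
        | None ->
            sb.Append(s.[i]) |> ignore
            i <- i + 1
    sb.ToString()
```

For example, replaceAll [ ("-->", "&rarr;"); ("<--", "&larr;") ] "a --> b <-- c" returns "a &rarr; b &larr; c". Because rules are tried in list order, put longer patterns first if any of them overlap.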

Brian
+2  A: 

For simple problems, using String and StringBuilder directly, as Brian mentioned, is probably the best way. For more complicated problems, you may want to check out a more sophisticated parsing library such as FParsec.

Tomas Petricek
A: 

This question may give you some ideas for another way of approaching your problem: use a list<> to hold the lines, but use String functions within each line.
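A minimal sketch of that idea, assuming lines are separated by '\n': keep the text as an F# list of lines and apply an ordinary String method (here String.Replace) to each line. The name processByLine is hypothetical.

```fsharp
/// Splits the text into a list of lines, runs String.Replace on each line,
/// and joins the lines back together with the same separator.
let processByLine (text: string) : string =
    text.Split([| '\n' |])
    |> List.ofArray
    |> List.map (fun line -> line.Replace("-->", "&rarr;"))
    |> String.concat "\n"
```

Since "-->" never contains a newline, replacing line by line gives the same result as replacing over the whole text; the list only exists at line granularity, not per character.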

Benjol
+4  A: 

I agree with others that using a list of characters for doing serious string manipulation is probably not ideal. However, if you'd like to continue to use this approach, one way to get something close to what you're asking for is to define an active pattern. For instance:

let rec (|Prefix|_|) s l =
  if s = "" then
    Some l
  else
    match l with
    | c::(Prefix (s.Substring(1)) xs) when c = s.[0] -> Some xs
    | _ -> None

Then you can use it like:

let rec Process html =
  match html with
  | Prefix "-->" tail -> ("&rarr;" |> List.ofSeq) @ Process tail
  | head :: tail -> head :: Process tail
  | [] -> []
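One more caveat if you stay with char lists: Process is not tail-recursive (head :: Process tail), so it uses one stack frame per character and a large enough input can overflow the stack. The sketch below is a variant, not the code above: it reuses the Prefix idea but accumulates the output in reverse so every recursive call is a tail call. The names processChars and processString are hypothetical. It still allocates a list cell per character, so the String-based suggestions remain the better fit for large texts.

```fsharp
/// Partial, parameterised active pattern: succeeds when the char list `l`
/// starts with the characters of `s`, returning the characters after them.
/// The first character is checked before recursing on the rest of `s`.
let rec (|Prefix|_|) (s: string) (l: char list) : char list option =
    if s = "" then
        Some l
    else
        match l with
        | c :: rest when c = s.[0] -> (|Prefix|_|) (s.Substring(1)) rest
        | _ -> None

/// Tail-recursive variant of Process: output is accumulated in reverse,
/// so each recursive call to `loop` is in tail position.
let processChars (html: char list) : char list =
    // "&rarr;" reversed, because it is prepended to the reversed accumulator.
    let arrowReversed = List.rev (List.ofSeq "&rarr;")
    let rec loop input acc =
        match input with
        | Prefix "-->" tail -> loop tail (arrowReversed @ acc)
        | head :: tail -> loop tail (head :: acc)
        | [] -> List.rev acc
    loop html []

/// Convenience wrapper: string -> char list -> processed char list -> string.
let processString (s: string) : string =
    let chars = processChars (List.ofSeq s)
    System.String(Array.ofList chars)
```

For example, processString "a-->b" returns "a&rarr;b".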
kvb
Your snippets contain F# that I don't actually understand :) Back to the book for me, then! Thanks!
Dmitri Nesteruk
+3  A: 

Is there any way to make it better?

Sure:

let processText (s: string) = s.Replace("-->", "&rarr;")

Also, is what I'm doing efficient if I were to process large texts?

No, it is incredibly inefficient. Allocation and garbage collection are expensive, and you're allocating a list cell for every single character.

Or is there another way?

Try the Replace member. If that doesn't work, try a regular expression. If that doesn't work, write a lexer (e.g. using fslex). Ultimately, what you want for efficiency is a state machine processing a stream of chars and outputting its result by mutating in-place.
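A minimal sketch of the state-machine idea, assuming the input arrives as a System.IO.TextReader and the output goes to a TextWriter (stream types chosen for the sketch, not named in the answer). The machine only remembers how many dashes it has seen, reads each character once, and writes output as soon as it can. ArrowState, replaceArrowsStream and replaceArrowsInString are hypothetical names.

```fsharp
open System.IO

/// States of a small hand-written machine that recognises "-->".
type ArrowState =
    | Normal     // no pending dashes
    | OneDash    // just seen "-"
    | TwoDashes  // just seen "--"

/// Reads characters one at a time from `reader` and writes to `writer`,
/// replacing "-->" with "&rarr;". Only the current state is kept in memory.
let replaceArrowsStream (reader: TextReader) (writer: TextWriter) =
    let mutable state = Normal
    let mutable next = reader.Read() // -1 signals end of input
    while next <> -1 do
        let c = char next
        match state, c with
        | Normal, '-' -> state <- OneDash
        | Normal, _ -> writer.Write(c)
        | OneDash, '-' -> state <- TwoDashes
        | OneDash, _ ->
            writer.Write('-'); writer.Write(c); state <- Normal
        | TwoDashes, '>' ->
            writer.Write("&rarr;"); state <- Normal
        | TwoDashes, '-' ->
            writer.Write('-') // "---": emit one dash, "--" is still pending
        | TwoDashes, _ ->
            writer.Write("--"); writer.Write(c); state <- Normal
        next <- reader.Read()
    // Flush dashes that were still pending when the input ended.
    match state with
    | OneDash -> writer.Write('-')
    | TwoDashes -> writer.Write("--")
    | Normal -> ()

/// Convenience wrapper for in-memory strings.
let replaceArrowsInString (s: string) : string =
    use reader = new StringReader(s)
    use writer = new StringWriter()
    replaceArrowsStream reader writer
    writer.ToString()
```

The same function works unchanged on a StreamReader/StreamWriter pair, so a large file never has to be held in memory at once.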

Jon Harrop
Well, I figured out the hard way that doing this is inefficient. However, doing it in C# using System.String and whatnot also ended up being tedious. In the end, I wrote a DSL that generated some truly evil C++ code to solve this.
Dmitri Nesteruk
Yes, that's what real men do. ;-)
Jon Harrop