I have a list of words and a list of associated part-of-speech tags. I want to iterate over both simultaneously (matched by index), using each indexed tuple as input to a .NET function. Is this the best way? It works, but it doesn't feel natural to me:

let taggingModel = SeqLabeler.loadModel(lthPath + 
                      "models\penn_00_18_split_dict.model");
let lemmatizer = new Lemmatizer(lthPath + "v_n_a.txt")
let input = "the rain in spain falls on the plain"

let words = Preprocessor.tokenizeSentence( input )
let tags = SeqLabeler.tagSentence( taggingModel, words )
let lemmas = Array.map2 (fun x y -> lemmatizer.lookup(x,y)) words tags
A:

Your code looks quite good to me. Most of it deals with loading and initialization, so there isn't much you could do to simplify that part. As an alternative to Array.map2, you could use Seq.zip combined with Seq.map. The zip function combines two sequences into a single one that contains pairs of elements with matching indices:

let lemmas = Seq.zip words tags 
          |> Seq.map (fun (x, y) -> lemmatizer.lookup (x, y)) 

Since the lookup method takes a tuple as its argument, and each element produced by Seq.zip is exactly such a tuple, you can pass the method directly:

// standard syntax using the pipelining operator
let lemmas = Seq.zip words tags |> Seq.map lemmatizer.lookup

// .. an alternative syntax doing exactly the same thing
let lemmas = (words, tags) ||> Seq.zip |> Seq.map lemmatizer.lookup

The ||> operator used in the second version takes a tuple containing two values and passes them to the function on the right side as two arguments, so (a, b) ||> f means f a b. The |> operator takes only a single value on the left, so (a, b) |> f means f (a, b), which works only if f expects a tuple instead of two space-separated parameters.
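To make the difference between the two operators concrete, here is a minimal sketch. The helpers addCurried and addTupled and the sample word/tag arrays are made up for illustration and are not part of the original code:

```fsharp
// Hypothetical helpers, used only to illustrate the two operators.
let addCurried a b = a + b        // two space-separated parameters
let addTupled (a, b) = a + b      // a single tuple parameter

let viaDoublePipe = (3, 4) ||> addCurried   // same as: addCurried 3 4
let viaSinglePipe = (3, 4) |> addTupled     // same as: addTupled (3, 4)

// Seq.zip is curried, so it needs ||> when the inputs arrive as a tuple.
let pairs =
    ([| "the"; "rain" |], [| "DT"; "NN" |])
    ||> Seq.zip
    |> List.ofSeq

printfn "%d %d %A" viaDoublePipe viaSinglePipe pairs
```

Writing (3, 4) |> addCurried instead would not type-check, because addCurried would receive the whole tuple as its first argument.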

If you need lemmas to be an array at the end, add Array.ofSeq to the end of the processing pipeline (all Seq functions work with sequences, which correspond to IEnumerable<T>).
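One practical consequence is that a Seq pipeline is evaluated lazily, so the lookup calls only run once something consumes the sequence, for example Array.ofSeq. The sketch below assumes toy data and a hypothetical countingLookup function standing in for lemmatizer.lookup:

```fsharp
// Toy stand-ins for the tokenizer and tagger output (assumed data).
let demoWords = [| "rain"; "falls" |]
let demoTags = [| "NN"; "VBZ" |]

// Hypothetical replacement for lemmatizer.lookup that counts its calls.
let calls = ref 0
let countingLookup (word: string, tag: string) =
    calls.Value <- calls.Value + 1
    word + "/" + tag

// Building the pipeline does not call countingLookup yet.
let lazyLemmas = Seq.zip demoWords demoTags |> Seq.map countingLookup
let callsBefore = calls.Value

// Array.ofSeq enumerates the sequence once and stores the results.
let lemmaArray : string[] = Array.ofSeq lazyLemmas
let callsAfter = calls.Value

printfn "%d %d %A" callsBefore callsAfter lemmaArray
```

Converting to an array once also avoids re-running the lookups each time the result is enumerated.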

One more alternative is to use sequence expressions (you can use [| .. |] to construct an array directly if that's what you need):

let lemmas = [| for wt in Seq.zip words tags do // wt is tuple (string * string)
                  yield lemmatizer.lookup wt |] 

Whether to use sequence expressions or not is a matter of personal preference. The first option is more succinct in this case, but sequence expressions may be more readable for people less familiar with things like partial function application (used in the shorter version with Seq.map).
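To try the variants side by side without the tagging library, here is a rough sketch. The function fakeLookup is a hypothetical stand-in for lemmatizer.lookup, and the sample arrays are invented:

```fsharp
// Invented sample data in place of the tokenizer and tagger output.
let sampleWords = [| "the"; "rain"; "falls" |]
let sampleTags = [| "DT"; "NN"; "VBZ" |]

// Hypothetical stand-in for lemmatizer.lookup: takes a (word, tag) tuple.
let fakeLookup (word: string, tag: string) = sprintf "%s_%s" word tag

// 1. The original approach from the question.
let viaMap2 =
    Array.map2 (fun w t -> fakeLookup (w, t)) sampleWords sampleTags

// 2. Zip, then map, then convert back to an array.
let viaZip =
    Seq.zip sampleWords sampleTags |> Seq.map fakeLookup |> Array.ofSeq

// 3. Array expression over the zipped pairs.
let viaExpr =
    [| for wt in Seq.zip sampleWords sampleTags do
         yield fakeLookup wt |]

printfn "%b" (viaMap2 = viaZip && viaZip = viaExpr)
```

For equal-length inputs all three produce the same array, so the choice comes down to readability.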

Tomas Petricek
+1 for ||> operator!
Yin Zhu
Awesome. Why is ||> necessary? Why doesn't |> work?
I added some explanation regarding `||>` - briefly - it allows you to pass two parameters to the function on the right, while `|>` specifies only one parameter (`Seq.zip` takes two parameters).
Tomas Petricek