views:

134

answers:

4

Hi,

I'm new to F# and looking for a function which take N*indexes and a sequence and gives me N elements. If I have N indexes it should be equal to concat Seq.nth index0, Seq.nth index1 .. Seq.nth indexN but it should only scan over indexN elements (O(N)) in the sequence and not index0+index1+...+indexN (O(N^2)).

To sum up, I'm looking for something like:

//For performance, the index-list should be ordered on input, be padding between elements instead of indexes or be ordered when entering the function
seq {10 .. 20} |> Seq.takeIndexes [0;5;10] 
Result: 10,15,20

I could make this by using seq { yield... } and have a index-counter to tick when some element should be passed out but if F# offers a nice standard way I would rather use that.

Thanks :)...

Addition: I have made the following. It works but ain't pretty. Suggestions is welcomed

let seqTakeIndexes (indexes : int list) (xs : seq<int>) =
    seq {
        //Assume indexes is sorted
        let e = xs.GetEnumerator()
        let i = ref indexes 
        let curr = ref 0

        while e.MoveNext() && not (!i).IsEmpty do
            if !curr = List.head !i then
                i := (!i).Tail
                yield e.Current

            curr := !curr + 1
    }
+7  A: 

When you want to access elements by index, then using sequences isn't as good idea. Sequences are designed to allow sequential iteration. I would convert the necessary part of the sequence to an array and then pick the elements by index:

let takeIndexes ns input = 
  // Take only elements that we need to access (sequence could be infinite)
  let arr = input |> Seq.take (1 + Seq.max ns) |> Array.ofSeq
  // Simply pick elements at the specified indices from the array
  seq { for index in ns -> arr.[index] }

seq [10 .. 20] |> takeIndexes [0;5;10]  

Regarding your implementation - I don't think it can be made significantly more elegant. This is a general problem when implementing functions that need to take values from multiple sources in an interleaved fashion - there is just no elegant way of writing those!

However, you can write this in a functional way using recursion like this:

let takeIndexes indices (xs:seq<int>) = 
  // Iterates over the list of indices recursively
  let rec loop (xe:IEnumerator<_>) idx indices = seq {
    let next = loop xe (idx + 1)
    // If the sequence ends, then end as well
    if xe.MoveNext() then
      match indices with
      | i::indices when idx = i -> 
        // We're passing the specified index 
        yield xe.Current
        yield! next indices
      | _ -> 
        // Keep waiting for the first index from the list
        yield! next indices }
  seq {
    // Note: 'use' guarantees proper disposal of the source sequence
    use xe = xs.GetEnumerator()
    yield! loop xe 0 indices }

seq [10 .. 20] |> takeIndexes [0;5;10]  
Tomas Petricek
+1: helpful as always :)
Juliet
+1 Thanks :) In this case, converting to an array is overkill. I need index 0,9,99,999,9999,99999,999999. In this case my own method is faster than the array-converting and significantly faster than the second approach. But it is nice to see a functional solution to it, and nice to learn something new. Ex 'use'
lasseespeholt
In my code and your code seq [1 .. 3] can be simplified to {1 .. 3}
lasseespeholt
@lasseespeholt: Yes the use of `seq` is not necessary. I would either write `[ 1 .. 3 ]` (which creates a list) or `seq { 1 .. 3 }` which is a sequence expression. According to the specificaion: http://research.microsoft.com/en-us/um/cambridge/projects/fsharp/manual/spec-1.9.7.pdf (page 58), the use of `seq` is a convention when writing sequence expressions of the form `seq { ... }`. (See also pg. 59 with an example of range). This convention makes sure that "sequence expressions" are understood as a special case of F# computation expressions...
Tomas Petricek
... other two usual notations are `[ ... ]` when creating a sequence that should be converted to F# list and `[| ... |]` when creating a sequence that should be converted to an array.
Tomas Petricek
Educating document (reading Programming F# by Chris Smith), thanks :)
lasseespeholt
Also, pick up Real World Functional Programming, well-written and I like the C# comparisons. Great job Tomas et al
Scot Hauder
+2  A: 

When you need to scan a sequence and accumulate results in O(n), you can always fall back to Seq.fold:

let takeIndices ind sq =
    let selector (idxLeft, currIdx, results) elem =
        match idxLeft with
            | []                               -> (idxLeft, currIdx, results)
            | idx::moreIdx when idx =  currIdx -> (moreIdx, currIdx+1, elem::results)
            | idx::_       when idx <> currIdx -> (idxLeft, currIdx+1, results)
            | idx::_                           -> invalidOp "Can't get here."
    let (_, _, results) = sq |> Seq.fold selector (ind, 0, [])
    results |> List.rev

seq [10 .. 20] |> takeIndices [0;5;10]

The drawback of this solution is that it will enumerate the sequence to the end, even if it has accumulated all the desired elements already.

Mau
1+ Ahh okay, I will try it if I should ever need it :) But in this case I want it should work with infinite sequences. I apologize for not clarify that in the problem.
lasseespeholt
+1  A: 

Here is my shot at this. This solution will only go as far as it needs into the sequence and returns the elements as a list.

let getIndices xs (s:_ seq) =
    let enum = s.GetEnumerator()
    let rec loop i acc = function
        | h::t as xs ->
            if enum.MoveNext() then
                if i = h then
                    loop (i+1) (enum.Current::acc) t
                else
                    loop (i+1) acc xs
            else
                raise (System.IndexOutOfRangeException())
        | _ -> List.rev acc
    loop 0 [] xs

[10..20]
|> getIndices [2;4;8]
// Returns [12;14;18]

The only assumption made here is that the index list you supply is sorted. The function won't work properly otherwise.

YotaXP
1+ Cool, in my case I think returning a list would give better performance because I only need a few elements in a LOT of elements (but in this case the difference don't matter). However, I initially thought it made more sense to return a sequence because I work with sequences.
lasseespeholt
A: 

Is it a problem, that the returned result is sorted? This algorithm will work linearly over the input sequence. Just the indices need to be sorted. If the sequence is large, but indices are not so many - it'll be fast. Complexity is: N -> Max(indices), M -> count of indices: O(N + MlogM) in the worst case.

let seqTakeIndices indexes = 
    let rec gather prev idxs xs =
        match idxs with
        | [] -> Seq.empty
        | n::ns ->  seq { let left = xs |> Seq.skip (n - prev)
                          yield left |> Seq.head
                          yield! gather n ns left }
    indexes |> List.sort |> gather 0

Here is a List.fold variant, but is more complex to read. I prefer the first:

let seqTakeIndices indices xs = 
    let gather (prev, xs, res) n =
        let left = xs |> Seq.skip (n - prev)
        n, left, (Seq.head left)::res
    let _, _, res = indices |> List.sort |> List.fold gather (0, xs, [])
    res

Appended: Still slower than your variant, but a lot faster than mine older variants. Because of not using Seq.skip that is creating new enumerators and was slowing down things a lot.

let seqTakeIndices indices (xs : seq<_>) = 
    let enum = xs.GetEnumerator()
    enum.MoveNext() |> ignore
    let rec gather prev idxs =  
        match idxs with
        | [] -> Seq.empty
        | n::ns -> seq { if [1..n-prev] |> List.forall (fun _ -> enum.MoveNext()) then 
                            yield enum.Current
                            yield! gather n ns }
    indices |> List.sort |> gather 0
The_Ghost
Hhm I made a solution which used Seq.skip etc. but it gave me very bad performance compared to what I have now. I guess you make another enumerator every time you assign left?
lasseespeholt
It is interesting that Seq.skip is a slow function... I thought that those are internally optimized.
The_Ghost
Here you could find some answers:http://stackoverflow.com/questions/1306140/f-why-is-using-a-sequence-so-much-slower-than-using-a-list-in-this-example
The_Ghost