views:

400

answers:

8

Given a sequence of char what is the most efficient way to find the first non repeating char Interested purely functional implementation haskell or F# preffered.

+1  A: 

Here's a bit longish solution, but guaranteed to be worst-case O(n log n):

import List
import Data.Ord.comparing

sortPairs :: Ord a => [(a, b)]->[(a, b)]
sortPairs = sortBy (comparing fst)

index :: Integral b => [a] -> [(a, b)]
index = flip zip [1..]

dropRepeated :: Eq a => [(a, b)]->[(a, b)]
dropRepeated [] = []
dropRepeated [x] = [x]
dropRepeated (x:xs) | fst x == fst (head xs) =
                          dropRepeated $ dropWhile ((==(fst x)).fst) xs
                    | otherwise =
                          x:(dropRepeated xs)

nonRepeatedPairs :: Ord a => Integral b => [a]->[(a, b)]
nonRepeatedPairs = dropRepeated . sortPairs . index

firstNonRepeating :: Ord a => [a]->a
firstNonRepeating = fst . minimumBy (comparing snd) . nonRepeatedPairs

The idea is: sort the string lexicographically, so that it's easy to remove any repeated characters in linear time and find the first character which is not repeated. But in order to find it, we need to save information about characters' positions in text.

The speed on easy cases (like [1..10000]) is not perfect, but for something harder ([1..10000] ++ [1..10000] ++ [10001]) you can see the difference between this and a naive O(n^2).

Of course this can be done in linear time, if the size of alphabet is O(1), but who knows how large the alphabet is...

Michał Trybus
Your `compareBy` is just `Data.Ord.comparing`.
Travis Brown
Great, thanks! I was searching for a function that does something like `\f g a b -> f (g a) (g b)` but I couldn't find one int the library. In this case `comparing` is even more handy. I modified the solution.
Michał Trybus
@Michał: `comparing` works perfectly here but in general the function you were looking for is `Data.Function.on`, usually used in infix style: `f \`on\` g`.
Nefrubyr
+4  A: 

A fairly straightforward use of Data.Set in combination with filter will do the job in an efficient one-liner. Since this seems homeworkish, I'm declining to provide the precise line in question :-)

The complexity should, I think, be O(n log m) where m is the number of distinct characters in the string and n is the total number of characters in the string.

sclv
Are you sure about that complexity, given that you're using an immutable data structure?
IVlad
See the documentation for `Data.Set`: insert is O(log n), as is a membership check. I see from your comments below that you probably have a misunderstanding of immutable data structures. Because updating a tree can preserve sharing, only the path to the mutated node needs to be recreated. Hence update is no more expensive (in big-O terms) than traversal.
sclv
A: 
let firstNonRepeating (str:string) =
    let rec inner i cMap =
        if i = str.Length then
            cMap 
            |> Map.filter (fun c (count, index) -> count = 1) 
            |> Map.toSeq 
            |> Seq.minBy (fun (c, (count, index)) -> index)
            |> fst      
        else
            let c = str.[i]
            let value = if cMap.ContainsKey c then 
                            let (count, index) = cMap.[c]
                            (count + 1, index)
                        else 
                            (1, i)
            let cMap = cMap.Add(c, value)
            inner (i + 1) cMap 
    inner 0 (Map.empty)

Here is a simpler version that sacrifices speed.

let firstNonRepeating (str:string) =
    let (c, count) = str 
                     |> Seq.countBy (fun c -> c) 
                     |> Seq.minBy (fun (c, count) -> count)
    if count = 1 then Some c else None
ChaosPandion
How efficient is this? The map you're using is immutable, and it gets copied to a new map on each recursive call, so isn't this `O(n^2)`?
IVlad
@IVlad - Not particularly efficient, although the F# compiler does a pretty damn good job optimizing. On my slow VM this code will find the character in a 247 character string in about 47us.
ChaosPandion
@IVlad: since the map is immutable you don't need to copy the entire map. The asymptotic bounds for an immutable map are identical to a mutable map.
Niki Yoshiuchi
I couldn't resist a bit of golf on your second version: `str |> Seq.countBy id |> Seq.tryPick (function (c,1) -> Some c | _ -> None)`
cfern
+1  A: 

Here's an F# solution in O(n log n): sort the array, then for each character in the original array, binary search for it in the sorted array: if it's the only one of its kind, that's it.

open System
open System.IO
open System.Collections.Generic

let Solve (str : string) =
    let arrStr = str.ToCharArray()
    let sorted = Array.sort arrStr
    let len = str.Length - 1

    let rec Inner i = 
        if i = len + 1 then
            '-'
        else
            let index = Array.BinarySearch(sorted, arrStr.[i])
            if index = 0 && sorted.[index+1] <> sorted.[index] then 
                arrStr.[i]
            elif index = len && sorted.[index-1] <> sorted.[index] then 
                arrStr.[i]
            elif index > 0 && index < len && 
                 sorted.[index+1] <> sorted.[index] && 
                 sorted.[index-1] <> sorted.[index] then 
                arrStr.[i]
            else
                Inner (i + 1)

    Inner 0

let _ =
    printfn "%c" (Solve "abcdefabcf")

A - means all characters are repeated.

Edit: ugly hack with using the - for "no solution" as you can use Options, which I keep forgetting about! An exercise for the reader, as this does look like homework.

IVlad
Nice work, compared to my 47us your implementation runs in 18us. I can't help but wonder if this can be beaten.
ChaosPandion
I am thinking the tail-call optimizations really makes the difference here. I opened up my implementation and yours in reflector and I see a nice clean while loop in both cases. The copies being made of the Map in my implementation are what puts yours over the top.
ChaosPandion
@ChaosPandion - in theory I don't think it can, but in practice it should be possible to avoid the binary search by keeping the initial index of each element in the sorted array. I have to wonder if this is purely functional however, as arrays are mutable. I don't modify them, but they are still mutable, so does that mean the code is purely functional or not?
IVlad
@IVlad - It is pretty relative, from the clients perspective this code is pure, but underneath the hood it is indeed not. Even if you were to write the code pure from a CIL perspective it probably isn't.
ChaosPandion
A: 

How about something like this:

let firstNonRepeat s =
  let repeats = 
    ((Set.empty, Set.empty), s)
    ||> Seq.fold (fun (one,many) c -> Set.add c one, if Set.contains c one then Set.add c many else many)
    |> snd
  s
  |> Seq.tryFind (fun c -> not (Set.contains c repeats))
kvb
I think you reversed it, they are looking for the first non-repeating character.
ChaosPandion
@ChaosPandion - oops, good point. Fixed.
kvb
A: 

This is pure C# (so I assume there's a similar F# version), which will be efficient if GroupBy is efficient (which it ought to be):

static char FstNonRepeatedChar(string s)
{
    return s.GroupBy(x => x).Where(xs => xs.Count() == 1).First().First();
}
Rafe
+1  A: 

An alternate Haskell O(n log n) solution using Data.Map and no sorting:

module NonRepeat (
    firstNonRepeat
    )
    where

import Data.List (minimumBy)
import Data.Map (fromListWith, toList)
import Data.Ord (comparing)

data Occurance = Occ { first :: Int, count :: Int }
    deriving (Eq, Ord)

note :: Int -> a -> (a, Occurance)
note pos a = (a, Occ pos 1)

combine :: Occurance -> Occurance -> Occurance
combine (Occ p0 c0) (Occ p1 c1) = Occ (p0 `min` p1) (c0 + c1)

firstNonRepeat :: (Ord a) => [a] -> Maybe a
firstNonRepeat = fmap fst . findMinimum . occurances
    where occurances = toList . fromListWith combine . zipWith note [0..]
          findMinimum = safeMinimum . filter ((== 1).count.snd)
          safeMinimum [] = Nothing
          safeMinimum xs = Just $ minimumBy (comparing snd) xs
MtnViewMark
+1  A: 

A simple F# solution:

let f (s: string) =
  let n = Map(Seq.countBy id s)
  Seq.find (fun c -> n.[c] = 1) s
Jon Harrop