import qualified Data.Map as Map
import Data.List (foldl') -- ' (to fix SO syntax highlighting)
histogram :: (Ord a) => [a] -> Map.Map a Int
histogram = foldl' (\m x -> Map.insertWith' (+) x 1 m) Map.empty
The explanation for why this works and why it is superior to Travis Brown's solution is pretty technical, and will require some patience to understand fully.
If there are only finitely many values that can possibly occur in the list, then this runs in constant memory. Travis's solution has a subtle bug in which the resulting map entries will look like:
(4, 1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1)
A very inefficient representation of the number 19. Only when you ask for that element in the map will the giant sum be computed. These "thunks" (delayed-evaluation expressions) will grow linearly with the size of the input.
To prevent this, we use insertWith'
, which applies the function strictly, that is to say it evaluates the result before it puts it in the map. So then if you insert 4 into the map above, it will evaluate the thunk and you will get a nice tidy:
(4, 20)
And another will evaluate that before adding so you will get:
(4, 21)
So now at least the values of the map are constant space.
The final thing we need to do is to change the right fold to a left fold because Map.insert is strict in its second argument. The following demonstrates the meaning of a right fold.
iw x m = Map.insertWith' (+) x 1 m -- '
foldr iw Map.empty [1,2,1,3,2,1]
= iw 1 (iw 2 (iw 1 (iw 3 (iw 2 (iw 1 Map.empty)))))
Using iw
as a simple shorthand. Map.insert
being strict in its second argument means you need to evaluate the map into which you are inserting before insert can do any work. I will use the notation { k1 -> v1, k2 -> v2, ... }
as a shorthand for maps. Your sequence of evaluation looks like this:
foldr f z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
foldr iw {} [1,2,1,3,2,1]
iw 1 (foldr iw {} [2,1,3,2,1])
iw 1 (iw 2 (foldr iw {} [1,3,2,1]))
iw 1 (iw 2 (iw 1 (foldr iw {} [3,2,1])))
iw 1 (iw 2 (iw 1 (iw 3 (foldr iw {} [2,1]))))
iw 1 (iw 2 (iw 1 (iw 3 (iw 2 (foldr iw {} [1])))))
iw 1 (iw 2 (iw 1 (iw 3 (iw 2 (iw 1 (foldr iw {} []))))))
iw 1 (iw 2 (iw 1 (iw 3 (iw 2 (iw 1 {}))))))
iw 1 (iw 2 (iw 1 (iw 3 (iw 2 {1 -> 1}))))
iw 1 (iw 2 (iw 1 (iw 3 {1 -> 1, 2 -> 1})))
iw 1 (iw 2 (iw 1 {1 -> 1, 2 -> 1, 3 -> 1}))
iw 1 (iw 2 {1 -> 2, 2 -> 1, 3 -> 1})
iw 1 {1 -> 2, 2 -> 2, 3 -> 1}
{1 -> 3, 2 -> 2, 3 -> 1}
So if you have a 1,000,000 element array, we have to go all the way down to the 1,000,000th element to start inserting, thus we need to keep the previous 999,999 elements in memory so we can know what to do next. A left fold solves this:
-- definition of left fold
foldl' f z xs = go z xs -- '
where
go accum [] = z
go accum (x:xs) = accum `seq` go (f accum x) xs
foldl' (flip iw) Map.empty [1,2,1,3,2,1] -- needed to flip arg order to appease foldl'
go {} [1,2,1,3,2,1]
go (iw 1 {}) [2,1,3,2,1]
go (iw 2 {1 -> 1}) [1,3,2,1]
go (iw 1 {1 -> 1, 2 -> 1}) [3,2,1]
go (iw 3 {1 -> 2, 2 -> 1}) [2,1]
go (iw 2 {1 -> 2, 2 -> 1, 3 -> 1}) [1]
go (iw 1 {1 -> 2, 2 -> 2, 3 -> 1}) []
iw 1 {1 -> 2, 2 -> 2, 3 -> 1}
{1 -> 3, 2 -> 2, 3 -> 1}
Now we can see that, finally, if the number of entries in the map is bounded, then this runs in constant space and linear time.