views:

414

answers:

6

So basically, if I have a (finite or infinite) list of (finite or infinite) lists of strings, is it possible to sort the list by length first and then by lexicographic order, excluding duplicates? A sample input/output would be:

Input:

[["a", "b",...], ["a", "aa", "aaa"], ["b", "bb", "bbb",...], ...]

Output:

["a", "b", "aa", "bb", "aaa", "bbb", ...]

I know that the input list is not a valid haskell expression but suppose that there is an input like that. I tried using merge algorithm but it tends to hang on the inputs that I give it. Can somebody explain and show a decent sorting function that can do this? If there isn't any function like that, can you explain why?

In case somebody didn't understand what I meant by the sorting order, I meant that shortest length strings are sorted first AND if one or more strings are of same length then they are sorted using < operator.

Thanks!

+3  A: 

Well, I'm going to ignore your request for sorting infinite data.

To sort by length of the sublists, then by lexicographic order, we can do this pretty easily. Oh, and you want duplicates removed.

We'll start with a sample:

> s
[["a","b"],["a","aa","aaa"],["b","bb","bbb"]]

And then build the program incrementally.

First sorting on length (using Data.Ord.comparing to build the sort body):

> sortBy (comparing length) s
[["a","b"],["a","aa","aaa"],["b","bb","bbb"]]

Ok. That looks reasonable. So let's just concat, and sortBy length then alpha:

> sortBy (comparing length) . nub . concat $ s
["a","b","aa","bb","aaa","bbb"]

If your input is sorted. Otherwise you'll need a sligtly different body to sortBy.

Don Stewart
+5  A: 
  • In general it is impossible to sort infinite lists. Because the smallest item could be at infinite position and we must find it before we output it.

  • Merging infinite sorted lists is possible.

  • In general, merging an infinite list of sorted lists is impossible. For same reason that sorting them is.

  • Merging an infinite list of sorted lists, which are sorted by heads (forall i j. i < j => head (lists !! i) <= head (lists !! j)), is possible.

So I'm guessing that what you really want is to merge a sorted infinite list of sorted lists. It's even a task that makes some sense. There's even some existing code that uses it, implemented for monadic lists there - kinda ugly syntax-wise etc. So here's a version for plain lists:

mergeOnSortedHeads :: Ord b => (a -> b) -> [[a]] -> [a]
mergeOnSortedHeads _ [] = []
mergeOnSortedHeads f ([]:xs) = mergeOnSortedHeads f xs
mergeOnSortedHeads f ((x:xs):ys) =
  x : mergeOnSortedHeads f (bury xs ys)
  where
    bury [] ks = ks
    bury js [] = [js]
    bury js ([]:ks) = bury js ks
    bury jj@(j:js) ll@(kk@(k:ks):ls)
      | f j <= f k = jj : ll
      | otherwise = kk : bury jj ls

ghci> take 20 $ mergeOnSortedHeads id $ [[0,4,6], [2,3,9], [3,5..], [8]] ++ map repeat [12..]
[0,2,3,3,4,5,6,7,8,9,9,11,12,12,12,12,12,12,12,12]

btw: what do you need this for?

yairchu
+8  A: 

Ultimately, you can't sort an infinite list, because items at the tail of the list could percolate all the way to the front of the result, so you can't finish sorting an infinite list until you've seen the last item, but your list is infinite, so you'll never get there.

The only way that you could even try to sort an infinite list would require constraints on the inhabitants of the list. If the values of the list items comes from a well-founded set and the contents of the list are unique then you could at least make some progress in returning elements the initial elements of the list. For instance if the list was of distinct natural numbers, you could return the first 0 you see, then the first 1, etc. but you couldn't make any headway in the result until you saw 2, no matter how far down the list you went. Ultimately, if you ever skipped an element in the set because it wasn't present in the source, you'd cease to produce new output elements until you had the entire input in hand.

You can do the same thing with strings, because they are well founded, but that is only even remotely viable if you plan on returning all possible strings.

In short, if you need this, you're going about solving the problem you have in the wrong way. This isn't a tractable path to any solution you will want to use.

As yairchu noted, merging a finite number of sorted infinite lists works fine though.

Edward Kmett
A: 

Whether it can be done depends very much on the nature of your input data. If you can 'stop looking' for lists of a certain length when you've seen a longer one and there are only a finite number of lists of each length, then you can go through the lengths in ascending order, sort those and concatenate the results. Something like this should work:

listsUptoLength n xss = takeWhile (\xs -> length xs <= n) $ xss 
listsUptoLength' n [] = []
listsUptoLength' n (xss:xsss) = case listsUptoLength n xss of
    [] -> []
    xss' -> xss' : listsUptoLength' n xsss
listsOfLength n xsss = concatMap (\xss -> (filter (\xs -> length xs == n) xss)) (listsUptoLength' n xsss) 

sortInfinite xsss = concatMap (\n -> sort . nub $ (listsOfLength n xsss)) [0..] 

f xs y = [xs ++ replicate n y | n <- [1..]]
test = [ map (\x -> [x]) ['a'..'e'], f "" 'a', f "" 'b', f "b" 'a', f "a" 'b' ] ++ [f start 'c' | start <- f "" 'a'] 

(The names could probably be chosen to be more illuminating :)

I'm guessing you're working with regular expressions, so I think something like this could be made to work; I repeat the request for more background!

yatima2975
+1  A: 

Thanks to everyone for their inputs and sorry for the late reply. Turns out I was just approaching the problem in a wrong way. I was trying to do what Yairchu showed but I was using the built in function length to do the merging but length doesnt work on an infinite list for obvious reasons. Anyways, I solved my problem by sorting as I created the list on the go, not on the end result. I wonder what other languages offer infinite lists? Such a weird but useful concept.

eclipseNoob
A: 

Here is an algorithm that let you online sort:

it is not efficient, but it is lazy enough to let you goto different sort generations, even if you sort infinite lists. It is a nice gimmick, but not very usable. For example sorting the infinite list [10,9..]:

*Main> take 10 $ sortingStream [10,9..] !! 0
[9,8,7,6,5,4,3,2,1,0]
*Main> take 10 $ sortingStream [10,9..] !! 1
[8,7,6,5,4,3,2,1,0,-1]
*Main> take 10 $ sortingStream [10,9..] !! 2
[7,6,5,4,3,2,1,0,-1,-2]
*Main> take 10 $ sortingStream [10,9..] !! 3
[6,5,4,3,2,1,0,-1,-2,-3]
*Main> take 10 $ sortingStream [10,9..] !! 4
[5,4,3,2,1,0,-1,-2,-3,-4]
*Main> take 10 $ sortingStream [10,9..] !! 1000
[-991,-992,-993,-994,-995,-996,-997,-998,-999,-1000]

As you can see the sorting improves each generation. The code:

produce :: ([a] -> [a]) -> [a] -> [[a]]
produce f xs = f xs : (produce f (f xs))


sortingStream :: (Ord a) => [a] -> [[a]]
sortingStream = produce ss

ss :: (Ord a) => [a] -> [a]
ss [] = []
ss [x] = [x]
ss [x,y]    | x <= y = [x,y]
            | otherwise = [y,x]
ss (x:y:xs) | x <= y  =  x: (ss (y:xs))
            | otherwise =  y:(ss (x:xs))
Edgar Klerks