I need to write a function that finds repeated words in a string and returns them as a list of strings, in order of occurrence, ignoring non-letters.

e.g. at the Hugs prompt:

repetitions :: String -> [String]

repetitions > "My bag is is action packed packed."
output> ["is","packed"]
repetitions > "My name  name name is Sean ."
output> ["name","name"]
repetitions > "Ade is into into technical drawing drawing ."
output> ["into","drawing"]
+7  A: 

To split a string into words, use the words function (in the Prelude). To eliminate non-word characters, filter with Data.Char.isAlphaNum. Zip the list together with its tail to get adjacent pairs (x, y). Fold the list, consing a new list that contains all x where x == y.

Something like:

import Data.Char (isAlphaNum)

repetitions s = map fst . filter (uncurry (==)) . zip l $ tail l
  where l = map (filter isAlphaNum) (words s)

I'm not sure that works, but it should give you a rough idea.
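
A self-contained version of the same idea, offered as a sketch rather than a definitive answer. Two things here are assumptions and not part of the code above: it uses isAlpha instead of isAlphaNum so that digits are dropped as well, and the helper name cleanWords is made up for illustration. It also discards tokens that become empty after filtering, so that stray punctuation such as ". ." is not reported as a repeated empty word.

```haskell
import Data.Char (isAlpha)

-- Hypothetical helper: keep only the letters of each whitespace-separated
-- token and drop tokens that end up empty (for example a lone ".").
cleanWords :: String -> [String]
cleanWords = filter (not . null) . map (filter isAlpha) . words

-- Pair every word with its successor and keep the first word of each
-- pair whose two components are equal.
repetitions :: String -> [String]
repetitions s = map fst (filter (uncurry (==)) (zip ws (drop 1 ws)))
  where ws = cleanWords s
```

Using drop 1 instead of tail keeps the definition total even when the input contains no words at all.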

Apocalisp
This definitely worked, but I discovered I haven't been taught the map and dupe functions. Could this question be answered without those two functions? I'm practicing for my exams; it's not homework.
SeanHill
"dupe" is just an auxiliary function here; it just checks whether two values are the same. However, "map" is maybe the single most important function in Haskell, or in any functional programming language, so it's a better idea just to learn about it for your exam.
mattiast
@SeanHill: If you haven't been taught the map function then something is deeply wrong...
Norman Ramsey
Hmm, possibly. I was taught hash maps in Data Structures; they seem to be the most effective means of searching. I reckon we are expected to understand text manipulation deeply without taking the short route.
SeanHill
The `map` function has nothing to do with hash maps. It is a function that "maps" another function over a list. It's fundamental, and you will want to know it well, because you will need it often.
Apocalisp
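For readers who have not met map yet, a tiny illustration of "mapping a function over a list". The names shout and lengths are invented for this example only.

```haskell
import Data.Char (toUpper)

-- map applies a function to every element and returns the results
-- in the same order. Here the function is "upper-case a whole word".
shout :: [String] -> [String]
shout = map (map toUpper)

-- The element type may change: each word is replaced by its length.
lengths :: [String] -> [Int]
lengths = map length
```

For example, shout ["is","packed"] gives ["IS","PACKED"] and lengths ["is","packed"] gives [2,6].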
Didn't the original poster ask to ignore non-letters? The given solution allows digits.
Phyx
+2  A: 

I am new to this language, so my solution may look ugly in the eyes of a Haskell veteran, but anyway:

let repetitions x = concat (map tail (filter (\x -> (length x) > 1) (List.group (words (filter (\c -> (c >= 'a' && c <= 'z') || (c>='A' && c <= 'Z') ||  c==' ') x)))))

This part removes everything except letters and spaces from a string s:

filter (\c -> (c >= 'a' && c <= 'z') || (c>='A' && c <= 'Z') ||  c==' ') s

This one splits a string s into words and groups equal adjacent words, returning a list of lists:

List.group (words s)

Then this part removes all lists with fewer than two elements:

filter (\x -> (length x) > 1) s

Finally we concatenate all the lists into one, dropping one element from each:

concat (map tail s)
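
One possible way to lay the steps above out as a file, with each stage given a name. This is only a sketch: the helper names lettersAndSpaces and wordRuns are invented here, and isAlpha is swapped in for the explicit character ranges, which also accepts non-ASCII letters.

```haskell
import Data.Char (isAlpha)
import Data.List (group)

-- Step 1: keep only letters and spaces.
lettersAndSpaces :: String -> String
lettersAndSpaces = filter (\c -> isAlpha c || c == ' ')

-- Step 2: split into words and group runs of equal adjacent words.
wordRuns :: String -> [[String]]
wordRuns = group . words

-- Steps 3 and 4: keep runs of two or more words, then drop one word
-- from each run and concatenate what is left.
repetitions :: String -> [String]
repetitions s =
  concat (map tail (filter (\run -> length run > 1) (wordRuns (lettersAndSpaces s))))
```

A run of three equal words therefore contributes two entries to the result, which matches the "name name name" example in the question.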
Alexander Prokofyev
`concat (map f xs)` exists as a single function: `concatMap f xs`, which for lists is the same as `xs >>= f`.
Apocalisp
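A small check of that equivalence on one of the question's examples. The names runs, viaConcat, viaConcatMap and viaBind are made up for the illustration.

```haskell
import Data.List (group)

-- Runs of equal adjacent words from a sample sentence.
runs :: [[String]]
runs = group (words "into into technical drawing drawing")

-- Three spellings of "apply tail to every run, then flatten".
viaConcat, viaConcatMap, viaBind :: [String]
viaConcat    = concat (map tail runs)
viaConcatMap = concatMap tail runs
viaBind      = runs >>= tail
```

All three evaluate to ["into","drawing"].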
A: 

This might be inelegant; however, it is conceptually very simple. I'm assuming that it's looking for consecutive duplicate words, as in the examples.

-- a wrapper that allows you to give the input as a String
repetitions :: String -> [String]
repetitions s = repetitionsLogic (words s)

-- does the real work: keeps each word that equals the word right after it
repetitionsLogic :: [String] -> [String]
repetitionsLogic (a:rest@(b:_))
    | a == b    = a : repetitionsLogic rest
    | otherwise = repetitionsLogic rest
repetitionsLogic _ = []   -- fewer than two words left, nothing to compare
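
The recursion above compares the raw words, so "packed" and "packed." count as different. Below is a hedged sketch of one way to ignore non-letters as the question asks, still without map. All three function names are invented for this example, and replacing each non-letter with a space is an assumption: it means "don't" is read as the two words "don" and "t".

```haskell
import Data.Char (isAlpha)

-- Replace every non-letter with a space, using explicit recursion.
blankNonLetters :: String -> String
blankNonLetters [] = []
blankNonLetters (c:cs)
    | isAlpha c = c   : blankNonLetters cs
    | otherwise = ' ' : blankNonLetters cs

-- Keep each word that is immediately followed by an equal word.
adjacentDuplicates :: [String] -> [String]
adjacentDuplicates (a:rest@(b:_))
    | a == b    = a : adjacentDuplicates rest
    | otherwise = adjacentDuplicates rest
adjacentDuplicates _ = []

-- Clean the text first, then look for adjacent duplicates.
lettersOnlyRepetitions :: String -> [String]
lettersOnlyRepetitions s = adjacentDuplicates (words (blankNonLetters s))
```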
A: 

Building on what Alexander Prokofyev answered:

repetitions x = concat (map tail (filter (\x -> (length x) > 1) (List.group (words (filter (\c -> (c >= 'a' && c <= 'z') || (c>='A' && c <= 'Z') || c==' ') x)))))

Remove unnecessary parentheses:

repetitions x = concat (map tail (filter (\x -> length x > 1) (List.group (words (filter (\c -> c >= 'a' && c <= 'z' || c>='A' && c <= 'Z' || c==' ') x)))))

Use $ to remove more parentheses (each $ can replace an opening parenthesis if the closing parenthesis is at the end of the expression):

repetitions x = concat $ map tail $ filter (\x -> length x > 1) $ List.group $ words $ filter (\c -> c >= 'a' && c <= 'z' || c>='A' && c <= 'Z' || c==' ') x

Replace character ranges with functions from Data.Char, merge concat and map:

repetitions x = concatMap tail $ filter (\x -> length x > 1) $ List.group $ words $ filter (\c -> isAlpha c || isSeparator c) x

Use a section and function composition in point-free style to simplify (\x -> length x > 1) to ((>1) . length). This combines length with (>1) (a partially applied operator, or section) in a right-to-left pipeline.

repetitions x = concatMap tail $ filter ((>1) . length) $ List.group $ words $ filter (\c -> isAlpha c || isSeparator c) x

Eliminate the explicit "x" variable to make the overall expression point-free:

repetitions = concatMap tail . filter ((>1) . length) . List.group . words . filter (\c -> isAlpha c || isSeparator c)

Now the entire function, reading from right to left, is a pipeline that keeps only alphabetic or separator characters, splits the result into words, groups equal adjacent words, keeps the groups with more than one element, and then drops the first element of each remaining group and concatenates what is left.
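
For completeness, a sketch of the final pipeline as a standalone file with its imports. Two choices here are assumptions, not part of the derivation above: isSpace is used in place of isSeparator so that tabs and newlines still separate words, and the names keepChar and repetitionsShort are invented for the example.

```haskell
import Data.Char (isAlpha, isSpace)
import Data.List (group)

-- Characters that survive the first filter: letters and whitespace.
keepChar :: Char -> Bool
keepChar c = isAlpha c || isSpace c

-- The point-free pipeline, read from right to left.
repetitions :: String -> [String]
repetitions = concatMap tail . filter ((> 1) . length) . group . words . filter keepChar

-- Shorter variant: a one-element group contributes nothing after
-- drop 1, so the length filter can be omitted.
repetitionsShort :: String -> [String]
repetitionsShort = concatMap (drop 1) . group . words . filter keepChar
```

Both definitions should agree on every input; the second simply relies on drop 1 returning an empty list for groups that contain a single word.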

Chris Smith