What I would like to do (in Clojure):

For example, I have a vector of words that need to be removed:

(def forbidden-words [":)" "the" "." "," " " ...many more...])

... and a vector of strings:

(def strings ["the movie list" "this.is.a.string" "haha :)" ...many more...])

So, each forbidden word should be removed from each string, and the result, in this case, would be: ["movie list" "thisisastring" "haha"].

How can I do this?

+1  A: 
(use 'clojure.contrib.str-utils)
(import 'java.util.regex.Pattern)
(def forbidden-words [":)" "the" "." "," " "])
(def strings ["the movie list" "this.is.a.string" "haha :)"])
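;; Compile each word as a literal pattern so "." and ")" are not treated as regex syntax.
(def regexes (map #(Pattern/compile % Pattern/LITERAL) forbidden-words))
;; For each string, remove every forbidden word in turn (one pass per word).
(for [s strings] (reduce #(re-gsub %2 "" %1) s regexes))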
Jouni K. Seppänen
+1, since this works. For those who'd like to test this on the bleeding edge, note that `clojure.contrib.str-utils` has been renamed to `clojure.contrib.string` in the current sources and `re-gsub` has become `replace-re`. Also note that if removing a word from between two other words should entail removing exactly one of the spaces surrounding it (rather than none, as with the code above) *and* words at the beginning and end of the string are to be handled correctly, then somewhat more involved regex magic would be called for.
Michał Marczyk
Your call to `Pattern/compile` can be replaced with `re-pattern`.
Brian Carper
@Brian: `re-pattern` doesn't accept the `Pattern/LITERAL` argument which is necessary here.
Michał Marczyk
Ah, right. Never mind.
Brian Carper
All multipass answers are faulty; try your solution with the input ["th:)e"].
cgrand
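To make the objection concrete, here is a minimal sketch of the multipass approach. It assumes Clojure 1.2 or later, where `clojure.string` is part of core, and `multipass` is a hypothetical helper name, not something from the answers above. With a plain string as the match argument, `clojure.string/replace` replaces literally, so no `Pattern/LITERAL` is needed.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: one literal replacement pass per forbidden word,
;; applied in the order the words appear in the vector.
(defn multipass [words s]
  (reduce #(str/replace %1 %2 "") s words))

;; Removing ":)" first leaves "the", which the next pass then removes.
(multipass [":)" "the"] "th:)e") ;; => ""

;; With the words swapped, "the" is not present yet when it is looked for.
(multipass ["the" ":)"] "th:)e") ;; => "the"
```

The same input gives two different results depending only on the order of the forbidden words, because an earlier pass can create a forbidden word that a later pass then removes.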
A: 

Using function composition and the -> macro this can be nice and simple:

(for [s strings] 
  (-> s ((apply comp 
           (for [s forbidden-words] #(.replace %1 s ""))))))

If you want to be more 'idiomatic', you can use replace-str from clojure.contrib.string, instead of #(.replace %1 s "").

No need to use regexes here.

Michiel Borkent
All multipass answers are inherently broken:
(def forbidden-words [":)" "the" "." ","])
(for [s [":the)"]] (-> s ((apply comp (for [s forbidden-words] #(.replace %1 s ""))))))
;; this returns ("")
cgrand
+6  A: 
(def forbidden-words [":)" "the" "." ","])
(def strings ["the movie list" "this.is.a.string" "haha :)"])
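;; Quote each word, join the quoted words with "|" into one alternation,
;; and remove all matches in a single pass over each string.
(let [pattern (->> forbidden-words (map #(java.util.regex.Pattern/quote %))
                (interpose \|)  (apply str))]
  (map #(.replaceAll % pattern "") strings))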
cgrand
I like this better because it only does a single pass over the input string.
Stuart Sierra
Regarding your comment below, have you tried out your own answer with ["th:)e"]? It doesn't work correctly when I try it.
A. Levy
@ALevy For me, it works as expected: for ["th:)e" ":the)"] it outputs ("the" ":)"), removing only the forbidden words that appear in the input string, and not forbidden words that appear only after other forbidden words have been removed. My solution is the only one whose return values don't depend on the ordering of the forbidden-words vector.
cgrand
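For anyone wanting to try the single-pass behaviour described here, the following is a sketch of the same idea written against `clojure.string` (assuming Clojure 1.2 or later). The names `forbidden-pattern` and `remove-words` are hypothetical. One thing added here that is not in the thread: Java regex alternation takes the first alternative that matches, so if one forbidden word is a prefix of another (say "a" and "ab"), the order can still matter. Sorting longest-first is one assumed way around that.

```clojure
(require '[clojure.string :as str])
(import 'java.util.regex.Pattern)

(defn forbidden-pattern
  "Builds a single regex that matches any of the words literally."
  [words]
  (->> words
       (sort-by count >)          ; assumption: longest first, so a prefix never shadows a longer word
       (map #(Pattern/quote %))   ; escape regex metacharacters such as . and )
       (str/join "|")
       re-pattern))

(defn remove-words
  "Removes every forbidden word found in s, scanning s once."
  [words s]
  (str/replace s (forbidden-pattern words) ""))

(map #(remove-words [":)" "the" "." ","] %) ["th:)e" ":the)"])
;; => ("the" ":)")
```

Only words present in the original input are removed; text that becomes a forbidden word after a removal is left alone, which matches the output quoted in the comment above.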
I like this solution the most because it does not use loops and it's fast.
Zeljko