views:

660

answers:

6

What function can I put as FOO here to yield true at the end? I played with hash-set (only correct for first 2 values), conj, and concat but I know I'm not handling the single-element vs set condition properly with just any of those.

(defn mergeMatches [propertyMapList]
    "Take a list of maps and merges them combining values into a set"
    (reduce #(merge-with FOO %1 %2) {} propertyMapList))

(def in 
    (list
        {:a 1}
        {:a 2}
        {:a 3}
        {:b 4}
        {:b 5}
        {:b 6} ))

(def out
    { :a #{ 1 2 3}
      :b #{ 4 5 6} })

; this should return true
(= (mergeMatches in) out)

What is the most idiomatic way to handle this?

+8  A: 

This'll do:

(let [set #(if (set? %) % #{%})]
  #(clojure.set/union (set %) (set %2)))

Rewritten more directly for the example (Alex):

(defn to-set [s]
    (if (set? s) s #{s}))
(defn set-union [s1 s2] 
    (clojure.set/union (to-set s1) (to-set s2)))
(defn mergeMatches [propertyMapList]
    (reduce #(merge-with set-union %1 %2) {} propertyMapList))
Chas Emerick
a) due to the restriction of no nested anonymous functions it also isn't valid.b) not too pretty! :)
Alex Miller
Also, I think this will return a set of maps instead of a map with set values.
Alex Miller
@Alex: a) There is no such restriction. You can't nest anonymous function *literals* (created with `#(...)`), but you can nest anonymous functions (although you'll need to use `fn`). b) A matter of taste, I guess. :-) Also, you should try actually running it, as it happens the function it produces does actually make your test expression return `true` when substituted in place of `FOO` in your `reduce` expression.
Michał Marczyk
As an afterthought, note that if the `in` collection contained maps with set values, those values would get flattened in the final product. If that's undesirable, then it's actually better to drop to `loop` / `recur`, as any `reduce` -based solution would be likely to use a terribly convoluted reduction function and require postprocessing of the result map. (The most important reason being that `merge-with` only calls the merger function if the left map doesn't contain the given key.)
Michał Marczyk
@Alex: I certainly wouldn't try to shoehorn my answer into your code as-is -- I'd break it out into a separately-defined fn (named FOO if you really want) that you can refer to in your merge-with form. That's an exercise for you. :-) But no, it does not produce a set of maps -- used as the fn in merge-with, it produces a map with set values. Merge-with will never produce a set.
Chas Emerick
@Michał: Yeah, I meant anonymous function literals. @Chas: I rewrote in terms of the question and added to your answer there.
Alex Miller
I think this is a reasonable answer in the end and certainly takes what I was thinking and extends it to make it work. I'd be curious about whether the wrapping of sets that will later be thrown away has any performance or memory impacts.
Alex Miller
@Alex Making single-element sets like that is too fast to worry about, and they live for such a short time that they'll be very quickly GC'd. If it became a concern after profiling (which I doubt), you could use definline for some perf gain (esp. with to-set), and you could replace set-union with something that explicitly created new sets only when needed, and otherwise conj values or union existing sets.
Chas Emerick
+2  A: 

I wouldn't use merge-with for this,

(defn fnil [f not-found]
  (fn [x y] (f (if (nil? x) not-found x) y)))
(defn conj-in [m map-entry]
  (update-in m [(key map-entry)] (fnil conj #{}) (val map-entry)))
(defn merge-matches [property-map-list]
  (reduce conj-in {} (apply concat property-map-list)))

user=> (merge-matches in)
{:b #{4 5 6}, :a #{1 2 3}}

fnil will be part of core soon so you can ignore the implementation... but it just creates a version of another function that can handle nil arguments. In this case conj will substitute #{} for nil.

So the reduction conjoining to a set for every key/value in the list of maps supplied.

Timothy Pratley
I like this as a solution and that it uses stuff that's moving into the core libs. The solution from @amitrathore is the same basic idea but simpler. I haven't determined yet whether it's actually functionally any different or not.
Alex Miller
So the difference is that the other solution produces a map of lists, not a map of sets which is not as good for me.
Alex Miller
On further usage, the issue with this solution is that it does not deal with the case where the incoming maps already have sets as values. (Yes, I'm adding requirements late. :) For example:user=> (println (mergeMatches (list {:a #{1 2 3}} {:a #{4 5 6}}))){:a #{#{1 2 3} #{4 5 6}}}when I really want {:a #{1 2 3 4 5 6}} as a result. (println (mergeMatches in2))
Alex Miller
+1  A: 

Not super pretty but it works.

(defn mergeMatches [propertyMapList]
    (for [k (set (for [pp propertyMapList] (key (first pp))))]
         {k (set (remove nil? (for [pp propertyMapList] (k pp))))}))
mac
Very clever. I'm a little wary of maintaining code using that many nested list comprehensions!! :) I think some of the other solutions are a bit more readable and understandable.
Alex Miller
Yeah, it becomes a little dense, but I have never understood why it is not used more frequently in Clojure. See http://www.bestinclass.dk/index.php/2010/02/clojure-list-comprehension/ for some more neat examples.
mac
+1  A: 

This seems to work:

(defn FOO [v1 v2]
      (if (set? v1)
          (apply hash-set v2 v1)
          (hash-set v1 v2)))
Siddhartha Reddy
Similar to @Chas but simpler by avoiding the union.
Alex Miller
A: 

I didn't write this but it was contributed by @amitrathore on Twitter:

(defn kv [bag [k v]] 
  (update-in bag [k] conj v))
(defn mergeMatches [propertyMapList]
  (reduce #(reduce kv %1 %2) {} propertyMapList))
Alex Miller
In all, I think this is the simplest AND most understandable solution. Also relies on update-in like @Timothy Pratley's answer. I think the main issue with this one is that it produces a map of lists instead of a map of sets, which doesn't give me the de-duping I need.
Alex Miller
+1  A: 

Another solution contributed by @wmacgyver on Twitter based on multimaps:

(defn add
  "Adds key-value pairs the multimap."
  ([mm k v]
     (assoc mm k (conj (get mm k #{}) v)))
  ([mm k v & kvs]
     (apply add (add mm k v) kvs)))
(defn mm-merge
  "Merges the multimaps, taking the union of values."
  [& mms]
  (apply (partial merge-with union) mms))   

(defn mergeMatches [property-map-list]
  (reduce mm-merge (map #(add {} (key (first %)) (val (first %))) property-map-list)))      
Alex Miller
I like the idea of explicitly building (or using) a multimap library to deal with this data structure and simply relying on it. Depending on how much I end up using it, I might actually go this direction long-term.
Alex Miller