ansaurus

Question

Newbie transforming CSV files in Clojure

Answer 1

+7 A:

Here's one way:

(use '(clojure.contrib duck-streams str-utils))
(with-out-writer "coords.txt"
  (doseq [line (read-lines "coords.csv")]
    (let [[x y z p] (re-split #"," line)]
      (println (str-join \space [p x y z])))))

with-out-writer binds *out* so that everything you print goes to the filename or stream you specify, rather than to standard output.
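To see the rebinding in isolation, here is a minimal sketch that sends printed output to a string instead of a file. It assumes a current Clojure, where clojure.contrib is no longer available, so it uses binding and java.io.StringWriter directly; capture-output is a hypothetical helper name, not something from the answer.

```clojure
(import 'java.io.StringWriter)

(defn capture-output
  "Calls f with *out* rebound to a StringWriter and returns whatever was printed.
  (Hypothetical helper, for illustration only.)"
  [f]
  (let [w (StringWriter.)]
    ;; Inside this binding, println writes to w instead of standard output.
    (binding [*out* w]
      (f))
    (str w)))

;; The printed line ends up in the string, followed by a line separator.
(def captured (capture-output #(println "7 1 2 3")))
```

Clojure's core with-out-str macro does the same job; the helper is spelled out here only to make the binding of *out* visible.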

Using def as you're using it isn't idiomatic. A better way is to use let. I'm using destructuring to assign the 4 fields of each line to 4 let-bound names; then you can do what you want with those.
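As a rough illustration of the difference, the sketch below puts the let-with-destructuring version next to a def inside a function body. It assumes a current Clojure and uses clojure.string in place of the clojure.contrib functions; reorder-line, leaky-fields and leaked-fields are hypothetical names.

```clojure
(require '[clojure.string :as str])

(defn reorder-line
  "Takes a line such as \"1,2,3,7\" and returns \"7 1 2 3\"."
  [line]
  ;; let + destructuring: x, y, z and p exist only inside this let body.
  (let [[x y z p] (str/split line #",")]
    (str/join " " [p x y z])))

(defn leaky-fields
  "For contrast only: def inside a function still creates a namespace-level var."
  [line]
  (def leaked-fields (str/split line #","))
  nil)

(reorder-line "1,2,3,7")   ; the fields stay local
(leaky-fields "1,2,3,7")   ; leaked-fields is now visible to the whole namespace
```

After the second call, leaked-fields can be read from anywhere in the namespace, which is usually not what a "local variable" is supposed to do.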

If you're iterating over something for the purpose of side-effects (e.g. I/O) you should usually go for doseq. If you wanted to collect up each line into a hash-map and do something with them later, you could use for:

(for [line (read-lines "coords.csv")]
  (let [fields (re-split #"," line)]
    (zipmap [:x :y :z :p] fields)))
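One thing to keep in mind is that for returns a lazy seq, so nothing is parsed until something consumes the result. The sketch below makes that visible with a call counter. It assumes a current Clojure with clojure.string; parse-line, calls and the sample lines are made up for the illustration.

```clojure
(require '[clojure.string :as str])

(def calls (atom 0))

(defn parse-line
  "Splits a comma-separated line into a map; counts calls so laziness is visible."
  [line]
  (swap! calls inc)
  (zipmap [:x :y :z :p] (str/split line #",")))

(def lines ["1,2,3,7" "4,5,6,8"])

;; for only builds a lazy seq here: parse-line has not run yet.
(def records (for [line lines] (parse-line line)))
(def calls-before @calls)

;; Forcing the seq with doall runs parse-line once per line.
(def realized (doall records))
(def calls-after @calls)
```

Here calls-before is 0 and calls-after is 2. With doseq the body would have run immediately, which is why doseq is the usual choice when the point is the side effect.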

Brian Carper 2009-11-17 07:56:45

Exactly what I needed! And elegantly done as well! doseq didn't make much sense to me until now, and I can see now that I misunderstood a few other things as well. I tried your code in ClojureBox and it worked; I was also able to wrap it up in a function and that worked as well, so this seems to have put me on the right track. Thanks again.

Bill_B 2009-11-18 01:45:49

Answer 2

+6 A:

Please don't use nested defs. They don't do what you think they do: def always creates a global var! For locals, use let instead. The library functions are nice to know, but here is a version that shows off some features of functional programming in general and Clojure in particular.

(import 'java.io.FileWriter 'java.io.FileReader 'java.io.BufferedReader)

(defn translate-coords

Docstrings can be queried in the REPL via (doc translate-coords). This works, e.g., for all core functions, so supplying one is a good idea.

  "Reads coordinates from infile, translates them with the given
  translator and writes the result to outfile."

translator is a (possibly anonymous) function that separates the translation itself from the surrounding boilerplate, so we can reuse this function with different transformation rules. The type hints here avoid reflection for the constructors.

  [translator #^String infile #^String outfile]

Open the files. with-open makes sure that the files are closed when its body is left, whether by a normal "drop off the bottom" or by a thrown exception.

  (with-open [in  (BufferedReader. (FileReader. infile))
              out (FileWriter. outfile)]

We temporarily bind *out* to the output file, so any print inside the binding goes to the file.

    (binding [*out* out]

The map means: take the seq, apply the given function to every element, and return the seq of the results. The #() is shorthand notation for an anonymous function. Here it takes one argument, which is filled in at the %. The doseq is basically a loop over the input. Since we do that for the side effects (namely printing to a file), doseq is the right construct. Rule of thumb: map is lazy, so use it for the result; doseq is eager, so use it for side effects.

      (doseq [coords (map #(.split % ",") (line-seq in))]

println takes care of the \n at the end of the line. interpose takes the seq and puts its first argument (in our case " ") between the elements. (apply str [1 2 3]) is equivalent to (str 1 2 3) and is useful for constructing function calls dynamically. The ->> is a relatively new macro in Clojure, which helps a bit with readability. It means "take the first argument and add it as the last item of the next function call". The given ->> is equivalent to: (println (apply str (interpose " " (translator coords)))). (Edit: Another note: since the separator is \space, we could just as well write (apply println (translator coords)) here, but the interpose version lets us parametrize the separator as we did with the translator function, while the short version would hardwire \space.)

        (->> (translator coords)
          (interpose " ")
          (apply str)
          println)))))

(defn survey->cartography-format
  "Translate coords in survey format to cartography format."

Here we use destructuring (note the double [[]]). It means the argument to the function is something that can be treated as a sequence, e.g. a vector, a list, or the array returned by .split. Bind the first element to y, the second to x, and so on.

  [[y x z p]]
  [p x y z])

(translate-coords survey->cartography-format "survey_coords.txt" "cartography_coords.txt")

Here it is again, less choppy:

(import 'java.io.FileWriter 'java.io.FileReader 'java.io.BufferedReader)

(defn translate-coords
  "Reads coordinates from infile, translates them with the given
  translator and writes the result to outfile."
  [translator #^String infile #^String outfile]
  (with-open [in  (BufferedReader. (FileReader. infile))
              out (FileWriter. outfile)]
    (binding [*out* out]
      (doseq [coords (map #(.split % ",") (line-seq in))]
        (->> (translator coords)
          (interpose " ")
          (apply str)
          println)))))

(defn survey->cartography-format
  "Translate coords in survey format to cartography format."
  [[y x z p]]
  [p x y z])

(translate-coords survey->cartography-format "survey_coords.txt" "cartography_coords.txt")
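To make the note about ->> and the parametrized separator concrete, here is a small sketch that pulls the formatting step out of translate-coords. The names format-coords and format-coords-nested are hypothetical; only ->>, interpose, apply and str come from the code above.

```clojure
(defn format-coords
  "Joins coords with sep, using the same ->> pipeline as translate-coords."
  [sep coords]
  (->> coords
       (interpose sep)
       (apply str)))

(defn format-coords-nested
  "The same computation written as nested calls, i.e. what the ->> form expands to."
  [sep coords]
  (apply str (interpose sep coords)))

(format-coords " " ["7" "1" "2" "3"])        ; space-separated
(format-coords ";" ["7" "1" "2" "3"])        ; same pipeline, different separator
(format-coords-nested " " ["7" "1" "2" "3"]) ; same result as the first call
```

Passing the separator as an argument is the parametrization the edit note refers to; (apply println ...) could not be reused this way.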

Hope this helps.

Edit: For CSV reading you probably want something like OpenCSV.
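One likely reason for reaching for a real CSV parser is that splitting on "," breaks as soon as a field is quoted and contains a comma. A minimal sketch of the problem, assuming a current Clojure with clojure.string and a made-up input line:

```clojure
(require '[clojure.string :as str])

;; A quoted field containing a comma is a single CSV field,
;; but a plain split on "," cuts it in two.
(def naive-fields (str/split "\"Main St, North\",10,20,30" #","))

;; The line has 4 CSV fields, yet the naive split yields 5 pieces.
(def naive-count (count naive-fields))
```

For files like the coordinate data in the question, which contain only plain numbers, the simple split is fine.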

kotarak 2009-11-17 08:44:59

Thanks for the tutorial -- there's a LOT of useful info in there that will take me some time to digest. I modeled a function of my own on the one you used here and it worked like a charm. Thanks again!

Bill_B 2009-11-18 01:52:43
