tags:

views:

223

answers:

3

I've been using java to parse numbers, e.g.

(. Integer parseInt  numberString)

Is there a more clojuriffic way that would handle both integers and floats, and return clojure numbers? I'm not especially worried about performance here, I just want to process a bunch of white space delimited numbers in a file and do something with them, in the most straightforward way possible.

So a file might have lines like:

5  10  0.0002
4  12  0.003

And I'd like to be able to transform the lines into vectors of numbers.

+12  A: 

If you're very sure that your file contains only numbers, you can use the Clojure reader to parse numbers. This has the benefit of giving you floats or Bignums when needed, too.

user> (read-string "0.002")
0.0020

This isn't safe if you're parsing arbitrary user-supplied input, because reader macros can be used to execute arbitrary code at read-time and delete your hard drive etc.

If you want one huge vector of numbers, you could cheat and do this:

user> (let [input "5  10  0.002\n4  12  0.003"]
        (read-string (str "[" input "]")))
[5 10 0.0020 4 12 0.0030]

Kind of hacky though. Or there's re-seq:

user> (let [input "5  10  0.002\n4  12  0.003"]
        (map read-string (re-seq #"[\d.]+" input)))
(5 10 0.0020 4 12 0.0030)

Or one vector per line:

user> (let [input "5  10  0.002\n4  12  0.003"]
        (for [line (line-seq (java.io.BufferedReader.
                              (java.io.StringReader. input)))]
             (vec (map read-string (re-seq #"[\d.]+" line)))))
([5 10 0.0020] [4 12 0.0030])

I'm sure there are other ways.

Brian Carper
Just what I was looking for, thank you.
Rob Lachlan
this method has the useful benefit of properly parsing rational numbers as well. if you modify the re-seq properly (re-seq #"[\d\/\.]+" input)
Jeremy Wall
+3  A: 

If you want to be safer, you can use Float/parseFloat

user=> (map #(Float/parseFloat (% 0)) (re-seq #"\d+(\.\d+)?" "1 2.2 3.5"))
(1.0 2.2 3.5)
user=> 
lazy1
+3  A: 

Not sure if this is "the easiest way", but I thought it was kind of fun, so... With a reflection hack, you can access just the number-reading part of Clojure's Reader:

(let [m (.getDeclaredMethod clojure.lang.LispReader
                            "matchNumber"
                            (into-array [String]))]
  (.setAccessible m true)
  (defn parse-number [s]
    (.invoke m clojure.lang.LispReader (into-array [s]))))

Then use like so:

user> (parse-number "123")
123
user> (parse-number "123.5")
123.5
user> (parse-number "123/2")
123/2
user> (class (parse-number "123"))
java.lang.Integer
user> (class (parse-number "123.5"))
java.lang.Double
user> (class (parse-number "123/2"))
clojure.lang.Ratio
user> (class (parse-number "123123451451245"))
java.lang.Long
user> (class (parse-number "123123451451245123514236146"))
java.math.BigInteger
user> (parse-number "0x12312345145124")
5120577133367588
user> (parse-number "12312345142as36146") ; note the "as" in the middle
nil

Notice how this does not throw the usual NumberFormatException if something goes wrong; you could add a check for nil and throw it yourself if you want.

As for performance, let's have an unscientific microbenchmark (both functions have been "warmed up"; initial runs were slower as usual):

user> (time (dotimes [_ 10000] (parse-number "1234123512435")))
"Elapsed time: 564.58196 msecs"
nil
user> (time (dotimes [_ 10000] (read-string "1234123512435")))
"Elapsed time: 561.425967 msecs"
nil

The obvious disclaimer: clojure.lang.LispReader.matchNumber is a private static method of clojure.lang.LispReader and may be changed or removed at any time.

Michał Marczyk