ansaurus

Question

R: serialize base64 encode/decode of text not exactly matching

Answer 1

+2 A:

JD: I ran your code snippet on my Linux box, then looked at the differences computed by randList[[1]][[i]] - parsedThing[[1]][[i]].

Yes, the values are different, but only at the level of my machine's floating-point tolerance. A typical difference was -4.440892e-16 -- which is pretty tiny. Some differences were zero.

It does not surprise me that the save/restore introduced that (tiny) level of change. Any significant data conversion runs the risk of "bobbling" the least significant digit.
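To illustrate the kind of least-significant-digit "bobble" described above, here is a minimal sketch in R. It is not the snippet from the question: the variable names and the choice of 15 significant digits are assumptions picked only to make the effect visible.

```r
# Hypothetical illustration: push a double through a decimal text form and back.
x <- 0.1 + 0.2                      # stored as the nearest binary double
as_text <- sprintf("%.15g", x)      # 15 significant digits (assumed for illustration)
y <- as.numeric(as_text)            # parse the decimal text back into a double

x == y                              # exact comparison
x - y                               # size of the discrepancy
abs(x - y) < .Machine$double.eps    # compare against machine epsilon
isTRUE(all.equal(x, y))             # comparison with a numerical tolerance
```

On a typical IEEE 754 machine I would expect the exact comparison to fail while the tolerance-based one succeeds, which is the same pattern as the tiny differences reported above.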

pteetor 2010-06-25 17:44:02

Welcome to SO, Paul. Glad to have you on board.

Shane 2010-06-25 17:49:26

That's exactly what made me think this was a floating point rounding type of error. I was a little surprised at the introduction of noise, albeit very small noise. It was just outside of my experience so I thought I'd ask as to the cause. I'm rolling forward in my code assuming this is "close enough." Glad you're in SO!

JD Long 2010-06-25 18:02:42

Glad to be here!

pteetor 2010-06-25 18:16:41

Answer 2

+2 A:

Ok, now that you show the output I can explain to you what you're doing (following Paul's lead here).

As that is a known issue (see e.g. this R FAQ entry), you should buckle up and use any one of

identical(), which tests for exact equality
all.equal(), which tests for equality up to a numerical tolerance
functions from the RUnit package such as checkEquals
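A small base R sketch of how the first two options behave on values that differ only by floating-point noise. The values `a` and `b` are made up for illustration and are not taken from the question. RUnit's checkEquals is left out so the example runs without extra packages.

```r
# Hypothetical values: mathematically equal, but not bit-for-bit equal.
a <- sqrt(2)^2   # computed in floating point
b <- 2

identical(a, b)                  # exact, bit-for-bit comparison
isTRUE(all.equal(a, b))          # equality up to the default tolerance (about 1.5e-8)
all.equal(a, b, tolerance = 0)   # with zero tolerance, describes the difference instead
```

Note that all.equal() returns either TRUE or a character description of the difference, so wrap it in isTRUE() when using it inside an if() condition.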

In sum, there seems nothing wrong with the base64 encoding you are using. You simply employed the wrong definition of exactly. But hey, we're economists, and anything below a trillion or two is rounding error anyway...

Dirk Eddelbuettel 2010-06-25 18:08:09

I worked for an accounting firm for a period in my life so there's something that tickles inside my ear canal when I can't reproduce certain things *exactly*. Thank you for giving me a sanity check.

JD Long 2010-06-25 18:17:14

Please see the (classic) [What Every Computer Scientist Should Know About Floating-Point Arithmetic](http://docs.sun.com/source/806-3568/ncg_goldberg.html). You and I ain't computer scientists, but we occasionally play one on telly.

Dirk Eddelbuettel 2010-06-25 18:21:13

I've actually read big chunks of that before. I'm afraid the ideas got replaced with beer. I'd like to blame my kid for erasing my brain, but that's probably not realistic. It was beer.

JD Long 2010-06-25 18:32:49

Answer 3

+2 A:

Passing ascii=T in your call to serialize makes R perform imprecise binary-decimal-binary conversions when serializing and unserializing, which causes the values to differ. If you remove ascii=T, you get exactly the same numbers back, because a binary representation is written out instead.

base64encode can encode raw vectors so it doesn't need ascii=T.

The binary representation used by serialize is architecture independent, so you can happily serialize on one machine and unserialize on another.

Reference: http://cran.r-project.org/doc/manuals/R-ints.html#Serialization-Formats
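A rough sketch of the two round trips, in case it helps. The data here is made up (it is not the randList from the question), and the base64 step assumes the base64enc package, which may not be the package used in the original code. That step is skipped if the package is not installed.

```r
# Hypothetical data standing in for the list from the question.
set.seed(1)
original <- list(rnorm(5))

# Binary serialization (the default): returns a raw vector.
binary_bytes <- serialize(original, connection = NULL)
restored <- unserialize(binary_bytes)
identical(original, restored)           # exact round trip

# ASCII serialization: doubles pass through a decimal text form.
ascii_bytes <- serialize(original, connection = NULL, ascii = TRUE)
restored_ascii <- unserialize(ascii_bytes)
identical(original, restored_ascii)     # may be FALSE
max(abs(original[[1]] - restored_ascii[[1]]))  # size of any discrepancy

# Optional base64 step on the raw vector (assumes base64enc is available).
if (requireNamespace("base64enc", quietly = TRUE)) {
  encoded <- base64enc::base64encode(binary_bytes)   # raw vector -> string
  decoded <- base64enc::base64decode(encoded)        # string -> raw vector
  via_base64 <- unserialize(decoded)
  print(identical(original, via_base64))
}
```

The base64 layer only transports bytes, so whether the numbers come back exactly should depend on the serialization format rather than on the encoding.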

Jyotirmoy Bhattacharya 2010-06-26 03:22:58

Great answer! Thanks!

JD Long 2010-06-26 18:12:29
