I am new to Haskell and trying to fiddle with some test cases I usually run into in the real world. Say I have the text file "foo.txt" which contains the following:

45.4 34.3 377.8
33.2 98.4 456.7
99.1 44.2 395.3

I am trying to produce the output

[[45.4,34.3,377.8],[33.2,98.4,456.7],[99.1,44.2,395.3]]

My code is below, but I'm getting some bogus "LPS" in the output... not sure what it represents.

import qualified Data.ByteString.Lazy.Char8 as BStr
import qualified Data.Map as Map

readDatafile = (map (BStr.words) . BStr.lines)

testFunc path = do
    contents <- BStr.readFile path
    print (readDatafile contents)

When invoked with testFunc "foo.txt" the output is

[[LPS ["45.4"],LPS ["34.3"],LPS ["377.8"]],[LPS ["33.2"],LPS ["98.4"],LPS ["456.7"]],[LPS ["99.1"],LPS ["44.2"],LPS ["395.3"]]]

Any help is appreciated! Thanks. PS: Using ByteString as this will be used on massive files in the future.

EDIT:

I am also puzzled as to why the output list is grouped as above (with each number bound in []), when in ghci the line below gives a different arrangement.

*Main> (map words . lines) "45.4 34.3 377.8\n33.2 98.4 456.7\n99.1 44.2 395.3"
[["45.4","34.3","377.8"],["33.2","98.4","456.7"],["99.1","44.2","395.3"]]
+2  A: 

This is an indication of the internal lazy bytestring representation type pre-1.4.4.3 (search the page for "LPS"). LPS is a constructor.

Matt Ball
Yeah I was starting to think it was related to lazy evaluation. I can treat the list just as I would any other however?
jparanich
I think so - my understanding is that it's just another internal representation that you shouldn't (for the most part) worry about. I don't know Haskell super-well, however. Just don't forget about this if you start seeing wonky stuff.
Matt Ball
+7  A: 

What you're seeing is indeed a constructor. When you read the file, the result is of course a list of lists of Bytestrings, but what you want is a list of lists of Floats.

What you could do:

readDatafile :: BStr.ByteString -> [[Float]]
readDatafile = (map ((map (read .  BStr.unpack)) . BStr.words)) . BStr.lines

This unpacks the Bytestring (i.e. converts it to a string). The read converts the string to a float.

Not sure if using bytestrings here even helps your performance though.
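
For completeness, here is a minimal sketch of the whole pipeline, from file to parsed rows. It assumes Double rather than Float (the only change is the type signature), and the names parseRows and printRows are made up for illustration. Note that read throws a runtime error on any token that is not a valid number.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BStr

-- Split into lines, split each line into words, then unpack each
-- word to a String and parse it with read.
-- Assumption: Double is used here instead of Float.
parseRows :: BStr.ByteString -> [[Double]]
parseRows = map (map (read . BStr.unpack) . BStr.words) . BStr.lines

-- Hypothetical helper: read a file lazily and print the parsed rows.
printRows :: FilePath -> IO ()
printRows path = do
    contents <- BStr.readFile path
    print (parseRows contents)
```

Calling printRows "foo.txt" on the file from the question should print the nested list of numbers the question asks for.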

mrueg
+2  A: 

readDatafile is returning a [[ByteString]], and what you are seeing is the 'packed' representation of all those characters you read.

readDatafile = map (map BStr.unpack . BStr.words) . BStr.lines
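
As a rough check on the grouping question from the edit, the sketch below compares the plain String pipeline with the ByteString pipeline after unpacking each word. The names sample, viaString and viaByteString are only for illustration; the point is that both produce the same list of lists, so the extra brackets in the printed output come from how each ByteString is shown, not from a different nesting.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BStr

-- The same text as in the question, as an ordinary String.
sample :: String
sample = "45.4 34.3 377.8\n33.2 98.4 456.7\n99.1 44.2 395.3"

-- The String version from the ghci session in the question.
viaString :: [[String]]
viaString = (map words . lines) sample

-- The ByteString version, with every word unpacked back to a String.
viaByteString :: [[String]]
viaByteString =
    (map (map BStr.unpack . BStr.words) . BStr.lines) (BStr.pack sample)
```

Evaluating viaString == viaByteString in ghci gives True for this input.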

Here's an example ghci run demonstrating the problem. My output is different than yours because I'm using GHC 6.10.4:

*Data.ByteString.Lazy.Char8> let myString = "45.4"
*Data.ByteString.Lazy.Char8> let myByteString = pack "45.4"
*Data.ByteString.Lazy.Char8> :t myString
myString :: [Char]
*Data.ByteString.Lazy.Char8> :t myByteString
myByteString :: ByteString
*Data.ByteString.Lazy.Char8> myString
"45.4"
*Data.ByteString.Lazy.Char8> myByteString
Chunk "45.4" Empty
*Data.ByteString.Lazy.Char8> unpack myByteString
"45.4"
Michael Steele
Thanks for the info Michael... I noticed in the link Matt Ball had it is now a "chunk" in the newer releases, instead of "LPS [45.4]" as it would be on mine. I'm sure the Haskell wizards have a fine explanation that I am too fresh to comprehend. :)
jparanich
A: 

This is just the lazy bytestring constructor. You're not parsing those strings into numbers yet, so you'll see the underlying string. Note that lazy bytestrings are not the same as String, so they have a different printed representation when 'Show'n.

Don Stewart
A: 

LPS was the old constructor for the old Lazy ByteString newtype. It has since been replaced with an explicit data type, so the current behavior is slightly different.

When you call Show on a Lazy ByteString it prints out the code that would generate approximately the same lazy bytestring you gave it. However, the usual import for working with ByteStrings doesn't export the LPS -- or in later revisions, the Chunk/Empty constructors. So it shows it with the LPS constructor wrapped around a list of strict bytestring chunks, which print themselves as strings.

On the other hand, I wonder if the lazy ByteString Show instance should do the same thing that most other show instances for complicated data structures do and say something like:

fromChunks ["foo","bar","baz"]

or even:

fromChunks [pack "foo",pack "bar", pack "baz"]

since the former seems to rely on {-# LANGUAGE OverloadedStrings #-} for the resulting code fragment to be really parseable as Haskell code. On the other-other hand, printing bytestrings as if they were strings is really convenient. Alas, both options are more verbose than the old LPS syntax, but they are more terse than the current Chunk "Foo" Empty. In the end, Show just needs to be left invertible by Read, so it's probably best not to muck around changing things lest it randomly break a ton of serialized data. ;)
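
To make the chunk structure concrete, here is a small sketch that builds a lazy ByteString from strict chunks with fromChunks and takes it apart again with toChunks. The names chunked, chunkStrings and flattened are hypothetical; the functions themselves are exported by Data.ByteString.Lazy and do not require access to the LPS or Chunk/Empty constructors.

```haskell
import qualified Data.ByteString.Char8 as S        -- strict chunks
import qualified Data.ByteString.Lazy as L         -- fromChunks / toChunks
import qualified Data.ByteString.Lazy.Char8 as LC  -- Char-based unpack

-- A lazy ByteString assembled from three strict chunks.
chunked :: L.ByteString
chunked = L.fromChunks [S.pack "foo", S.pack "bar", S.pack "baz"]

-- The individual chunks, converted to Strings for inspection.
chunkStrings :: [String]
chunkStrings = map S.unpack (L.toChunks chunked)

-- The chunk boundaries disappear once the whole value is unpacked.
flattened :: String
flattened = LC.unpack chunked
```

Here chunkStrings is ["foo","bar","baz"], while flattened is the single string "foobarbaz".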

As for your problem, you are getting a [[ByteString]] instead of [[Float]] by mapping words over your lines. You need to unpack that ByteString and then call read on the resulting string to generate your floating point numbers.
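
If the input might contain malformed tokens, one possible variation is to parse with readMaybe instead of read, so that a bad token yields Nothing rather than a runtime error. This is only a sketch: it assumes a base library recent enough to provide Text.Read.readMaybe (4.6 or later), uses Double, and the name parseRowsSafe is made up for illustration.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BStr
import Text.Read (readMaybe)

-- Unpack each word and parse it with readMaybe. mapM in the Maybe
-- monad makes the whole result Nothing if any single token fails.
parseRowsSafe :: BStr.ByteString -> Maybe [[Double]]
parseRowsSafe =
    mapM (mapM (readMaybe . BStr.unpack) . BStr.words) . BStr.lines
```

The trade-off is that one bad token rejects the whole file; collecting per-line errors would need a different result type.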

Edward Kmett