views:

226

answers:

4

I am reading a large data stream using lazy bytestrings, and want to know if at least X more bytes is available while parsing it. That is, I want to know if the bytestring is at least X bytes long.

Will calling length on it result in the entire stream getting loaded, hence defeating the purpose of using the lazy bytestring?

If yes, then the followup would be: How to tell if it has at least X bytes without loading the entire stream?

EDIT: Originally I asked in the context of reading files but understand that there are better ways to determine filesize. Te ultimate solution I need however should not depend on the lazy bytestring source.

A: 

Is there a reason you're not using hFileSize :: Handle -> IO Integer for getting the length of the file?

yatima2975
Yes, because the input may not actually be a file, but a network stream. I'm not interested in the actual length of the data, but in whether there is sufficient data to make the contents valid (or truncated in which case the data is corrupt).
me2
In which case there is no program on Earth that can find the length of the stream without reading it to the end.
Paul Johnson
+7  A: 

Yes.

length . take x.

ephemient
Thanks. So the answer is yes, using length will load the entire string into memory?
me2
Yes, length will force the whole list and "load: it. Thus, the countermeasure in form of "take"
ADEpt
So, to be clear we all understand, yes length will force the string into memory, so if you call take x first, then at least you are only forcing the first x bytes into memory.
MtnViewMark
@MtnViewMark: it will probably force more than *x* bytes, since strictness of lazy bytestrings happens on a per-chunk basis. Forcing the first character of a chunk also forces the rest of that chunk.
Porges
Sorry for the short answer, I wrote it quickly and didn't have time to come back and expand it. Yes, `length` will force the entire string, while `length . take x` will force only the chunks containing the first `x` bytes, which might be more than `x` bytes total. The default chunk size is somewhere in the vicinity of 31kB, I think.
ephemient
A: 

EDIT: sorry. I guess I was thinking bytestrings were lists. There is no genericLength for bytestrings.

length is strict because the type it returns Int is strict. You can use genericLength from Data.List and import a library that defines lazy Peano numbers and gives you a Num instance for them, e.g the numbers library:

That would let you express your function in the way you would like, but ephemient's answer is functionally the same, and doesn't require importing a new library.

I just did a blog post on the subject here, if that sounds like an approach you might be interested in:

http://coder.bsimmons.name/blog/2010/03/lazy-arithmetic-in-haskell/

jberryman
ByteString does not come with its `genericLength`. You need to write one yourself.
KennyTM
Somewhere (probably on Hackage, though I can't find it now) I came across a "chunked" natural number type, along the lines of `data Nat = Zero | Sum Integer Nat`, which would be even more efficient than the `take`-based solution, probably.
Reid Barton
A: 

It will have to iterate the entire string, but if you don't keep a reference to the entire lazy byte-string anywhere else, I believe it should be able to free the head of the string as it progresses towards its tail.

Peaker