



Is it possible to deserialize only a limited number of items from a serialized array?


I have a stream that holds a serialized array of type T. The array can have millions of items, but I want to create a preview of the content and only retrieve, say, the first one hundred items. My first idea was to create a wrapper around the input stream that limits the number of bytes, but there's no direct translation from the number of items in the array to the stream size.


Could you maybe alter your data source so it contains a data preview in another array which you can deserialize separately?

+1  A: 

No, this can't be done with standard .NET serialization. You'll have to invent your own storage format. For example, include a header with offsets of data chunks:

<chunk-2-offset>  --+
...                 |
----------------    |
...                 |
<chunk-1>           |
...                 |
----------------    |
...               <-+

So in order to preview data (from any arbitrary position) you'll have to load at most ceil(required-item-count/chunk-size) chunks. This will incur some overhead, but it's much better than loading the whole file.
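A minimal sketch of such a chunked format, to make the idea concrete. Everything here is an assumption for illustration: the ChunkedStore class, the header layout (item count, chunk size, chunk count, then one 64-bit offset per chunk) and the use of strings written with BinaryWriter as the items. It also assumes a seekable stream and a chunk size greater than zero.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical layout (not a standard format):
//   int32 itemCount, int32 chunkSize, int32 chunkCount,
//   int64 offset of each chunk, then the chunks back to back.
static class ChunkedStore
{
    public static void Write(Stream output, string[] items, int chunkSize)
    {
        int chunkCount = (items.Length + chunkSize - 1) / chunkSize;
        var writer = new BinaryWriter(output);
        writer.Write(items.Length);
        writer.Write(chunkSize);
        writer.Write(chunkCount);

        // Reserve space for the offset table; real values are patched in below.
        writer.Flush();
        long tablePos = output.Position;
        for (int c = 0; c < chunkCount; c++) writer.Write(0L);

        var offsets = new long[chunkCount];
        for (int c = 0; c < chunkCount; c++)
        {
            writer.Flush();
            offsets[c] = output.Position;
            int end = Math.Min(items.Length, (c + 1) * chunkSize);
            for (int i = c * chunkSize; i < end; i++) writer.Write(items[i]);
        }

        // Go back and fill in the offset table, then return to the end.
        writer.Flush();
        long endPos = output.Position;
        output.Position = tablePos;
        foreach (long offset in offsets) writer.Write(offset);
        writer.Flush();
        output.Position = endPos;
    }

    // Reads 'count' items starting at index 'start' without touching
    // the chunks that come before the one containing 'start'.
    public static string[] Read(Stream input, int start, int count)
    {
        var reader = new BinaryReader(input);
        int itemCount = reader.ReadInt32();
        int chunkSize = reader.ReadInt32();
        reader.ReadInt32(); // chunkCount, not needed for this lookup
        long tablePos = input.Position;

        var result = new List<string>();
        int stop = Math.Min(itemCount, start + count);
        if (start >= stop) return result.ToArray();

        // Look up the offset of the chunk that contains 'start' and jump to it.
        int firstChunk = start / chunkSize;
        input.Position = tablePos + 8L * firstChunk;
        input.Position = reader.ReadInt64();

        // Chunks are contiguous, so read on and skip items before 'start'.
        for (int i = firstChunk * chunkSize; i < stop; i++)
        {
            string item = reader.ReadString();
            if (i >= start) result.Add(item);
        }
        return result.ToArray();
    }
}
```

With this sketch, ChunkedStore.Read(stream, 0, 100) returns a preview of the first hundred items, and a call with a larger start index only reads from the beginning of the chunk that contains it.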

Anton Gogolev
So I have to save the array in chunks and accept that it loads a little more?
+1  A: 

What is the serializer?

With BinaryFormatter, that would be very, very tricky.

With xml, you could perhaps pre-process the xml, but that is very tricky.

Other serializers exist, though - for example, with protobuf-net there is little difference between an array/list of items, and a sequence of individual items - so it would be pretty easy to pick off a finite sequence of items without processing the entire array.

Complete protobuf-net example:

using System;
using System.IO;
using System.Linq;
using ProtoBuf;

// protobuf-net needs the contract and member attributes to serialize the type
[ProtoContract]
class Test {
    [ProtoMember(1)]
    public int Foo { get; set; }
    [ProtoMember(2)]
    public string Bar { get; set; }

    static void Main() {
        Test[] data = new Test[1000];
        for (int i = 0; i < 1000; i++) {
            data[i] = new Test { Foo = i, Bar = ":" + i.ToString() };
        }
        MemoryStream ms = new MemoryStream();
        Serializer.Serialize(ms, data);
        Console.WriteLine("Pos after writing: " + ms.Position); // 10760
        Console.WriteLine("Length: " + ms.Length); // 10760
        ms.Position = 0;
        // read the items back one at a time, stopping after the first 100
        foreach (Test foo in Serializer.DeserializeItems<Test>(ms,
                PrefixStyle.Base128, Serializer.ListItemTag).Take(100)) {
            Console.WriteLine(foo.Foo + "\t" + foo.Bar);
        }
        Console.WriteLine("Pos after reading: " + ms.Position); // 902
    }
}


Note that DeserializeItems<T> is a lazy/streaming API, so it only consumes data from the stream as you iterate over it. The LINQ Take(100) stops the iteration after 100 items, so the rest of the stream is never read.

Marc Gravell
Would a bit of hackery implementing the ISerializable interface not get him what he wants?
@Noldorin - not as far as I know... you don't get to intercept the array deserialization, regardless of how you handle each individual item (via ISerializable)
Marc Gravell