views:

250

answers:

4

Hey guys,

I need to add some kind of archiving functionality to a Objective-C Trie implementation (NDTrie on github), but I have very little experience with C and it's data structures.

struct trieNode
{
    NSUInteger key;
    NSUInteger count,
    size;
    id object;
    __strong struct trieNode ** children;
    __strong struct trieNode * parent;
};

@interface NDTrie (Private)
- (struct trieNode*)root;
@end

What I need is to create an NSData with the tree structure from that root - or serialize/deserialize the whole tree some other way (conforming to NSCoding?), but I have no clue how to work with NSData and a C struct containing pointers.

Performance on deserializing the resulting object would be crucial, as this is an iPhone project and I will need to load it in the background every time the app starts.

What would be the best way to achieve this?

Thanks!

A: 

You should always try the easy way first:

// serializing:
[myTrie writeToFile:myPath atomically:NO];

// deserializing
NDTrie* myTrie = [NDTrie trieWithContentsOfFile:myPath];

If that's really not fast enough, you can look into manually serializing the underlying structs.

Edit:

You made clear that the amount of data requires an optimized implementation.

I'd propose to rewrite the trieNode struct and accessing methods to use indices instead of pointers for the parent and children fields. The indices would point into one big C array of trieNode structs, where all nodes are alloced from.

This C array could be kept in an NSData object in the wrapping NDTrie object. Serialization and deserialization would then just mean to save/load the NSData object (endianess issues aside).

Nikolai Ruhe
The problem is my data set is quite large, and right now I can't make it fit both the temporary NSArray and the actual trie data structure on the device memory - also it's just too slow to create both structures. That's why I was looking for a way to skip this array re-creation and serialize the actual trie model.
leolobato
+1  A: 

Reimplement the trie node structure as an Objective C class. e.g.

@interface TrieNode
{
    NSUinteger key;
    NSUInteger count;
    //NSUInteger size; // not needed if you use an NSArray for the children.
    id object;
    NSArray* children;
    TrieNode* parent;
}
// methods
@end

Then you can use the standard Objective-C mechanism to archive and unarchive these objects.

If after implementing the above and profiling your code, you find performance is a problem, you can start optimising. For instance, by accessing ivars using the C struct pointer stuff e.g.

aTrieNode->parent;

or by replacing the NSArray with a C array etc.

JeremyP
The problem is I'd have to pretty much rewrite the whole existing implementation to use a class instead of that struct - that's why I'm looking for a way to serialize the existing struct, it should be considerably faster to implement.
leolobato
It doesn't look to me like there's very much to it. It's only one source file and you'd probably find things simplified if you use a class. I might have a go myself...
JeremyP
+1  A: 

Assuming you need to stick with straight C, because that's how things are already setup, what you'll need to do is actually pretty simple.

Just write a C function to write out your tree to disk, with some assumption about ordering (e.g. you write it our depth first, left to right). For any Objective-C objects, encode them into NSData, and write out the size and bytes of these as part of your stream.

When you read the data back in, simply reconstruct the tree based on your ordering assumptions, and set up the pointers to the children. Unarchive any of the embedded Objective-C objects as appropriate.

You can probably do this with NSCoder somehow, but it might be easier to do the tree reconstruction outside that, as you can recurse down the tree passing whatever arguments you like, which isn't really very easy with NSCoding.

I do have some (Desktop OS X) code that does something very similar to this, without the embedded objects, but it's pretty fiddly, and I can't post it.

One optimisation in that code is to read in the data into an internal buffer, in MB chunks (rather than a small number of bytes at a time, for each struct), and then read the data from that buffer, although I'm not sure that was ever benchmarked, and it may or may not make a significant difference on the iPhone in any case. It looks like there's a similar optimisation for writing as well, which is more likely to be a win, as I understand it (iPhone writes are expensive, or so I've heard).

Snydely Whiplash
A: 

I think you shoud implement NSCoding protocol: in your initWithCoder: create an NSArray with all children and realloc a such array of struct in encodeWithCoder:.

In this way you'll be able to use the original array of struct in the rest of the project.

unixo