How to design a high-level application protocol and data format for metadata syncing between devices and server?

+3 Q:

I am looking for guidance on how to best think about designing a high-level application protocol to sync metadata between end-user devices and a server.

My goal: the user can interact with the application data on any device, or on the web. The purpose of this protocol is to communicate changes made on one endpoint to other endpoints through the server, and to ensure all devices maintain a consistent picture of the application data. If the user makes changes on one device or on the web, the protocol will push the data to the central repository, from where other devices can pull it.

Some other design thoughts:

I call it "metadata syncing" because the payloads will be quite small, in the form of object IDs and small metadata about those IDs. When client endpoints retrieve new metadata over this protocol, they will fetch the actual object data from an external source based on this metadata. Fetching the "real" object data is out of scope; I'm only talking about metadata syncing here.
Using HTTP for transport and JSON for payload container. The question is basically about how to best design the JSON payload schema.
I want this to be easy to implement and maintain on the web and across desktop and mobile devices. The best approach seems to be simple timer- or event-based HTTP request/response without any persistent channels. Also, you should not need a PhD to read it, and I want my spec to fit on 2 pages, not 200.
Authentication and security are out of scope for this question: assume that the requests are secure and authenticated.
The goal is eventual consistency of data on devices; it is not entirely realtime. For example, the user can make changes on one device while offline. When going online again, the user would perform a "sync" operation to push local changes and retrieve remote changes.
Having said that, the protocol should support both of these modes of operation:
- Starting from scratch on a device, should be able to pull the whole metadata picture
- "sync as you go". When looking at the data on two devices side by side and making changes, should be easy to push those changes as short individual messages which the other device can receive near-realtime (subject to when it decides to contact server for sync).
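Both modes can be served by one request shape if the server keeps an ordered change log and each client remembers a cursor into it. The following is only a minimal sketch under that assumption; `ChangeLog`, `push`, `pull`, `since`, `cursor` and the payload field names are hypothetical, not part of any existing protocol. Pulling with `since=0` returns the whole picture, and pulling with the last cursor returns only the short messages that arrived since.

```python
import json


class ChangeLog:
    """In-memory stand-in for the server's central repository (hypothetical)."""

    def __init__(self):
        # Each entry: {"seq": int, "id": str, "fields": dict}
        self._changes = []

    def push(self, object_id, fields):
        """Record one change and return its sequence number."""
        seq = len(self._changes) + 1
        self._changes.append({"seq": seq, "id": object_id, "fields": fields})
        return seq

    def pull(self, since=0):
        """Return every change after cursor `since`, plus the new cursor."""
        changes = [c for c in self._changes if c["seq"] > since]
        cursor = self._changes[-1]["seq"] if self._changes else 0
        return {"cursor": cursor, "changes": changes}


log = ChangeLog()
log.push("folder-1", {"name": "Photos"})
log.push("file-7", {"name": "a.jpg", "parent": "folder-1"})

# Mode 1: a fresh device starts from scratch.
full = log.pull(since=0)

# Mode 2: "sync as you go" - only what changed since the last cursor.
log.push("file-7", {"name": "b.jpg"})
delta = log.pull(since=full["cursor"])
print(json.dumps(delta, indent=2))
```

A real server would compact old log entries; this sketch keeps everything so that a pull from zero is always complete.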

As a concrete example, you can think of Dropbox (it is not what I'm working on, but it helps to understand the model): on a range of devices, the user can manage files and folders: move them around, create new ones, remove old ones, etc. In my context the "metadata" would be the file and folder structure, but not the actual file contents. The metadata fields would be something like file/folder name and time of modification (all devices should see the same time of modification).

Another example is IMAP. I have not read the protocol, but my goals (minus actual message bodies) are the same.

It feels like there are two broad approaches to how this is done:

- Transactional messages: each change in the system is expressed as a delta, and endpoints communicate with those deltas. Example: DVCS changesets.
- REST: communicating the object graph as a whole or in part, without worrying so much about the individual atomic changes.
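To make the contrast concrete, here is a rough sketch of the same end state expressed both ways. The message shapes (`op`, `id`, `fields`, `objects`) and the helper `apply_delta` are assumptions made up for this illustration; they are not taken from any existing protocol.

```python
import copy


def apply_delta(state, delta):
    """Apply one change message to a dict of objects keyed by ID."""
    new_state = copy.deepcopy(state)
    op = delta["op"]
    if op == "create":
        new_state[delta["id"]] = dict(delta["fields"])
    elif op == "update":
        new_state[delta["id"]].update(delta["fields"])
    elif op == "delete":
        new_state.pop(delta["id"], None)
    else:
        raise ValueError(f"unknown op: {op}")
    return new_state


# Approach 1: a sequence of small transactional messages.
deltas = [
    {"op": "create", "id": "f1", "fields": {"name": "notes.txt", "modified": 100}},
    {"op": "create", "id": "f2", "fields": {"name": "todo.txt", "modified": 101}},
    {"op": "update", "id": "f1", "fields": {"name": "ideas.txt", "modified": 102}},
    {"op": "delete", "id": "f2"},
]
state = {}
for d in deltas:
    state = apply_delta(state, d)

# Approach 2: the object graph sent as a whole, REST-style.
snapshot = {"objects": {"f1": {"name": "ideas.txt", "modified": 102}}}

print(state == snapshot["objects"])
```

The deltas carry history and stay small per message, while the snapshot is larger but needs no replay logic on the client.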

EDIT: some of the answers rightly say that there is not enough info about the app to offer good enough suggestions. The exact nature of the app might be distracting, but a very basic RSS reading app is a good enough approximation. So let's say the app spec is the following:

There are two classes: feeds and items.
I can add, rename and delete feeds. Adding a feed subscribes to it and starts receiving items for that feed. I can also reorder the feed display order in the UI.
As I read items, they are marked as read. I cannot mark them unread or do anything else with them.
Based on the above, the object model is:
- "feed" has attributes "url", "displayName" and "displayOrder" (displayOrder is index of feed in UI's list of feeds; reordering feeds locally changes the displayOrder of all feeds so that the indexes remain unique and sequential).
- "item" has attributes "url" and "unread", and many-to-one relation "feed" (each item belongs in one feed). "url" also behaves as GUID for the item.
- actual item contents are downloaded locally on each device and are not part of sync.
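One possible way to write this object model down, purely as a sketch: the attribute names come from the list above, while the top-level `feeds`/`items` keys, the helper names, and the choice to reference a feed by its URL are assumptions (the spec only says that "url" acts as GUID for items).

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Feed:
    url: str
    displayName: str
    displayOrder: int


@dataclass
class Item:
    url: str      # also serves as the item's GUID
    unread: bool
    feed: str     # assumption: the owning feed is referenced by its URL


def reorder(feeds, new_order_urls):
    """Reassign displayOrder so indexes stay unique and sequential."""
    by_url = {f.url: f for f in feeds}
    return [
        Feed(by_url[u].url, by_url[u].displayName, index)
        for index, u in enumerate(new_order_urls)
    ]


def to_payload(feeds, items):
    """Serialize the whole metadata picture as one JSON document."""
    ordered = sorted(feeds, key=lambda f: f.displayOrder)
    return json.dumps(
        {"feeds": [asdict(f) for f in ordered], "items": [asdict(i) for i in items]},
        sort_keys=True,
    )


feeds = [
    Feed("http://a.example/rss", "A", 0),
    Feed("http://b.example/rss", "B", 1),
]
items = [Item("http://a.example/1", False, "http://a.example/rss")]

reordered = reorder(feeds, ["http://b.example/rss", "http://a.example/rss"])
print(to_payload(reordered, items))
```

Note that a single reorder touches the displayOrder of several feeds at once, so it produces several changed objects to sync.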

Based on this design, I can set up my app on one device: add a bunch of feeds, rename and reorder them, and read some items in them, which are then marked as read. When I switch devices, the other device can sync the configuration and show me the same feed list with the same names, the same order, and the same item read/unread states.

(end edit)

What I would like in the answers:

Is there anything important I left out above? Constraints, goals?
What is some good background reading on this? (I realize this is what many computer science courses talk about at great length and detail... I am hoping to short-circuit it by looking at some crash course or nuggets.)
What are some good examples of such protocols that I could model after, or even use out of box? (I mention Dropbox and IMAP above... I should probably read the IMAP RFC.)

+1 A:

A couple of thoughts:

1). What assumptions can you make about the reliability of delivery of change notifications? And the reliability of ordering of those notifications? My instinct is that it is better to tolerate loss and misordering by reverting to requesting complete re-delivery of the meta-data.

2). In effect you have a stream of meta-data and also a stream of data. What assumptions can you make about their relative ordering? Can you receive newly versioned data before the meta-data arrives? Guessing again, I suspect that this can happen. I would expect that the data payloads must contain meta-data version information, so that clients can refresh their meta-data when they need to.

3). Is it possible for data corresponding to two different versions of the meta-data to arrive at the device? I suspect "yes". How readily can a client deal with this?

4). The meta-data may need to include presentation or validation information.
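As an illustration of point 1, a client could detect loss or misordering with per-notification sequence numbers and fall back to a full re-delivery. This is a minimal sketch only; the `seq` field, the `Client` class and the snapshot shape are all hypothetical.

```python
class Client:
    """Tracks metadata and detects gaps in the notification stream."""

    def __init__(self):
        self.last_seq = 0
        self.metadata = {}
        self.needs_full_resync = False

    def on_notification(self, note):
        """Apply a notification only if it is the next expected one."""
        if note["seq"] <= self.last_seq:
            return False  # duplicate or stale: safe to ignore
        if note["seq"] != self.last_seq + 1:
            # Something was lost or arrived out of order: stop applying
            # deltas and ask for the complete picture instead.
            self.needs_full_resync = True
            return False
        self.metadata[note["id"]] = note["fields"]
        self.last_seq = note["seq"]
        return True

    def full_resync(self, snapshot):
        """Replace local state with a complete re-delivery."""
        self.metadata = dict(snapshot["objects"])
        self.last_seq = snapshot["seq"]
        self.needs_full_resync = False


client = Client()
client.on_notification({"seq": 1, "id": "x", "fields": {"name": "one"}})
client.on_notification({"seq": 3, "id": "x", "fields": {"name": "three"}})  # seq 2 lost
if client.needs_full_resync:
    client.full_resync({"seq": 3, "objects": {"x": {"name": "three"}}})
print(client.last_seq, client.metadata)
```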

djna 2010-05-06 06:38:18

1. Yes, that's what I thought: I'd request complete metadata (say, from known point X forward). 2. Yes, the client can receive new data for which it does not have metadata yet; I'd then have the client request the relevant metadata (which may or may not exist at that point). 3. Yes, there may be different metadata for the same data; I'd then prioritize based on e.g. timing, with newer overwriting older.
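The "newer overwrites older" rule from point 3 could look roughly like this. It is a sketch that assumes every record carries a `modified` timestamp taken from a single clock (for example the server's); the function name and record shape are hypothetical.

```python
def merge_newer_wins(local, remote):
    """Merge two {id: record} maps, keeping the record with the later timestamp."""
    merged = dict(local)
    for object_id, remote_rec in remote.items():
        local_rec = merged.get(object_id)
        if local_rec is None or remote_rec["modified"] > local_rec["modified"]:
            merged[object_id] = remote_rec
    return merged


local = {
    "f1": {"name": "old name", "modified": 100},
    "f2": {"name": "kept locally", "modified": 300},
}
remote = {
    "f1": {"name": "new name", "modified": 200},
    "f2": {"name": "stale remote", "modified": 250},
    "f3": {"name": "only remote", "modified": 50},
}
print(merge_newer_wins(local, remote))
```

If the timestamps came from each device's own clock, skew between devices could make an older change win, so a single time source matters here.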

Jaanus 2010-05-06 15:29:25

+1 A:

The metadata that you described sounds like a graph. However, switching to the OWL/RDF track may be quite a shift. Basically, you just need properties on objects that may be interlinked (e.g. files arranged in a hierarchy). From this viewpoint, JSON is a very natural choice for property access if combined with a REST API. If this approach is chosen, I recommend studying the Open Data Protocol first.

By the way, why not just use a version control system, e.g. Git, and store the properties as JSON objects inside text files in that system? If each object has its metadata stored as a very small JSON chunk in a separate file, the system will automatically be able to do most of the updating and automatic conflict resolution. Most version control systems provide good APIs for this type of purpose.

Ville Laitila 2010-06-15 14:16:53

+1 A:

If I wanted to do this quickly without too much development time, I'd just use WebDAV on the metadata file(s) and be done. IMO, that should cover most of your requirements. Using an existing protocol also has advantages over a custom one: there are existing libraries, and you do not spend time reinventing the wheel and debugging protocol implementation code.

EDIT: If you make the configuration file easy to merge as a file, then you just need to keep 2 versions of the config file: a base version (how the config looked the last time you synced) and the current version of the metadata. You then fetch your peer's version of the metadata. With those 3 files you do a simple 3-way merge, auto-resolving conflicts in favor of the newer version, and that is it. Keeping the base version is important. If you merge with multiple clients, you may merge at different points and thus need a different version of your config file as the base for each peer. Simply keep the result of every sync until you overwrite it with a new sync from that peer client. In theory you can have XML config files, but 3-way merging of XML files is just painful and the tools are not quite there yet, IMHO. The specific format or the type of app does not really matter.
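For a flat key/value metadata file, the 3-way merge described above can be sketched as follows. This is an illustration under assumptions, not a full merge tool: the metadata is modeled as a dict, and the hypothetical `prefer` argument stands in for "whichever side is newer", which the caller is assumed to know.

```python
_MISSING = object()  # marks a key that is absent from one version


def three_way_merge(base, mine, theirs, prefer="theirs"):
    """Merge two edited copies of a dict against their common base version."""
    merged = {}
    for key in set(base) | set(mine) | set(theirs):
        b = base.get(key, _MISSING)
        m = mine.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if m == t:
            value = m        # both sides agree (or both deleted the key)
        elif m == b:
            value = t        # only the peer changed it
        elif t == b:
            value = m        # only I changed it
        else:
            value = t if prefer == "theirs" else m  # true conflict
        if value is not _MISSING:
            merged[key] = value
    return merged


base = {"a": 1, "b": 2, "c": 3, "e": 5}
mine = {"a": 10, "b": 2, "c": 3, "d": 4, "e": 6}   # changed a and e, added d
theirs = {"a": 1, "b": 20, "e": 7}                 # changed b and e, deleted c

print(sorted(three_way_merge(base, mine, theirs).items()))
```

After a successful sync, the merged result would be stored as the new base for that peer, as the paragraph above describes.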

Jiri Klouda 2010-06-19 02:10:02

Yeah, but part of the question is also what is the metadata file format :)

Jaanus 2010-06-20 00:09:48

I know it is your question, but I don't see it asked, and besides, the metadata file format is not a concern of the sync protocol. WebDAV has almost everything you need from the actual network protocol, and it uses HTTP for transport as well. You can have a configuration directory and multiple metadata files too. Without knowing your application, there is no point in even speculating about the metadata file format anyway. But it should be a format that is easy to merge, i.e. not XML or its derivatives.

Jiri Klouda 2010-06-20 08:50:58

I added a concrete application spec that should help you help me with the more concrete data format.

Jaanus 2010-06-23 23:05:06
