views:

297

answers:

6

I am looking for information on techniques, algorithms, etc. for maintaining network integrity in a dynamic peer-to-peer network. Practical implementations, academic papers and anything else in that category are welcome.

Imagine a network that is purely peer-to-peer, where each node is connected to only x other nodes. Without a grand list of all nodes, each node is responsible for maintaining its own connection to the network. Nodes go down and come up dynamically, so each node needs to ask its neighbors (and their neighbors?) for new nodes to connect to in order to maintain its x connections.
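
The topping-up loop I have in mind could be sketched roughly like this (a toy in-process model; `Peer`, `maintain` and the `lookup` dict are all invented names, and a real system would of course use network messages instead of direct object access):

```python
import random

class Peer:
    """Toy node that tries to keep `target` live connections by asking
    its neighbors for *their* neighbors. Illustrative sketch only."""

    def __init__(self, node_id, target=4):
        self.node_id = node_id
        self.target = target
        self.neighbors = set()          # ids of currently connected peers

    def known_peers(self):
        """What this peer advertises when asked for candidates."""
        return set(self.neighbors)

    def maintain(self, lookup):
        """Top up connections. `lookup` maps node id -> Peer and stands
        in for the actual network transport."""
        # Drop neighbors that have gone away.
        self.neighbors = {n for n in self.neighbors if n in lookup}
        candidates = set()
        for n in self.neighbors:
            candidates |= lookup[n].known_peers()
        candidates -= self.neighbors | {self.node_id}
        needed = max(0, self.target - len(self.neighbors))
        for c in random.sample(sorted(candidates),
                               min(len(candidates), needed)):
            self.neighbors.add(c)
            lookup[c].neighbors.add(self.node_id)
```

For example, in a 5-node ring where one node dies, its two former neighbors can rediscover each other through the survivors after one `maintain` pass.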

My main interests are network segmentation (two halves of the network connected only by a single node on each side - if either of those nodes goes down, the network splits in two) and how to avoid it, as well as efficient routing (distance metrics, etc.), but anything relating to networks matching this description would be interesting.

I am currently looking at the Chord DHT protocol as it has some similarity to what I'm asking.

+1  A: 

The Netsukuku project aims to create protocols and a software implementation for large wifi-based ad-hoc networks.

From their FAQ: "The Netsukuku project is based on the very simple idea of exploiting the great potentiality of the wifi connectivity, making the PCs of wireless communities act as routers and handle together an ad-hoc network even bigger than the Internet."

Rafał Dowgird
I am looking more for "regular" (ethernet) computer networks and software/protocols.
Christian P.
Fair enough. I am not even 100% sure how serious the Netsukuku project is, it just seemed vaguely similar to what you're looking for.
Rafał Dowgird
+1  A: 

My thoughts only -- not a complete solution and not tested in practice, but it may still touch on a number of interesting problems and potential solutions.

Standardised times for node failure and rejoining must be recorded and managed. To achieve this, the network does not calculate on a real-time basis but on an animation-frame-number basis. Have N front-end processors (FEPs) assigning an FEP ID, a job ID and a network animation frame number to incoming jobs. There are a number of issues with real time that are not quite addressed even by quantizing it; in some exception cases it's a bit like accounting: posting events to when they should be regarded as occurring rather than when any cash actually moves.

For high performance, the heartbeat packets must also contain details of jobs being performed, recently completed or abandoned, as well as the inventory of hosts in the network.

The network proceeds to process work items and publish their results to adjacent peers or FEPs. FEPs forward completed job details to clients, and can take over for failed FEPs, as the only state in an FEP is the last serial number stamped on a request.

The network must have a quorum to continue. External monitors track connectivity and inform the nodes that experience changes in connectivity whether they are now inside or outside the quorum.
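
One common quorum rule (my substitution, not necessarily what's meant above) is strict majority: a node stays active only if it can reach more than half of the known hosts, which guarantees at most one partition keeps operating after a split:

```python
def in_quorum(reachable, total):
    """Majority quorum check: a node considers itself inside the
    quorum only if it can see a strict majority of the known hosts.
    Sketch only; names are illustrative."""
    return reachable > total // 2
```

With 5 known hosts, a partition of 3 keeps working while the partition of 2 halts, so the two sides can never both commit work.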

Where a work item is not completed because the machine performing it fails, or where a new node joins the network, a new work-allocation policy must be established, based on work item ID, to distribute the work among the remaining nodes until the failed node comes back online.
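
One way to get such an ID-based policy without a coordinator (my suggestion, not part of the answer) is rendezvous hashing: every peer computes the same owner for a work item from the current live-node set alone, and removing a node only moves the items that node owned:

```python
import hashlib

def owner(work_id, nodes):
    """Rendezvous (highest-random-weight) hashing: hash each
    (work item, node) pair and pick the node with the highest score.
    Deterministic, so all peers agree without any coordination.
    Names are illustrative."""
    def score(node):
        h = hashlib.sha256(f"{work_id}:{node}".encode()).hexdigest()
        return int(h, 16)
    return max(nodes, key=score)
```

When a node fails, the surviving nodes recompute owners over the smaller set; items the failed node didn't own keep their old owners, so only the minimum amount of work is reallocated.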

For cases where multiple nodes perform the same job (duplication of effort - possible, but minimised by choosing the usual timeouts sensibly), the jobs must be rollbackable, and the conflict resolved using Markov chains.

To detect possible duplications reliably, jobs must auto-rollback in less time than the timeout for receiving job results that applies during a crisis period, i.e. when nodes are failing. A shorter timeout applies when nodes are not failing.

martinr
These theories suggest a lot of outside monitoring, which is by definition impossible for a pure peer-to-peer network. The idea is to avoid any centralized nodes or servers, so anything that wishes to monitor the network must be part of it and use the network itself for inspection.
Christian P.
True, but the "external monitors" could instead be part of the system, with their choice of quorum based on yet another Markov chain. Some central authority is required to seed the chain, however. The main limit on performance *with integrity* is that the results of any given job cannot be released by an FEP until recent state information has been received (or a failure comprehended and compensated for) from the whole functioning network for the animation frame in which the operation completed, so performance is fundamentally limited by animation frame resolution, topology and network speed.
martinr
A: 

Have you looked at Kademlia? It is similar to Chord, and versions of it are used by BitTorrent and eMule. The paper lists a few measures for ensuring network integrity, even in the face of attack. The two basic ones are

  • Maintain enough peers so that the failure of enough of them to cause trouble is unlikely
  • Maintain the list of known peers in order of longest uptime. Studies have shown that the probability of a node going offline within the next hour gets lower the longer it has already been online. This also makes it difficult for attackers to flood the network with malicious nodes.
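
The second point is Kademlia's k-bucket replacement rule, which can be sketched like this (`Bucket`, `seen` and `ping` are my names; `ping` stands in for a real liveness check over the network):

```python
from collections import deque

class Bucket:
    """Sketch of a Kademlia k-bucket: peers are ordered by how long
    they have been known, and a newcomer only replaces the
    longest-lived peer if that peer fails to respond to a ping."""

    def __init__(self, k=20):
        self.k = k
        self.peers = deque()            # longest-known peer at the left

    def seen(self, peer, ping):
        if peer in self.peers:
            self.peers.remove(peer)     # refresh: move to the fresh end
            self.peers.append(peer)
        elif len(self.peers) < self.k:
            self.peers.append(peer)
        elif not ping(self.peers[0]):   # oldest peer dead -> evict it
            self.peers.popleft()
            self.peers.append(peer)
        # else: drop the newcomer; long-lived peers are preferred,
        # which also resists flooding by fresh malicious nodes
```

Because a full bucket never evicts a responsive long-lived peer, an attacker who joins with many new node IDs cannot push established peers out of the routing table.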

I'm not sure how much of this applies to Chord, because I haven't read as much about it, but I think going with a DHT is a good idea unless you need fuzzy searching.

DataWraith
A: 

Use Chord. http://en.wikipedia.org/wiki/Chord_(peer-to-peer)

I've implemented it in projects before and it addresses these problems.
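
Chord's core idea is small enough to sketch: node IDs and keys share one circular ID space, and a key belongs to its successor, the first node ID at or after it on the circle. This toy version scans a global sorted list, whereas real Chord reaches the same node in O(log n) hops via finger tables:

```python
def chord_successor(key, ring, m=6):
    """Find the node responsible for `key` on a Chord ring.
    `ring` is a sorted list of node ids in a 2**m id space.
    Illustrative sketch only, not a full Chord implementation."""
    key %= 2 ** m
    for node in ring:                  # ascending node ids
        if node >= key:
            return node
    return ring[0]                     # wrap around the circle
```

For example, with nodes {8, 21, 40, 58} in a 64-id space, key 25 lands on node 40 and key 60 wraps around to node 8.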

Matt Williamson
A: 

Just to avoid reinventing the wheel, take a look at the various routing protocols. OSPF might be a good starting point, given your scenario. Of course there are many, many variables that might make it not the best choice for you. Such as:

  • you can keep shortest paths to X nodes; if a node goes down, the attached nodes are informed and can run a new shortest-path search to find a suitable replacement; you need to consider the overhead of ping and keep-alive messages
  • do you need to route connections (i.e. searches in the p2p network), or just keep a large set of nodes interconnected (à la botnet)? If the former, a mixed approach (a small distributed hash table for small subsets of the network + OSPF/BGP for the borders) might help;
  • so on and so forth
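
The first bullet's recompute-on-failure step is just a local shortest-path run. In an OSPF-style link-state scheme each node floods its link changes and then reruns something like plain Dijkstra over its view of the topology (a generic sketch, not OSPF's actual SPF code):

```python
import heapq

def shortest_paths(adj, src):
    """Dijkstra over an adjacency dict {node: {neighbor: cost}}.
    Returns the cost of the cheapest path from `src` to every
    reachable node. Illustrative sketch only."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                    # stale heap entry, skip it
        for v, cost in adj.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

When a neighbor disappears, a node deletes it (and its links) from `adj` and reruns the search; paths that went through the dead node fall back to the next-cheapest alternative automatically.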
lorenzog
+1  A: 

For ubiquitous computing, various ad-hoc P2P networks have been developed that would probably fit your needs. They have been used, for example, by the army to deploy small capsules that each talk to their neighbors, usually up to some command center. If you don't have a center, it may be related to distributed computing. Anyway, here are some links:

Wernight