Given some graph, I would like to determine how likely it is that it was generated randomly. I was told that a comparison to the Erdős–Rényi model was a good way to get this information, but I can't quite figure out how to do that.
Any advice?
Given some graph, I would like to determine how likely it is that it was generated randomly. I was told that a comparison to the Erdős–Rényi model was a good way to get this information, but I can't quite figure out how to do that.
Any advice?
The simplest way would probably be to compare the expected number of links with what you observed in the given graph. A slightly smarter method would be to examine the degree distributions. Erdős–Rényi graphs will have a binomial distributions, while real world networks are typically power law.
It might also be easier to test if you had an idea as to what other kinds of models were being used to generate the graph.
You will not be able to say whether a single graph is generated randomly. If the generating algorithm is random, than you have to check for randomness of the distribution of edges. But you will need many instances generated by that algorithm. Better check with the notion of randomness in mathematics, cryptography and information theory. [or maybe you want to start with rfc 1750]
The Erdős–Rényi model basically states that you take a number n of nodes and every possible edge has probability p of existence [G(n,p)-model]. Thus by p you can generate the expected number of edges and deviation from this expectation. If a significant ratio of graphs is within standard deviation of this expectation, well, you might not state that your algorithm is random at all, but you have at least one feature uncovered, the expected number of edges.
But again, without having a lot of states (graphs, intermediary graph generation steps or similar) you will be lost there. Say, I give you a number: 4. Is it generated randomly or not?
You can have a look at the ERGM package for R (www.r-project.org) at www.statnet.org. Although you might not be able to say with 100% certainty that your observed network is produced by a random process, you will be able to assess the likelihood that it was produced by random or non random partner selection processes. ERGM has a function called gof which stands for goodness-of-fit and will compare your observed network with simulated random networks and looks at network statistics such as: geodesic distance distribution, edgewise shared partner distribution, degree distribution and the triad census distribution. This will allow you to make an informed decision whether you consider your network to be random or not.