Models that describe the processes by which a network forms or is generated are often called generative network models. One of the most famous ones is the "preferential attachment" model.
related to "rich get richer" idea in economics (Herbert Simon).
Preferential attachment (also called cumulative advantage in older literature) refers to the idea that new nodes in a network preferentially attach themselves to some nodes in the existing network rather than others.
The attachment is described in terms of a probability distribution over existing nodes for the creation of an edge. The preference is described by an attachment kernel, , which is the probabilistic weight of node . The probability that a new node connects to existing node is thus:
Different preference types can be considered, the main categories being:
The attachment kernel is then generally a function of these: .
Note: We need a seed network (initial condition), to get any network out of this model. The network will eventually be independent of the seed, but this can take a very large number of nodes , sometimes in the order of billions.
Proposed in the study of citation networks.
The main assumption of the model is that the probability of each new edge created whew we add a new node only depends on the degree of that node (on the in-degree to be precise, i.e. the number of citations it has). In particular it assumes an affine preferential attachment:
One can write a master equation for the degree distribution, which has a steady state (i.e. behavior given by power-law decay with power:
.
Thus, many scholars believe that this simple model may describe the fundamental mechanism by which power laws are obtained in many real-world networks.
Almost a special case of de Solla Price model, but with new assumptions:
Degree distribution as a function of time of creation
Nodes that were added earlier to the network have had more time for new nodes to attach to them, and thus in average have higher in-degree.
This can be shown by starting with a new quantity: the fraction of nodes (in average over the ensemble, so effectively the probabilty ) that a node was created at time and has in-degree when the network has vertices, . The "time" $increments by every time we add a node, and thus effectively labels nodes, in the order by which they were added.
One can then write a master equation, noting that no nodes have , except the new node which has , and also in-degree . However the fraction of nodes having any being created at any particular time goes to as , and so we change variables to a probability density in by dividing by . We also rescale time by dividing by for convenience, and to properly convert the master equation into a differential equation.
Sizes of in-components
Can also derive a master equation. See homework problem 4
Kleinberg et al have proposed a model where new nodes imitate the out-edge configuration of an existing node. This is done by linking to some of that edge's neighbours, while the rest of connections are to randomly chosen nodes in the network. In particular, we first choose a node uniformly at random, and then go through its edges, copying it with probability , or ignoring it and choosing a node at random with probability . Remarkably, the expression for the fraction of nodes with degree when node size is has the same form as in Price's model, but with an give by an expression depending on , and thus it also follows a power law. The networks still differ in other structural aspects, in particular regarding correlations.
This model reminds us that just knowing the degree distribution, doesn't tell us the mechanism that gave rise to it. We need more information to make this inference.
In some biological networks (metabolic and protein-protein networks) vertex copying seems to be the most probable explanation for observed power law distributions. The mechanism by which this happens is gene duplication (by which, when copying DNA, a gene is duplicated by mistake) and point mutations (a mutation of a single base pair). This, through evolution creates different proteins, which (due to their common origin) are still similar and have a lot of protein-protein interactions in common.
Observations of power law in protein and metabolic networks:
Lethality and centrality in protein networks
The large-scale organization of metabolic networks
Proposed models
A Model of Large-Scale Proteome Evolutionhttp://www.santafe.edu/media/workingpapers/01-08-041.pdf
Modeling of protein interaction networks
An alternative networks may "form". Often these are rationally created networks to optimize toward some goal.
Travel time and cost trade-offs
A good example is: airline networks where a compromise between lowering cost (so having more central hubs and spokes to fill planes more fully, than flights between two minor destinations) and length of travel (to satisfy customers) is sought.
Ferrer i Cancho have one such simple model to find compromises between mean geodesic distance (travel time) and number of edges (cost). They find interesting regimes with local minima with trees with exponential distributions, passing through trees with power law distributions, and finally star graphs, as they varied the parameter controlling the relative importance of the two compromising variables. However, for most values of the parameter, the global minimum was actually the star graph.
An alternative model that shows interesting behavior in the global minimum too, by assigning an actual geometric distance to the edges (so that it is a spatial network, see MMathPhys miniprojects.Networks). Depending on whether they assigned more importances to travelling times, or to waiting times at nodes, they got more road-like networks (waiting times at intersections negligible) or more airline-like networks (waiting times significant).
See recent research: Like air traffic, information flows through neuron 'hubs' in the brain, finds IU study