Assortative mixing

guillefix 4th November 2016 at 2:43pm

See Measures and metrics for networks

Homophily or assortative mixing is a bias in favour of connections between network nodes with some similar characteristics.

Assortative mixing by enumerative characteristics

Enumerative (a.k.a categorical) characteristics are those where the possible values don't have any particular metric for being close (i.e. a distance function). Eg.: gender, school

Measure given by modularity:

Q=12mij(Aijkikj2m)δ(ci,cj)Q=\frac{1}{2m}\sum_{ij} \left ( A_{ij}-\frac{k_i k_j}{2m} \right) \delta(c_i, c_j)

where δ(ci,cj)\delta(c_i, c_j) is the Kronecker delta which is 1 if the category of ii, cic_i is the same as that for jj, and 00 otherwise. Another way to write it turns out to be:

Q=r(errar2)Q=\sum_r (e_{rr}-a_r^2)

where erse_{rs} is the fraction of edges that join nodes of type rr to nodes of type ss, and ara_r is the fraction of ends of edges attached to nodes of type rr. If we generalize to weighted networks then kk would be the strength, i.e. the weighted degree; and erse_{rs} would be fraction of edge weights joining nodes in the two sets, and ara_r would be fraction of half the edge weight assigned to nodes in set rr.

This is just equal to the number of edges connecting vertices of alike type, minus the expected such number for a random network (with degrees distributions for each category fixed).

Bij=Aijkikj2mB_{ij} = A_{ij}-\frac{k_i k_j}{2m} is called the modularity matrix.

The normalized modularity (normalized by its maximum value when all edges fall between alike edges is called an assortativity coefficient.

Assortative mixing by scalar characteristics

By scalar charactersitics we means does that have a metric that gives a notion of closeness so that two nodes can be approximately alike (age, etc.).

Measure by a Pearson coefficient (i.e. a normalized covariance) for the correlation of the value of the scalar xix_i at the two ends of the edge. The covariance turns out to be:

Q=12mij(Aijkikj2m)xixjQ=\frac{1}{2m}\sum_{ij} \left ( A_{ij}-\frac{k_i k_j}{2m} \right) x_i x_j

and one can divide by its max value to get an assortativity coefficient.

If positive assortativity, sometimes the network is said to be stratified. Others nonlinear kinds of correlations may not be detected by the Pearson coefficient (for example low and high xx being more often connected with intermediate xx). Other information theoretic measures may then be used, or a scatter plot of xix_i vs xjx_j for visual insight, as in figure below:

Note that in this figure, the values, 9, 10, 11, 12 are bins, and the positions of points (which represent edges, or pairs of nodes) within each bin is just used to visually aid in identifying blocks with more density.

Assortative mixing by degree

Degree is special case of scalar because degrees may be close to one another (using usual distance function on integers), so use same formula.

If a network shows assortative mixing by degree, it often displays a core (with high density of nodes) and a periphery (with low) structure See (a). If it shows dissasortative mixing by degree, it often shows star like features and is more uniform See (b).

There appears to be another definition of a quantity called assortativity in this review.


A network partition with Q>0Q>0 exhibits "assortative mixing"
A network partition with Q<0Q<0 exhibits "disassortative mixing"

Can also rewrite the assortativity coefficient in this case as a Pearson coefficient for the distribution of the "excess degree" of nodes (i.e. follow an edge to a node and look at distribution of remaining stubs). See page 5 in notes.

Notes:

Given some network and two partitions (assignment of nodes to categories), we can calculate their modularities, and find which is "more modular"

Maximizing GG is a good way of finding "communities" of densely-connected nodes with sparse connections between those sets.

Can define scalar measure of assortativity. See page 3 in notes.