See Measures and metrics for networks
Homophily or assortative mixing is a bias in favour of connections between network nodes with some similar characteristics.
Assortative mixing by enumerative characteristics
Enumerative (a.k.a categorical) characteristics are those where the possible values don't have any particular metric for being close (i.e. a distance function). Eg.: gender, school
Measure given by modularity:
where is the Kronecker delta which is 1 if the category of , is the same as that for , and otherwise. Another way to write it turns out to be:
where is the fraction of edges that join nodes of type to nodes of type , and is the fraction of ends of edges attached to nodes of type . If we generalize to weighted networks then would be the strength, i.e. the weighted degree; and would be fraction of edge weights joining nodes in the two sets, and would be fraction of half the edge weight assigned to nodes in set .
This is just equal to the number of edges connecting vertices of alike type, minus the expected such number for a random network (with degrees distributions for each category fixed).
is called the modularity matrix.
The normalized modularity (normalized by its maximum value when all edges fall between alike edges is called an assortativity coefficient.
Assortative mixing by scalar characteristics
By scalar charactersitics we means does that have a metric that gives a notion of closeness so that two nodes can be approximately alike (age, etc.).
Measure by a Pearson coefficient (i.e. a normalized covariance) for the correlation of the value of the scalar at the two ends of the edge. The covariance turns out to be:
and one can divide by its max value to get an assortativity coefficient.
If positive assortativity, sometimes the network is said to be stratified. Others nonlinear kinds of correlations may not be detected by the Pearson coefficient (for example low and high being more often connected with intermediate ). Other information theoretic measures may then be used, or a scatter plot of vs for visual insight, as in figure below:
Note that in this figure, the values, 9, 10, 11, 12 are bins, and the positions of points (which represent edges, or pairs of nodes) within each bin is just used to visually aid in identifying blocks with more density.
Assortative mixing by degree
Degree is special case of scalar because degrees may be close to one another (using usual distance function on integers), so use same formula.
If a network shows assortative mixing by degree, it often displays a core (with high density of nodes) and a periphery (with low) structure See (a). If it shows dissasortative mixing by degree, it often shows star like features and is more uniform See (b).
There appears to be another definition of a quantity called assortativity in this review.
A network partition with exhibits "assortative mixing" |
A network partition with exhibits "disassortative mixing" |
Can also rewrite the assortativity coefficient in this case as a Pearson coefficient for the distribution of the "excess degree" of nodes (i.e. follow an edge to a node and look at distribution of remaining stubs). See page 5 in notes.
Notes:
Given some network and two partitions (assignment of nodes to categories), we can calculate their modularities, and find which is "more modular"
Maximizing is a good way of finding "communities" of densely-connected nodes with sparse connections between those sets.
Can define scalar measure of assortativity. See page 3 in notes.