If we know the structure of a network, then we can calculate a number of quantities or measures that capture features of the network topology (and geometry). Originally, a lot of these ideas were developed for social network analysis, but they are used elsewhere now too.
Trying to answer: "Which are the most important or central vertices (or edges, or other substructures) in a network?"
Degree centrality
Simply the degree of a vertex can be used as a measure of its centrality.
The eigenvector centrality (first defined by Bonacich in 1987), is defined by:
where is the vector of centralities, and is the largest eigenvalue of
A node can be important because it is connected to many nodes, or because it is connected to important nodes, or both.
Katz centrality solves the problem posed above by giving all vertices a "free" centrality:
There is one potentially undesirable feature of Katz centrality. An important vertex pointing to many vertices makes all those vertices important. The centrality gained by virtue of receiving an edge from a prestigious vertex is diluted by being shared with so many others (think a web directory like Google or Yahoo! pointing to my page. My page is not that central because it's just one of millions). We can solve this by making the centrality derived from neighbours be divided by their out degree:
which is the basis for PageRank
Hubs and authorities (Network theory)
One can distinguish two types of important nodes in directed networks. We describe them for the case of an information network, like WWW first:
This idea was implemented by Kleinberg into the hyperlink-induced topic search or HITS algorithm.
Closeness centraliy of node i is the mean geodesic distance to all others nodes in he network. A variant is exponentially weighted closeness centrality:
where is the geodesic distance between node and ; and is the connected network component reachable from (except for ).
Main disadvantage is its often very low dynamic range (range of values it takes)
There are also problems when there are disconnected components. One way is to define closeness centrality over only connected nodes, or to use harmonic mean (mean of reciprocals, ignoring self distance, as it's 0).
Measures the extent to which a node (or edge, or other substructure) lies on paths between other vertices. These paths can be defined in many ways, but often they are taken to be geodesic paths.
Many networks naturally divide into groups. These are substructures that are prominent for some reason. Simple examples are cliques, plexes and cores. There are also generalizations of components called k-components.
Transitivity
Transitivity (a property of mathematical relations) in a network is usually applied to the relation "is connected by an edge". So a network is transitive if for every u connected to v and v connected to w, then u is connected to w. One can define the clustering coefficient, , as a measure of "how often" transitivity holds in the network:
Reciprocity
For directed simplest graph the smallest loop size is two, instead of three, and thus one often measures the frequency of length-2 loops. This is called reciprocity (see Transitivity for more comments). Pairs of reciprocated edges (that is edges from i to j where there is also one from j to i) are sometimes called co-links. The reciprocity is defined as the fraction of edges that are reciprocated, and this turns out to equal .
Signed edges and structural balance
Signed networks have signed edges, that is their edges can have an associated weight (like friendship) or (like animosity).
Structural balance refers to the situation when a Signed network contains only loops with even numbers of minus signs.
How can we measure the "similarity" of two nodes (or edges, etc.)? Two main approaches. Two nodes may be:
Homophily or assortative mixing
Homophily or assortative mixing is a bias in favour of connections between network nodes with some similar characteristics.