Personal tools
Start Here




Nonlinear scaling limit to the power-law dependency network of free software packages (Download)

Arnab K. Ray, Rajiv Nair, and G. Nagarjuna

Homi Bhabha Centre for Science Education, Tata Institute of Fundamental Research, V. N. Purav Marg, Mankhurd, Mumbai 400088  

It is a matter of common knowledge that when it comes to installing a software package from the free-software Debian GNU/Linux repository, many other packages — the “dependencies” — are also called for as prerequisites. This leads to a network of these dependencies. We treat every such package as a node in a network of dependency relationships. Each dependency relationship connecting any two packages (nodes) is treated as a link. Links can further be of two types — incoming links and outgoing links, giving rise to two distinct kinds of network. The task is to measure the number of software packages, ϕ, which are connected by a particular number of links, x, in either kind of network. In other words, this gives a frequency plot of ϕ ϕ(x) versus x. We posit a mathematical model for this kind of a distribution by a general logistic-type equation,

 d ϕ           α
xdx-= λ ϕ(1- ηϕ ) ,

with λ being a power-law exponent, α being a saturation exponent, and η being a “switch” parameter for nonlinearity. Determination of the magnitude of both α and η should be important in understanding the global behaviour of the distribution of networks, particularly the distribution of a disproportionately high number of links connected to a very few special nodes — the so-called “top nodes”.

Integration of Eq. (1), which is a nonlinear differential equation, yields the general integral solution (for α0),

      [   (  )   ]-1∕α
ϕ(x) = η+   x--αλ      ,

in which c is an integration constant. It is quite obvious that when η = 0, which implies the absence of nonlinearity, there will be a global power-law distribution for the data, going as ϕ(x) ~ xλ. This has been shown by the continuous straight lines in both the log-log plots in Figs. 1 & 2, for the network of incoming and outgoing links, respectively. However, as x-→, nonlinearity causes a convergence of ϕ towards a non-zero limiting value η, i.e. ϕ-→η. Under the calibrated values of α = -1 and λ = -2, as derived from the data for both types of network, this particular situation is described by the solution

ϕ(x) = x2 + η,

a result that very strongly indicates that nonlinearity characterises the type of the network, influences the properties of its top nodes (corresponding to high values of x), and restricts the number of links connected to these special nodes. A scale for the onset of nonlinear effects can also be ascertained by requiring the two terms on the right hand side of Eq. (3) to be in rough equipartition. This will yield the nonlinear scale as xnl ~ c|η|-12. The Debian data indicate that roughly the top 1% of the nodes fall within this scale. On the other hand, the very weakly linked nodes (for small values of x) are best modelled by a Gibbs-Boltzmann or a log-normal distribution.

We also note here that following a power series expansion of Eq. (2), a natural and self-consistent truncation for the series can only be achieved for α = -1, leading to the result shown in Eq. (3). This is also the only value of α that will spare ϕ(x) from the occurrence of divergences or unphysical solutions.

FIG. 1: For the network of incoming links, the intermediate nodes show a good fit with a power law of exponent -2 (as indicated by the continuous straight line). However, for large values of x, there is a saturation behaviour towards a limiting scale that is modelled well with the nonlinearity parameter, η = -6.

FIG. 2: For the network of outgoing links, the intermediate nodes are again fitted well by a power-law of exponent -2, but the saturation behaviour of the top nodes is very different from that of the network of incoming links. There is a clear convergence of ϕ towards a limit given by η = 1. Thus, all other parameters (α and λ) remaining the same, nonlinearity (quantified by η) becomes a deciding factor in determining the character of a dependency network.