Nonlinear scaling limit to the power-law dependency network of free software packages (Download)

Arnab K. Ray, Rajiv Nair, and G. Nagarjuna

Homi Bhabha Centre for Science Education, Tata Institute of Fundamental Research, V. N. Purav Marg, Mankhurd, Mumbai 400088

It is a matter of common knowledge that when it comes to installing a software package from the free-software Debian GNU/Linux repository, many other packages — the “dependencies” — are also called for as prerequisites. This leads to a network of these dependencies. We treat every such package as a node in a network of dependency relationships. Each dependency relationship connecting any two packages (nodes) is treated as a link. Links can further be of two types — incoming links and outgoing links, giving rise to two distinct kinds of network. The task is to measure the number of software packages, ϕ, which are connected by a particular number of links, x, in either kind of network. In other words, this gives a frequency plot of ϕ ≡ ϕ(x) versus x. We posit a mathematical model for this kind of a distribution by a general logistic-type equation,

| (1) |

with λ being a power-law exponent, α being a saturation exponent, and η being a “switch” parameter for nonlinearity. Determination of the magnitude of both α and η should be important in understanding the global behaviour of the distribution of networks, particularly the distribution of a disproportionately high number of links connected to a very few special nodes — the so-called “top nodes”.

Integration of Eq. (1), which is a nonlinear differential equation, yields the general integral solution (for α≠0),

| (2) |

in which c is an integration constant. It is quite obvious that when
η = 0, which implies the absence of nonlinearity, there will
be a global power-law distribution for the data, going as
ϕ(x) ~ x^{λ}. This has been shown by the continuous straight lines
in both the log-log plots in Figs. 1 & 2, for the network of
incoming and outgoing links, respectively. However, as x-→∞,
nonlinearity causes a convergence of ϕ towards a non-zero
limiting value η, i.e. ϕ-→η. Under the calibrated values
of α = -1 and λ = -2, as derived from the data for both
types of network, this particular situation is described by the
solution

| (3) |

a result that very strongly indicates that nonlinearity characterises
the type of the network, influences the properties of its top
nodes (corresponding to high values of x), and restricts the
number of links connected to these special nodes. A scale for
the onset of nonlinear effects can also be ascertained by
requiring the two terms on the right hand side of Eq. (3) to be
in rough equipartition. This will yield the nonlinear scale
as x_{nl} ~ c|η|^{-1∕2}. The Debian data indicate that roughly
the top 1% of the nodes fall within this scale. On the other
hand, the very weakly linked nodes (for small values of x)
are best modelled by a Gibbs-Boltzmann or a log-normal
distribution.

We also note here that following a power series expansion of Eq. (2), a natural and self-consistent truncation for the series can only be achieved for α = -1, leading to the result shown in Eq. (3). This is also the only value of α that will spare ϕ(x) from the occurrence of divergences or unphysical solutions.