Interest in social network analysis has exploded in the past few years, partly thanks to advances in statistical methods and computing for network analysis. A wide range of methods for network analysis is already covered by existing R packages. However, no comprehensive packages are available to calculate group centrality scores and to identify key players (i.e., those players who constitute the most central group) in a network. These functionalities are important because, for example, many social and health interventions rely on key players to facilitate the intervention. Identifying key players is challenging because players who are individually the most central are not necessarily the most central as a group, due to redundancy in their connections. In this paper we develop methods and tools for computing group centrality scores and for identifying key players in social networks. We illustrate the methods using both simulated and empirical examples. The package keyplayer, which provides the presented methods, is available from the Comprehensive R Archive Network (CRAN).
Interest in social network analysis has grown rapidly in the past few years. This growth is due partly to advances in statistical methods and computing for network analysis and partly to the increasing availability of social network data (e.g., network data generated by social media). A wide range of methods for network analysis is already covered by R packages such as network (Butts 2008a), sna (Butts 2008b), igraph (Csardi and Nepusz 2006), statnet (Handcock et al. 2008), and RSiena (Ripley et al. 2013). However, none of these packages provides a comprehensive toolbox to calculate group centrality measures and to identify key players, who constitute the most central group, in a network. Determining the key players in a network is important because many social and health interventions rely on key players to facilitate the intervention. For example, Kelly et al. (1991) and Latkin (1998) trained peer leaders as educators to promote HIV prevention. Campbell et al. (2008) and An (2015) used peer leaders to facilitate smoking prevention. Borgatti (2006) and Ressler (2006) suggested removing key figures among terrorists to most widely disrupt terrorism. More examples of this sort can be found in Valente and Pumpuang (2007) and Banerjee et al. (2013), among others. Identifying key players is challenging because players who are individually the most central are not necessarily the most central as a group, due to redundancy in their connections. In a seminal paper, Borgatti (2006) pointed out the problem and proposed methods for identifying key players in social networks.
To the best of our knowledge, the keyplayer function in UCINET
(Borgatti et al. 2002) is the first implementation of the methods
detailed in Borgatti (2006). It has evolved from a separate add-on to
UCINET into a built-in function in UCINET. In this paper, we present the
keyplayer package (An and Liu 2016) in R, which differs from the
keyplayer function in UCINET in several respects. (1) Unlike the
keyplayer function in UCINET, which is applicable only to binary
networks, keyplayer in R can be used for both binary and weighted
networks. (2) The keyplayer package includes more centrality measures
for choosing key players than are currently available in the keyplayer
function in UCINET. (3) keyplayer provides better integration with other
open-source packages in R. Overall, the keyplayer function in UCINET is
useful for researchers who are more familiar with UCINET and would like
to utilize other functionalities provided by UCINET, whereas keyplayer
is designed for users who are more familiar with R and who plan to do
more computational work.
The influenceR package (Simon and Aditya 2015) aims to provide calculations of several node centrality measures that were previously unavailable in other packages, such as the constraint index (Burt 1992) and the bridging score (Valente and Fujimoto 2010). It can also be used to identify key players in a network. Compared with keyplayer, however, it uses only one centrality metric when selecting key players, whereas keyplayer includes eight different metrics. In addition, influenceR currently works only for undirected networks, whereas keyplayer works for both undirected and directed networks. Both packages provide parallel computation: influenceR relies on OpenMP, whereas keyplayer uses the base package parallel, which is readily available in R. Last, influenceR focuses on computing centrality measures at the node level, whereas keyplayer focuses on centrality measures at the group level. Overall, keyplayer provides more comprehensive functionalities for calculating group centrality measures and for selecting key players.
The algorithm for identifying key players in package keyplayer
essentially consists of three steps. First, users choose a metric to
measure centrality in a network. Second, the algorithm (specifically the
kpcent function) will randomly pick a group of players and measure
their group centrality. Third, the algorithm (specifically the kpset
function) will select the group of players with the highest group
centrality as the desired key players. In general, users only need to
employ the kpset function by specifying a centrality metric and the
number of key players to be selected. The function will return a set of
players who are the most central as a group. We also make the auxiliary
function kpcent available. If users specify a centrality metric and
the indices of a group of players, this function will return the
centrality score of the specified group. Thus the two functions can be
used for two purposes: selecting key players or measuring group
centrality.
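The three steps can be illustrated with a minimal, language-neutral sketch in Python. This is not the keyplayer package's R code: the function names `group_out_degree` and `select_key_players` are hypothetical, the group metric (for each non-member, the strongest tie sent by any group member) is an assumption chosen for illustration, and the swap-based local search stands in for whatever search kpset actually performs. Node indices are 0-based here.

```python
import random

def group_out_degree(W, group):
    """Assumed group metric (illustrative only): for every non-member,
    take the strongest tie sent to it by any group member, then sum."""
    members = set(group)
    outsiders = [j for j in range(len(W)) if j not in members]
    return sum(max(W[i][j] for i in members) for j in outsiders)

def select_key_players(W, size, metric=group_out_degree, seed=0, max_rounds=100):
    """Start from a random group, then keep swapping a member for a
    non-member whenever the swap raises the group score."""
    rng = random.Random(seed)
    n = len(W)
    group = rng.sample(range(n), size)      # random starting group
    best = metric(W, group)                 # its group centrality
    for _ in range(max_rounds):
        improved = False
        for pos in range(size):
            for candidate in range(n):
                if candidate in group:
                    continue
                trial = group[:pos] + [candidate] + group[pos + 1:]
                score = metric(W, trial)
                if score > best:            # keep only strict improvements
                    group, best, improved = trial, score, True
        if not improved:                    # no swap helps: local optimum
            break
    return sorted(group), best

# A small weighted, directed example network (rows send ties to columns).
W = [
    [0, 1, 3, 0, 0],
    [0, 0, 0, 4, 0],
    [1, 1, 0, 2, 0],
    [0, 0, 0, 0, 3],
    [0, 2, 0, 0, 0],
]

key_players, score = select_key_players(W, size=2)
print(key_players, score)
```

Passing the metric as an argument mirrors the idea that the user first chooses how centrality is measured and the search then works with any such choice. A local search of this kind can stop at a group that is not globally best, so in practice one would rerun it from several random starting groups.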
The paper proceeds as follows. First, we review centrality measures at
the individual level. Then we present methods for measuring centrality
at the group level. After that, we present a greedy search algorithm for
selecting key players and outline the basic structure and the usage of
the main function kpset in package keyplayer. To illustrate the
methods and the usage of the package, we use a simulated network as well
as an empirical example based on the friendship network among managers
in a company. Last, we summarize and point out directions for improving
the package in the future.
We first review the definitions of centrality measures at the individual level. For conciseness, we provide the definitions based on weighted networks, where the weight of a tie takes a continuous value and usually measures the strength of the connection between two nodes. The definitions naturally incorporate binary networks where the weight of a tie can only be one or zero, indicating the presence or absence of a connection (Freeman 1978; Wasserman and Faust 1994; Butts 2008b).
Figure 1 shows an example of a simulated network. On the left is the adjacency matrix of the network. On the right is the network graph. Thinking of it as a friendship network, we can see that the strength of friendship between node 1 and node 3 is conceived differently by node 1 and node 3. The former assigns it a weight of 3 while the latter assigns it a weight of 1. We will use this example to illustrate the centrality measures. Calculations of four centrality measures (i.e., degree, closeness, betweenness, and eigenvector centralities) at the individual level are done using the sna package (Butts 2008b). Calculations of four other individual level centralities and all group level centralities are done using our package keyplayer. We would like to clarify at this point that our package does not depend on sna. We use sna here just for the sake of the example.
\[W=\begin{bmatrix} 0 & 1 & 3 & 0 & 0 \\ 0 & 0 & 0 & 4 & 0 \\ 1 & 1 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 3 \\ 0 & 2 & 0 & 0 & 0 \\ \end{bmatrix}\]
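To make the adjacency matrix concrete, the following Python sketch encodes W and computes a simple weighted degree for each node. It assumes the common convention that weighted out-degree is the row sum and weighted in-degree is the column sum, and that a binary network is obtained by recoding every positive weight as one. The helper names are hypothetical and are not functions from sna or keyplayer. Indices are 0-based, so node 1 in the text is row 0 here.

```python
# Adjacency matrix W of the simulated network: W[i][j] is the weight
# that node i+1 assigns to its tie to node j+1.
W = [
    [0, 1, 3, 0, 0],
    [0, 0, 0, 4, 0],
    [1, 1, 0, 2, 0],
    [0, 0, 0, 0, 3],
    [0, 2, 0, 0, 0],
]

def out_degree(W):
    """Weighted out-degree: total weight of ties a node sends (row sums)."""
    return [sum(row) for row in W]

def in_degree(W):
    """Weighted in-degree: total weight of ties a node receives (column sums)."""
    return [sum(col) for col in zip(*W)]

def binarize(W):
    """Binary version of the network: any positive weight becomes 1."""
    return [[1 if w > 0 else 0 for w in row] for row in W]

# The tie between nodes 1 and 3 is asymmetric: 3 in one direction, 1 in the other.
print(W[0][2], W[2][0])
print(out_degree(W))
print(in_degree(W))
print(out_degree(binarize(W)))
```

In the binary case, the same functions return the number of ties sent or received, which is how the weighted definitions carry over to binary networks.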