keyplayer: An R Package for Locating Key Players in Social Networks

Interest in social network analysis has exploded in the past few years, partly thanks to advances in statistical methods and computing for network analysis. A wide range of methods for network analysis is already covered by existing R packages. However, no comprehensive package is available to calculate group centrality scores and to identify key players (i.e., those players who constitute the most central group) in a network. These functionalities are important because, for example, many social and health interventions rely on key players to facilitate the intervention. Identifying key players is challenging because players who are individually the most central are not necessarily the most central as a group, due to redundancy in their connections. In this paper we develop methods and tools for computing group centrality scores and for identifying key players in social networks. We illustrate the methods using both simulated and empirical examples. The package keyplayer, which implements the presented methods, is available from the Comprehensive R Archive Network (CRAN).

Weihua An (Departments of Statistics and Sociology, Indiana University), Yu-Hsin Liu (Kelley School of Business, Indiana University)
2016-05-01

1 Introduction

Interest in social network analysis has grown rapidly in the past few years, due partly to advances in statistical methods and computing for network analysis and partly to the increasing availability of social network data (e.g., network data generated by social media). A wide range of methods for network analysis is already covered by R packages such as network (Butts 2008a), sna (Butts 2008b), igraph (Csardi and Nepusz 2006), statnet (Handcock et al. 2008), and RSiena (Ripley et al. 2013). However, none of these packages provides a comprehensive toolbox for calculating group centrality measures and identifying key players, who constitute the most central group, in a network. Determining the key players in a network is very important because many social and health interventions rely on key players to facilitate the intervention. For example, Kelly et al. (1991) and Latkin (1998) trained peer leaders as educators to promote HIV prevention. Campbell et al. (2008) and An (2015) used peer leaders to facilitate smoking prevention. Borgatti (2006) and Ressler (2006) suggested removing key figures in terrorist networks to disrupt those networks as widely as possible. More examples of this sort can be found in Valente and Pumpuang (2007) and Banerjee et al. (2013). Identifying key players is challenging because players who are individually the most central are not necessarily the most central as a group, due to redundancy in their connections. In a seminal paper, Borgatti (2006) pointed out this problem and proposed methods for identifying key players in social networks.
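To make the redundancy problem concrete, the following Python sketch uses a small hypothetical undirected network in which two hubs, A and B, are tied to the same four nodes, while a smaller hub, C, is tied to three other nodes. For illustration only, group centrality is assumed here to be the number of non-members tied to at least one group member; the helper names build_adjacency and group_reach are hypothetical and are not part of keyplayer.

```python
# Hypothetical toy network: hubs A and B share the same four neighbors,
# while the smaller hub C reaches three different nodes.
edges = [
    ("A", 1), ("A", 2), ("A", 3), ("A", 4),
    ("B", 1), ("B", 2), ("B", 3), ("B", 4),
    ("C", 5), ("C", 6), ("C", 7),
]


def build_adjacency(edge_list):
    """Build an undirected adjacency dict: node -> set of neighbors."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def group_reach(adj, group):
    """Count non-members tied to at least one member of `group` (assumed measure)."""
    members = set(group)
    reached = set()
    for node in members:
        reached |= adj[node]
    return len(reached - members)


adj = build_adjacency(edges)
degree = {node: len(neighbors) for node, neighbors in adj.items()}
# The two individually most central nodes (ties keep insertion order).
top_two = sorted(degree, key=degree.get, reverse=True)[:2]

print(top_two, group_reach(adj, top_two))         # ['A', 'B'] 4
print(["A", "C"], group_reach(adj, ["A", "C"]))   # ['A', 'C'] 7
```

Although A and B are individually the two most central nodes, together they reach only 4 others because their ties overlap completely; replacing B with the less central C raises the group's reach to 7.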

To the best of our knowledge, the keyplayer function in UCINET (Borgatti et al. 2002) is the first implementation of the methods detailed in Borgatti (2006). It has evolved from a separate add-on to UCINET into a built-in function of UCINET. In this paper, we present the keyplayer package (An and Liu 2016) in R, which differs from the keyplayer function in UCINET in several respects. (1) Unlike the keyplayer function in UCINET, which is applicable only to binary networks, keyplayer in R can be used for both binary and weighted networks. (2) The keyplayer package includes more centrality measures for choosing key players than are currently available in the keyplayer function in UCINET. (3) keyplayer integrates better with other open-source packages in R. Overall, the keyplayer function in UCINET is useful for researchers who are more familiar with UCINET and would like to use its other functionalities, whereas keyplayer is designed for users who are more familiar with R and who plan to do more computational work.

The influenceR package (Simon and Aditya 2015) aims to provide calculations of several node centrality measures that were previously unavailable in other packages, such as the constraint index (Burt 1992) and the bridging score (Valente and Fujimoto 2010). It can also be used to identify key players in a network. In comparison to keyplayer, however, it uses only one centrality metric when selecting key players, whereas keyplayer includes eight different metrics. Also, influenceR currently works only for undirected networks, whereas keyplayer works for both undirected and directed networks. Both packages support parallel computation: influenceR relies on OpenMP, whereas keyplayer uses the parallel package, which ships with base R. Finally, influenceR focuses on computing centrality measures at the node level, whereas keyplayer focuses on centrality measures at the group level. Overall, keyplayer provides more comprehensive functionality for calculating group centrality measures and for selecting key players.

The algorithm for identifying key players in package keyplayer consists essentially of three steps. First, the user chooses a metric to measure centrality in a network. Second, the algorithm (specifically, the kpcent function) randomly picks a group of players and measures their group centrality. Third, the algorithm (specifically, the kpset function) selects the group of players with the highest group centrality as the desired key players. In general, users need only call the kpset function, specifying a centrality metric and the number of key players to be selected; the function returns a set of players who are the most central as a group. We also make the auxiliary function kpcent available: given a centrality metric and the indices of a group of players, it returns the centrality score of the specified group. Thus the two functions serve two purposes: selecting key players and measuring group centrality.
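The three steps can be illustrated with a minimal Python sketch on a hypothetical toy network. It assumes a simple group measure (the number of non-members tied to at least one member) and a swap-based greedy search similar in spirit to the one described by Borgatti (2006): start from a random group, then swap a member for a non-member whenever that raises the group score, until no swap helps. All function names here (build_adjacency, group_reach, greedy_key_players) are hypothetical; this is not the implementation of kpset or kpcent, whose metrics and search details may differ.

```python
import random

# Hypothetical toy network (undirected): hubs A and B overlap, C reaches other nodes.
EDGES = [
    ("A", 1), ("A", 2), ("A", 3), ("A", 4),
    ("B", 1), ("B", 2), ("B", 3), ("B", 4),
    ("C", 5), ("C", 6), ("C", 7),
]


def build_adjacency(edge_list):
    """Build an undirected adjacency dict: node -> set of neighbors."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def group_reach(adj, group):
    """Assumed group centrality: non-members tied to at least one member."""
    members = set(group)
    reached = set()
    for node in members:
        reached |= adj[node]
    return len(reached - members)


def greedy_key_players(adj, k, seed=None):
    """Hypothetical swap-based greedy search for a size-k group maximizing group_reach."""
    rng = random.Random(seed)
    nodes = sorted(adj, key=str)           # fixed node order for reproducibility
    group = set(rng.sample(nodes, k))      # step 2: random starting group
    best = group_reach(adj, group)         # ... and its group centrality
    improved = True
    while improved:                        # step 3: stop when no swap helps
        improved = False
        for member in sorted(group, key=str):
            for outsider in nodes:
                if outsider in group:
                    continue
                candidate = (group - {member}) | {outsider}
                score = group_reach(adj, candidate)
                if score > best:           # accept the first improving swap
                    group, best = candidate, score
                    improved = True
                    break
            if improved:
                break
    return group, best


adj = build_adjacency(EDGES)
group, score = greedy_key_players(adj, k=2, seed=1)
print(sorted(group, key=str), score)
```

On this toy network every non-optimal pair admits at least one improving swap, so the search ends with C plus one of A or B, reaching 7 nodes. On larger networks a greedy search can stop at a local optimum, which is one reason to run it from several random starting groups.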

The paper proceeds as follows. First, we review centrality measures at the individual level. Then we present methods for measuring centrality at the group level. After that, we present a greedy search algorithm for selecting key players and outline the basic structure and the usage of the main function kpset in package keyplayer. To illustrate the methods and the usage of the package, we use a simulated network as well as an empirical example based on the friendship network among managers in a company. Last, we summarize and point out directions for improving the package in the future.

2 Measuring individual centrality

We first review the definitions of centrality measures at the individual level. For conciseness, we give the definitions for weighted networks, where the weight of a tie takes a continuous value and usually measures the strength of the connection between two nodes. The definitions naturally cover binary networks, where the weight of a tie can only be one or zero, indicating the presence or absence of a connection (Freeman 1978; Wasserman and Faust 1994; Butts 2008b).
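As a small illustration of how binary networks fit within the weighted definitions, the Python sketch below assumes, as is common for weighted networks, that a node's degree is the sum of the weights of its ties (outgoing ties for out-degree, incoming ties for in-degree). Dichotomizing the matrix, that is, replacing every positive weight by 1, turns the same computation into a count of ties. The 3-node matrix and the function names degree_centrality and dichotomize are hypothetical.

```python
def degree_centrality(matrix, mode="out"):
    """Weighted degree (strength): sum of tie weights, ignoring the diagonal."""
    n = len(matrix)
    if mode == "out":      # weights a node sends (row sums)
        return [sum(matrix[i][j] for j in range(n) if j != i) for i in range(n)]
    if mode == "in":       # weights a node receives (column sums)
        return [sum(matrix[j][i] for j in range(n) if j != i) for i in range(n)]
    raise ValueError("mode must be 'out' or 'in'")


def dichotomize(matrix):
    """Binary version of a weighted matrix: 1 if the weight is positive, else 0."""
    return [[1 if w > 0 else 0 for w in row] for row in matrix]


# Hypothetical 3-node weighted network (row = sender, column = receiver).
weighted = [
    [0, 2, 5],
    [1, 0, 0],
    [0, 3, 0],
]

print(degree_centrality(weighted, "out"))               # [7, 1, 3]
print(degree_centrality(weighted, "in"))                # [1, 5, 5]
print(degree_centrality(dichotomize(weighted), "out"))  # [2, 1, 1]
```

The same function serves both cases: on a 0/1 matrix the sum of weights is simply the number of ties.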

Figure 1 shows an example of a simulated network. On the left is the adjacency matrix of the network; on the right is the network graph. Interpreting it as a friendship network, we can see that nodes 1 and 3 perceive the strength of their friendship differently: node 1 assigns it a weight of 3, whereas node 3 assigns it a weight of 1. We will use this example to illustrate the centrality measures. Four individual-level centrality measures (degree, closeness, betweenness, and eigenvector centrality) are calculated using the sna package (Butts 2008b). Four other individual-level centralities and all group-level centralities are calculated using our package keyplayer. Note that keyplayer does not depend on sna; we use sna here only for the sake of the example.

\[W=\begin{bmatrix} 0 & 1 & 3 & 0 & 0 \\ 0 & 0 & 0 & 4 & 0 \\ 1 & 1 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 3 \\ 0 & 2 & 0 & 0 & 0 \\ \end{bmatrix}\]
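The asymmetry between nodes 1 and 3 can be checked directly from W. The Python sketch below stores W as a list of rows, with W[i][j] the weight that node i+1 assigns to its tie with node j+1 (0-based indices in the code, 1-based node labels in the output), and lists every pair of nodes tied in both directions together with the two weights. The functions is_symmetric and reciprocated_ties are hypothetical helpers, not keyplayer functions.

```python
# Adjacency matrix W of the simulated network (Figure 1).
# W[i][j] is the weight node i+1 assigns to its tie with node j+1.
W = [
    [0, 1, 3, 0, 0],
    [0, 0, 0, 4, 0],
    [1, 1, 0, 2, 0],
    [0, 0, 0, 0, 3],
    [0, 2, 0, 0, 0],
]


def is_symmetric(matrix):
    """True if every tie has the same weight in both directions."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


def reciprocated_ties(matrix):
    """List (i, j, w_ij, w_ji) for pairs tied in both directions, using 1-based labels."""
    n = len(matrix)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] > 0 and matrix[j][i] > 0:
                pairs.append((i + 1, j + 1, matrix[i][j], matrix[j][i]))
    return pairs


print(is_symmetric(W))       # False
print(reciprocated_ties(W))  # [(1, 3, 3, 1)]
```

The only reciprocated tie in W is between nodes 1 and 3, and it is exactly the pair whose weights disagree (3 versus 1); every other tie runs in one direction only.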