Matching is a well known technique to balance covariates distribution between treated and control units in nonexperimental studies. In many fields, clustered data are a very common occurrence in the analysis of observational data and the clustering can add potentially interesting information. Matching algorithms should be adapted to properly exploit the hierarchical structure. In this article we present the CMatching package implementing matching algorithms for clustered data. The package provides functions for obtaining a matched dataset along with estimates of most common parameters of interest and modelbased standard errors. A propensity score matching analysis, relating math proficiency with homework completion for students belonging to different schools (based on the NELS88 data), illustrates in detail the use of the algorithms.
Causal inference with observational data usually requires a preliminary stage of analysis corresponding to the design stage of an experimental study. The aim of this preliminary stage is to reduce the imbalance in covariates distribution across treated and untreated units due the nonrandom assignment of treatment before estimating the parameters of interest. Matching estimators are widely used for this task (Stuart 2010). Matching can be done directly on the covariates (multivariate matching) or on the propensity score (Rosenbaum and Rubin 1983). The latter is defined as the probability of the treatment given the covariates value and it has a central role for the estimation of causal effects. In fact, the propensity score is a one dimensional summary of the covariates and thus it mitigates the difficulty of matching observations in high dimensional spaces. Propensity score methods have flourished and several techniques are now well established both in theory and in practice, including stratification on the propensity score, propensity score weighting (PSW), and propensity score matching (PSM).
Whilst the implementation of matching techniques with unstructured data has became a standard tool for researchers in several fields (Imbens and Rubin 2016), the increasing availability of clustered (nested, hierarchical) observational data poses new challenges. In a clustered observational study individuals are partitioned into clusters and the treatment is nonrandomly assigned in each cluster so that confounders may exist both at the individual and at the cluster level. Note that this framework is different from clustered observational data where a treatment is nonrandomly assigned for all units in the cluster, for which an optimal matching strategy has been suggested by Zubizarreta and Keele (2017). Such nested data structures are ubiquitous in the health and social sciences where patients are naturally clustered in hospitals and students in schools, just to make two notable examples. If relevant confounders are observed at both levels then a standard analysis, adjusting for all confounders, seems reasonable. However, when only the cluster label — but not the cluster level variables — is observed there is not a straightforward strategy to exploit the information on the clustering. Intuitively, the researcher having a strong belief on the importance of the cluster level confounders may adopt a ‘withincluster’ matching strategy. On the other extreme, a researcher may decide to ignore the clustering by using only the pooled data. It is important to note that this pooling strategy implicitly assumes that cluster level variables are not important confounders. Indeed, there have been a few proposals to adapt PSW and PSM to clustered data, see Cafri et al. (2018) for a review. Li et al. (2013) proposed several propensity score weighting algorithms for clustered data showing, both analytically and by simulation, that they reduce the bias of causal effects estimators when "clusters matter," that is, when cluster level covariates are important confounders. In the PSM context, Arpino and Mealli (2011) proposed to account for the clustering in the estimation of the propensity score via multilevel models. Recently, Rickles and Seltzer (2014) and Arpino and Cannas (2016) proposed caliper matching algorithms to perform PSM with clustered data. As we will discuss shortly, these algorithms can be used not only for PSM but also in the more general context of multivariate matching.
In the remaining of this paper, after reviewing the basic ideas underlying matching estimators, we briefly describe the available packages for matching in the R environment. Then, we describe the algorithms for matching with clustered data proposed by Arpino and Cannas (2016) and we present the package CMatching implementing these algorithms. The applicability of these algorithms is very broad and refers to all situations where clusterlevel data are present (in medicine, epidemiology, economics, etc.). A section is devoted to illustrate the use of the package on data about students and schools, which is a common significant occurrence of clustered data.
A list of the most important packages for matching available for R users
is shown in Table 1. The
Matching package, which
is required to run CMatching, is a remarkably complete package
offering several matching options. Matching
implements many greedy
matching algorithms including genetic matching (Diamond and Sekhon 2013). It also
contains a general MatchBalance
function to measure pre and
postmatching balance with a large suite of diagnostics. As for optimal
matching, there are dedicated packages like
designmatch and
optmatch. The latter
can also be called from
MatchIT, a general
purpose package implementing also the Coarsened Exact Matching approach
of (Iacus et al. 2011). Full matching is a particular form of optimal matching
implemented by
quickmatch with
several custom options.
Package 
Description 
Reference 


Matching  Greedy matching and balance analysis  (Sekhon 2011)  
MatchIT  Greedy matching and balance analysis  (Iacus et al. 2011)  
optmatch  Optimal matching  (Hansen and Klopfer 2006)  
quickmatch  Generalized full matching  (Savje et al. 2018)  
designmatch  Optimal matching and designs  (Zubizarreta et al. 2018) 
At the time of writing none of the packages described above offers specific routines for clustered data. The CMatching package fills this important gap implementing the algorithms for matching clustered data described in the next section.
Let us consider a clustered data structure \({\mathfrak D} = \{ y_{i j}, x_{i j} , t_{i j} \}, \quad i=1,\cdots n_j, \quad j=1,\cdots J\). For observation \(i\) in cluster \(j\) we observe a vector of covariates \(X\) and a binary variable \(T\) specifying the treatment status of each observation. Here \(n=\sum n_j\) is the total number of observations and \(J\) is the number of clusters in the data. We observe also a response variable \(Y\) whose average value we are willing to compare across treated and untreated units. A matching algorithm assigns a (possibly empty) subset of control units to each treated unit. The assignment is made with the aim of minimizing a loss function, typically expressed in terms of covariates distance between treated and untreated units. Matching algorithms can be classified as greedy or optimal depending whether the cost function is minimized locally or globally, respectively. Optimal matching algorithms are not affected by the order of the units being matched so they can reach the global optimum, but they are typically more computerintensive than greedy algorithms proceedings step by step. To bound the possibility of bad matches in greedy matching, it is customary to define a maximum distance for two units to be matched, i.e., a caliper, which is usually expressed in standard deviation units of the covariates (or of the propensity score). A greedy matching procedure can then be articulated in the following steps:
fix the caliper;
match each treated with control unit(s) at minimum covariates distance (provided that distance < caliper);
measure the residual covariates’ imbalance in the matched dataset and count the number of unmatched units (drops);
carefully consider both the balance and the number of drops: if they are satisfactory then proceed to the outcome analysis; otherwise stop or revise previous steps.
If matching proves successful in adjusting covariates, the researcher can proceed to outcome analysis where the causal estimand of interest is estimated from the matched data using a variety of techniques (Ho et al. 2007). On the other hand, if the procedure gives either an unsatisfactory balance or an excessive number of unmatched units, the investigator may try to modify some aspects of the procedure (e.g., the caliper, the way the distance is calculated).
Conceptually, the same procedure can be used also for hierarchical data. Indeed, it is not atypical to find analysis ignoring the clustering and pooling together all units. A pooling strategy implicitly assumes that the clustering is not relevant. However, in several cases the clustering does matter, that is, the researcher can hypothesize or suspect that some important clusterlevel confounders are unobserved. In this case, the information on the cluster labels can be exploited in at least two ways: i) forcing the matching to be implemented withincluster only; ii) performing a preferential withincluster matching, an intermediate approach between the two extremes of pooled and withincluster matching (Arpino and Cannas 2016). A withincluster matching can be obtained by modifying step 2 above in the following way:
This procedure may result in a large number of unmatched units (drops) so it increases the risk of substantial bias due to incomplete matching (Rosenbaum and Rubin 1985), in particular when the clusters are small. This particular bias arises when a matched subset is not representative of the original population of treated units because of several non random drops. Even in the absence of bias due to incomplete matching, a high number of drops reduces the original sample size with possible negative consequences in terms of higher standard errors.
It is possible to profit as much as possible of the the exact balance of (unobserved) clusterlevel covariates by first matching within clusters and then recovering some unmatched treated units in a second stage. This leads to the preferential withincluster matching, which can be obtained by modifying step 2 above in the following way:
a) match each treated with the control units(s) in group \(j\) at minimum covariate distance (distance < caliper);
b) match each unmatched treated unit from previous step with the control unit(s) at minimum covariate distance in some group different from \(j\) (provided that distance < caliper).
Now consider the outcome variable \(Y\). We can define for each unit potential outcomes \(Y1,\, Y0\) as the outcome we would observe under assignment to the treatment and control group, respectively (Holland 1986). Causal estimands of interest are the Average Treatment effect: \(ATE= E[Y1Y0]\) or, more often, the Average Treatment effect on the treated: \(ATT= T E[Y1Y0]\). Given that a unit is either assigned to the treatment or control group it is not possible to directly observe the individual causal effect on each unit; we have \(Y=T\cdot Y1+(1T)\cdot Y0\). In a randomized study \(T\) is independent of \((Y0,Y1)\) so, for \(k=0,1\), we have \[E(Yk)=E(Yk\, \, T=k)=E(Y \, \, T=k)\] which can be estimated from the observed data. In a observational study, matching can be used to balance covariates across treated and control units and then the previous relation can be exploited to impute the unobserved potential outcomes from the matched dataset. In our clustered data context, after the matched dataset has been created using one of the algorithms above, the ATT and its standard error can be estimated using a simple regression model: \[Y_{ij}=\alpha_{j} + T_{ij}\beta \label{model} \tag{1}\] that is, a linear regression model with clustered standard errors to take into account withincluster dependence in the outcome (Arpino and Cannas 2016). The resulting ATT estimate is the difference of outcome means across treated and controls, i.e., \(\widehat{ATT}\coloneqq mean(Y \,  \, T=1)  mean(Y \,  \, T=0)\), computed on the matched data. Standard errors are calculated using the cluster bootstrapping method for estimating variancecovariance matrices proposed by Cameron et al. (2011) and implemented in the package multiwayvcov. In general, calculating standard errors for PSM in clustered observational studies is a difficult problem requiring prudence from the researcher. While close formulae exist for weighting estimators (Li et al. 2013), standard error estimation after PSM matching relies upon approximation (Cafri et al. 2018), modelling assumption (Arpino and Cannas 2016), or simulation (Rickles and Seltzer 2014).
In this section we briefly detail the two routines in the CMatching package. Multisets are useful to compactly describe pseudo code so we recall some definitions and basic properties herein. A multiset is a pair \(\{U,m\}\) where \(U\) is a given set, usually called universe, and \(m:x\rightarrow \mathbb{N}\cup\{0\}\) is the multiplicity function assigning each \(x\in U\) its frequency in \(U\). Both the summation symbol and union symbols are used to manipulate multisets and they have different meanings: if \(A\) and \(B\) are multisets then \(C\equiv A \cup B\) is defined by \(m_C(x)=max(m_A(x),m_B(x))\), while \(C\equiv A+B\) is defined by \(m_C(x)=m_A(x)+m_B(x)\). For example if \(A\equiv\{1,2,2\}\) and \(B\equiv\{1,2,3\}\) then \(A\cup B\equiv \{1,2,2,3\}\), \(A+B\equiv \{1,1,2,2,2,3\}\). In our framework \(U\) is the set of observations indexes and thus \(m\) gives information about the number of times a given observation occurred in the matched dataset. Multisets then allow to naturally represent multiple matches arising from matching with replacement. When using multiset notation to describe the output of a matching algorithm, we are implicitly overlooking the fact that the output of a matching algorithm is richer as it also brings out the pairings, i.e., the associations between matched treated and untreated observations. However, it can be noted that this pairing is not relevant for calculating common estimates or common balance measures (e.g., the ATT), as they are invariant to permutations of the labels of the matched observations.
MatchW
and MatchPW
We denote with \(W_i\) and \(\overline{W}_i\) the sets of treated observations matched within clusters and unmatched within clusters, respectively. In Algorithms 1 and 2 the summation symbol \((\sum)\) denotes multiset sum.
In the first two lines, common to both algorithms, the Match
function
is repeatedly run to produce the matchedwithin subsets \(M_i\)
\(i=,...,J\). Then, in algorithm 1 the sum of the \(M_i\) in line
3 gives the matched subset \(M\). Algorithm 2 is similar but
after finding the \(M_i\)’s an "additional" subset \(B\) is found by
recovering some unmatched units (line 3) and then combined to give the
final matched dataset. If a response variable \(Y\) was included the
output of both algorithms also contains an estimate of the ATT (default,
but the user can choose also other estimands) and its standard error.
CMatching can be freely downloaded from CRAN repository and it
contains the functions listed in Table 2. The main function
CMatch
performs withincluster and preferential withincluster
matching via subfunctions MatchW
and MatchPW
, respectively. The
output of the main function can be passed to functions CMatchBalance
and summary
to provide summaries of covariates balance and other
characteristics of the matched dataset. CMatch
exploits the Match
function (see Matching) implementing matching for unstructured data.
Given a covariate X
and a binary treatment T
, the call
Match(X, T, ...)
gives the set of indexes of matched treated and
matched control units. The CMatch
function has the same arguments plus
the optional arguments G
(specifying the cluster variable) and type
to choose between withincluster matching or preferentialwithincluster
matching. We highlight that we chose to frame the CMatch in the
Match
function style so that Matching
users can easily implement PSM
with clustered data in a familiar setting.
Function  Description  Input  Output 

CMatch 
Match  \(X,\,T,\, G\)  A matched dataset 
MatchW 
Match within  \(X,\,T,\, G\)  A matched dataset 
MatchPW 
Match preferentially within  \(X,\,T,\, G\)  A matched dataset 
summary.Match 
S3 method for CMatch objects  A matched dataset  General summaries 
CMatchBalance 
Balance analysis  A matched dataset  Balance summaries 
For an illustration let us consider an artificial dataset consisting of
two clusters, the first containing two treated units and two control
units, and the second containing two treated and four controls. We use
g
for the cluster identifier, x
for the value of the individual
level confounder, and t
for the binary treatment indicator:
> toy
1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[,1 2 3.0 4.0 5.0 6 7 8 9.0 10.0
id 1 1 2.0 2.0 1.0 1 2 2 2.0 2.0
g 1 1 1.0 1.0 0.0 0 0 0 0.0 0.0
t 1 1 1.1 1.1 1.4 2 1 1 1.3 1.3 x
We also fix a caliper of \(2\) (in standard deviation units of \(X\), i.e., all units at distance greater or equal than \(2\cdot sd(x)\approx 2\cdot 0.312 = 0.624\) will not be matched together) and we assume that the \(ATT\) is the target parameter. Although artificial the dataset aims at representing the general situation where pooled matching results in matched treated and control units not distributed homogeneously across clusters (see Figure 1, left).
The pooled, within and preferentialwithin matchings for the toy data
are depicted in Figure 1. For each matching we report the
asam
, a measure of residual imbalance given by the absolute percent
mean difference of \(x\) across treated and controls divided by standard
deviation of the treated observations alone. The asam is widely used as
a measure of imbalance (Stuart 2010); its value in the unmatched data is
\(491\). The pooled matching (left) is a complete matching, i.e., all the
treated units could be matched. However, note that units in pairs 17
and 28 may differ in cluster level covariates. Matching withincluster
(center) guarantees perfect balance in cluster level covariates but it
is incomplete because unit 2 cannot be matched withincluster with the
given caliper. This is a typical disadvantage of withincluster matching
with smaller clusters. Unit 2 is matched with 9 in the preferential
within matching (right), which again is a complete matching.
The above matching solutions can be obtained easily using CMatch
as
follows. For the pooled matching it is enough to call Match
(or,
equivalently, CMatch
without type
specification) while for within
and preferentialwithin matching it is enough to specify the appropriate
type
in the Match
call:
#pooled matching
< Match(Y=NULL, Tr=t, X=x, caliper=2)
pm
# same output as before (with a warning about the absence of groups,
# ties=FALSE,replace=FALSE)
< CMatch(type="within", Y=NULL, Tr=t, X=x, Group=NULL, caliper=2) pm
#within matching
< CMatch(type="within", Y=NULL, Tr=t, X=x, Group=g, caliper=2)
wm
#preferentialwithin matching
< CMatch(type="pwithin", Y=NULL, Tr=t, X=x, Group=g, caliper=2) pwm
The output of these object is quite rich. However, a quick look at the matchings can be obtained by directly calling the index set of matched observations:
> pm$index.treated; pm$index.control
1] 1 2 5 6
[1] 7 8 10 9
[> wm$index.treated; wm$index.control
1] 1 5 6
[1] 3 7 8
[> pwm$index.treated; pwm$index.control
1] 1 2 5 6
[1] 3 7 8 10 [
Note that vertical alignments in the table above correspond to arrows in
Figure 1. With larger datasets and when multiple matches
are allowed (i.e., when replace=TRUE
) it is probably better to
summarize the output. The output objects are of class "CMatch
" and a
summary
method is available for these objects. The summary shows the
number of treated and the number of controls by group. Moreover, when
\(Y\) is not NULL it also shows the point estimate of ATT with its
modelbased estimate of the standard error:
> summary(wm)
0
Estimate... NULL
SE.........
10
Original number of observations.............. 4
Original number of treated obs...............
Original number of treated obs by group......
1 2
2 2
3
Matched number of observations............... observations (unweighted). 3
Matched number of
Caliper (SDs).......................................... 2
'exact' or 'caliper' ......... 1
Number of obs dropped by 'exact' or 'caliper' by group
Number of obs dropped by
1 2
1 0
This summary
method does not conflict with the corresponding method
for class "Match
" which is still available for objects of that
class. From the summary above (see also Figure 1, center)
we can easily see that matching within groups resulted in one unmatched
treated unit from group two. The exact pairing can be recovered from the
full output, in particular from the object mdata
containing the list
of matched and unmatched units. As we noticed in the introduction, it is
essential to analyze covariate balance to evaluate the effectiveness of
matching as a balancing tool. To this end objects of class "CMatch
"
can be input of the CMatchBalance
function, a wrapper of
MatchBalance
which offers a large number of balance diagnostics:
> CMatchBalance( t~x , match.out=wm)
***** (V1) x *****
Before Matching After Matching1.05 1.0667
mean treatment........ 1.3333 1.1333
mean control.......... 490.75 115.47
std mean diff......... (...)
From the output we see that the asam
decreased from 491 to 115. One
can more directly obtain the standardized difference in means of these
matchings by subcomponents:
> bwm$After[[1]]["sdiff"]
$sdiff
1] 115.4701
[
> bpwm$After[[1]]["sdiff"]
$sdiff
1] 216.5064 [
Whilst artificial, the previous example prompts some general considerations:
type=within
is chosen
(or it could be matched with a less similar control in the same
cluster). In both cases the increased bias (due to either incomplete
matching or greater imbalance in the observed \(x\)) may be at least
in part compensated by lower imbalance in cluster level variables;In applications, when cluster level confounders are unobserved, it is
not straightforward to decide which of the matching strategies is the
best. However, combining the within and preferentialwithin routines
offered by CMatching with sound subject matter knowledge, the
researcher can decide how much importance should be given to balance
withinclusters based on the hypothesized strength of unobserved cluster
level confounders. Note that CMatching uses the same caliper for
clusters, under the assumption that the researcher is typically
interested in estimating the causal effect of the treatment in the whole
population of treated units and not by each cluster. This is the main
difference between MatchW
and Matchby
from the Matching package.
The latter exactly matches on a categorical variable and the caliper is
recalculated in each subset and for this reason MatchW
estimates
generally differ from those obtained from Matchby.
Another difference
is that Matchby
does not adjust standard errors for withincluster
correlation in the outcome. A third difference is that CMatching
provides some statistics (e.g., number of drops) by cluster to better
appreciate how well the matched dataset resembles the original clustered
structure in terms for example of cluster size.
The CMatching package includes several functions designed for matching with clustered data. In this section we illustrate the use of CMatching with a an educational example.
The example is based on data coming from a nationally representative,
longitudinal study of 8th graders in 1988 in the US, called NELS88.
CMatching contains a selected subsample of the NELS88 data provided
by (Kraft and de Leeuw 1998) with 10 handpicked schools from the 1003 schools in the full
dataset. The subsample, named schools
, is a data frame with 260
observations on 19 variables recording either school or student’s
characteristics.
For the purpose of illustrating matching algorithms in CMatching, we
consider estimation of the causal effect of doing homework on
mathematical proficiency. More specifically, our treatment is a binary
variable taking value 1 for students that spent at least one hour
doing math homework per week and 0 otherwise. The latter is a
transformation of the original variable "homework" giving two almost
equalsized groups. The outcome is math proficiency measured on a
discrete scale ranging from 0 to 80. For simplicity we first attach the
dataset (attach(schools)
) and name the treatment and the outcome
variables as T
and Y
, respectively. The variable schid
is the
school identifier and we rename it as Group
:
> T < ifelse(homework>1, 1, 0)
> Y < math
> Group < schid
Since the NELS88 study was an observational study, we do not expect a simple comparison of math scores across treated and control students (those doing and those not doing math homework) to give an unbiased estimate of the "homework effect" because of possible studentlevel and schoollevel confounders. For the purpose of our example, we will consider the following studentlevel confounders: "ses" (a standardised continuous measure of family socioeconomic status), "sex" (1 = male, 2 = female) and "white" (1 = white, 0 other race). The NELS88 study also collected information on school characteristics. In general, a researcher needs to include all potential confounders in the matching procedure, irrespective of the hierarchical level. Here we considered one schoollevel confounder: "public" (1 = public schools, 0 = private) but it is plausible to assume that one or more relevant confounders at the schoollevel are not available. It is clear that, to make the unconfoundedness assumption more plausible, richer data should be used. For example, students’ motivation and parents’ involvement are potentially important confounders. Thus, the following estimates should be interpreted with caution.
Before illustrating the use of CMatching, it is useful to get a better understanding of the data structure. In the school dataset we have a fairly balanced number of treated and control units (128 and 132, respectively). However, in some schools we have more treated than control students, with proportion of treated ranging from 20% to 78%:
> addmargins(table(Group, T))
T0 1 Sum
Group 7472 17 6 23
7829 6 14 20
7930 12 12 24
24725 15 7 22
25456 17 5 22
25642 16 4 20
62821 15 52 67
68448 8 13 21
68493 15 6 21
72292 11 9 20
132 128 260 Sum
From the table above we can notice that the total school sample size is fairly homogeneous with the exception of one school (code = 62821) where the number of treated students (52) is considerably higher than the number of control students (15). These considerations are important for the implementation of the withincluster and preferential withincluster matching algorithms. In fact, withincluster matching can be difficult in groups where the proportion of treated units is high because there are relatively few controls that can potentially serve as a match. Preferentialwithincluster matching would be less restrictive.
This preliminary descriptive analysis is also useful to check if treated and control units are present in each group. In fact, if a group is only composed of treated or control students then withincluster matching cannot be implemented. Groups with only treated or controls should be dropped before the withincluster matching algorithm is implemented. We now describe how Cmatching can be used to implement matching in our schoolclustered dataset.
CMatching requires to estimate the propensity score before implementing the matching. Here we estimate propensity scores using a simple logistic regression with only main effects and then estimate the predicted probability for each student:
> pmodel < glm(T~ses + as.factor(sex) + white + public, family=binomial(link="logit"))
> eps < fitted(pmodel)
We do not report the output of the propensity score model because in PSM
the propensity score is only instrumental to implement the matching.
Withincluster propensity score matching can be implemented by using the
function CMatch
with the option type="within"
:
> psm_w < CMatch(type="within", Y=Y, Tr=T, X=eps, Group=Group)
The previous command implements matching on the estimated propensity
score, eps
, by using default settings of Match
(onetoone matching
with replacement and a caliper of 0.25 standard deviations of the
estimated propensity score). The output is an object of class
"CMatch
" and a customized summary method for objects of this class
gives the estimated ATT and the main features of the matched dataset:
> summary(psm_w)
5.2353
Estimate... 2.0677
SE.........
260
Original number of observations.............. 128
Original number of treated obs...............
Original number of treated obs by group......
7472 7829 7930 24725 25456 25642 62821 68448 68493 72292
6 14 12 7 5 4 52 13 6 9
119
Matched number of observations............... observations (unweighted). 120
Matched number of
Caliper (SDs).......................................... 0.25
'exact' or 'caliper' ......... 9
Number of obs dropped by 'exact' or 'caliper' by group
Number of obs dropped by
7472 7829 7930 24725 25456 25642 62821 68448 68493 72292
0 2 0 0 0 0 2 5 0 0
The summary starts reporting the original total number of students in
our sample (260), the total number of treated students (128) and how
they are distributed across the different schools. It is of utmost
importance to check how many treated units could be matched to avoid
bias from incomplete matching. For this reason the output indicates that
119 students in the treatment group found a match
("Matched number of observations
"), while the remaining 9 were
dropped. Note that the unweighted number of treated observations that
found a match ("Matched number of observations (unweighted)
") is
different because of ties. Ties management can be controlled by option
ties
of the Match
function: if ties=TRUE
when one treated
observation matches with more than one control unit all possible pairs
are included in the matched dataset and each pair is weighted equal to
the inverse of the number of matchedcontrols. If instead ties=FALSE
is used ties are randomly broken. Note that the summary reports the
number of treated matched and dropped units because it is assumed by
default the ATT is the target estimand. Then the output also reports how
the 9 unmatched treated students are distributed across schools. For
example, we notice that in one school (68448), 5 of the 13 treated
students did not find a match. This is because for these 5 students
there was no control student in the same school with a propensity score
falling within the caliper. The report also recalls the caliper, which
was set to 0.25 standard deviations of the estimated propensity score in
this example. The caliper can be set in standard deviation units using
the homonymous argument caliper
. It may be more useful to calculate
the percentage of dropped units instead of the absolute numbers. These
percentages are not reported in the summary but they can be easily
retrieved from the CMatch
output. For example, we can calculate the
percentage of unmatched treated units, both overall and by school:
# percentage of drops
> psm_w$ndrops / psm_w$orig.treated.nobs
1] 0.0703
[
# percentage of drops by school
> psm_w$orig.dropped.nobs.by.group / table(Group)
Group7472 7829 7930 24725 25456 25642 62821 68448 68493 72292
0.00 0.10 0.00 0.00 0.00 0.00 0.03 0.24 0.00 0.00
confirming that the percentage of drops is very low in all schools. We
could also similarly calculate the percentage of drops over treated
observations, which turn out to be high for school 64448
. The next
step before accepting a matching solution is the evaluation of the
achieved balance of confounders across the treatment and control groups.
To this end the package contains function CMatchBalance
that can be
applied to an object of class "CMatch
" to compare the balance before
and after matching (the matched dataset must be specified in the
match.out
argument):
> b_psm_w < CMatchBalance(T~ses + as.factor(sex) + white + public, data=schools,
match.out=psm_w)
***** (V1) ses *****
Before Matching After Matching0.23211 0.24655
mean treatment........ 0.36947 0.14807
mean control.......... 61.315 10.086
std mean diff.........
***** (V2) as.factor(sex)2 *****
Before Matching After Matching0.52344 0.52941
mean treatment........ 0.46212 0.56303
mean control.......... 12.229 6.706
std mean diff.........
***** (V3) white *****
Before Matching After Matching0.73438 0.7563
mean treatment........ 0.7197 0.71429
mean control.......... 3.3103 9.7458
std mean diff.........
***** (V4) public *****
Before Matching After Matching0.59375 0.57983
mean treatment........ 0.88636 0.57983
mean control.......... 59.346 0
std mean diff.........
(...)
The output reports the balance for each variable to allow the researcher to assess the effectiveness of matching in balancing each variable. Many balance metrics are provided but for simplicity of exposition we focus on comparing on the standardized mean difference between the two groups of students. Note that the asam can be easily obtained by averaging the standardized mean differences:
<vector()
vec for(i in 1:length()) {vec[[i]] < b_psm_w$AfterMatching[[i]]$sdiff}
> mean(abs(vec))
1] 6.634452 [
from which we can see that the initial asam of 34 (see
Table 3) sharply decreased after matching. Balance
improved dramatically for ses
and public
(results not shown). In
fact, for the latter it was possible to attain exact matching. This is
guaranteed by withincluster matching because it forces treated and
control students to belong to the same school. For the same reason,
withincluster matching also guarantees a perfect balance of all other
schoollevel variables (even unobservable) not included in the
propensity score estimation. The balance improved also for the sex
variable but not for the dummy white
(from 3.31% to 9.75%). In a real
study, the investigator may attain a matching solution with an even
better balance of the dummies for white and sex by changing the
propensity score model or one or more options in the matching
algorithms. For example, a smaller caliper could be tried. Note that
CMatchBalance
is a wrapper of the MatchBalance
function (Matching
package) so it measures balance globally and not by group. This is
acceptable also in a hierarchical context since we first and foremost
consider the overall balance. While a groupbygroup balance analysis
may be useful it is only the average asam
which matters when
estimating the ATT on the whole population of treated units.
We highlighted that the withincluster algorithm always guarantees a
perfect balance in all clusterlevel confounders as in the example
above. However, note that it was not possible to match some treated
observations and in part this can be due to the matching constrained to
happen only within clusters. The researcher can relax the constraint
using preferential withincluster matching. This algorithm can be
invoked using the option type="pwithin"
in the CMatch
function:
> psm_pw < CMatch(type="pwithin", Y=Y, Tr=T, X=eps, Group=Group)
> summary(psm_pw)
PSM  MAH  
Statistics  within  pwithin  within  pwithin  
Orig. number of obs. 
260  260  260  260  
Orig. number of treated obs. 
128  128  128  128  
Matched number of treated obs. 
119  128  84  128  
Number of drops 
9  0  44  0  
ASAM before 
34.05  34.05  34.05  34.05  
ASAM after 
6.63  8.13  0.47  6.31  
ATT 
5.24  5.18  4.25  4.34  
SE 
2.07  2.14  2.23  1.83 
\(\dagger\) PSM: propensity score matching; MAH: Mahalanobis matching.
A comparison of results between within and preferential within matching
is given in Table 3. The preferential withincluster
matching was able to match all treated students
("Matched number of obs
" = 128), i.e., the number of unmatched
treated students is 0 ("Number of drops
" = 0). In this example, the
9 treated students that did not find a matched control within the
caliper in the same school found a control match in another school.
Looking to the overall balance attained by matching preferentially
within, we can notice that preferential withincluster matching showed a
slightly higher asam than withinmatching. In fact there is no clear
"winner" between the two algorithms: the balance of the individual
level variables ses
and white
improves slightly with the
preferential withincluster matching while for sex
the withincluster
matching is considerably better (not shown). Importantly, using
preferential withincluster matching the absolute standardized mean
difference for the schoollevel variable public
is 3.2% This is not a
high value because most of the treated units actually found a match
within schools. However, this finding points to the fact that
preferential withincluster matching is not able to perfectly balance
cluster level variables as the withincluster approach.
Finally, having achieved a satisfactory balance with a very low number
of drops we can estimate the ATT on the matched dataset. When the
argument Y
is not NULL
, an estimate is automatically given otherwise
the output of the CMatch
function only gives information about the
matching. The estimated average effect of studying math for at least one
hour per week on students’ math score is 5.24 with a standard error of
2.07 when matching within schools (Table 3). It is worth
mentioning that the reported SE is model based and adjusts for
nonindependence of students within schools. From Table 3
we can see that the estimated ATT and SE for the preferentialwithin
school approach are very similar to those obtained with the
withincluster approach. We stress that the estimated ATT should be
considered carefully and only after checking the matching solution.
In conclusion, preferential withincluster matching is expected to improve the solution of the withincluster matching in terms of a reduced number of unmatched units. On the other hand, withincluster matching guarantees a perfect balance of schoollevel variables (both observed and unobserved) while preferential within does not. The researcher, choosing between the two algorithms, has to consider the tradeoff between having a perfect balance of cluster level variables (withincluster matching) or reducing the number of unmatched treated units (preferential withincluster matching). The researcher can also implement both approaches and compare the results as a sort of sensitivity analysis.
Instead of matching on the propensity score, the researcher may match directly on the covariates space. This strategy can be advantageous when the number of covariates is fairly low and it is expected to match exactly a large number of units on the original space. The syntax is very similar to the above for propensity score matching: the only difference is that the user indicates the covariate matrix instead of the propensity score in the \(X\) argument:
> mal_w < CMatch(type="within", Y=Y, Tr=T, X=cbind(ses, sex, white, public),
Group=Group)
When \(X\) is a matrix, the covariate distance between a treated and a control unit is measured by Mahalanobis metrics, i.e., essentially the multivariate Euclidean distance scaled by the standard deviation of each covariate, a metrics which warrants an homogeneous imbalance reduction property for (approximately) ellipsoidally distributed covariates (Rubin 1980). From Table 3, columns 34, we can see that the balance of covariates was indeed very good. Note that the estimated ATT using Mahalanobis matching is lower than the corresponding estimate obtained with propensity score matching. However, withincluster matching using the Mahalanobis distance has generated a large number of unmatched treated units (44). Therefore, in this case preferential withincluster matching could be used to avoid an high proportion of drops.
Other strategies for controlling unobserved cluster covariates in PSM have been suggested by (Arpino and Mealli 2011). The basic idea is to account for the clustering in the estimation of the propensity score by using random or fixedeffects models. This strategy can be combined with the matching strategies presented before in order to ‘doubly’ account for the clustering. This can be done easily with CMatching. As an example we consider estimating the propensity score with a logit model with dummies for schools and then matching preferentially withinschools using the estimated propensity score:
# estimate ps
> mod < glm(T ~ ses + parented + public + sex + race + urban
 1, family=binomial(link="logit"), data=schools)
schid < fitted(mod)
eps
# match within on eps
> dpsm < CMatch(type="pwithin", Y=math, Tr=T, X=eps, Group=NULL)
In concluding this section, we also mention some other matching strategies which can be implemented using CMatching and some programming effort. First, the utility of the algorithms naturally extends when there are more than two levels. In this case, it can be useful to match preferentially on increasingly general levels, for example by allowing individuals to be matched first between regions and then between countries. Another natural extension involves data where units have multiple membership at the same hierarchical level. In this case it is possible to combine match within (or preferentialwithin) across levels, for example by matching students both within schools and within living district (e.g. 1 out of 4 possible combinations).
In this paper we presented the package CMatching implementing matching algorithms for clustered data. The package allows users to implement two algorithms: i) withincluster matching and ii) preferential withincluster matching. The algorithms provide a modelbased estimate of the causal effect and its standard error adjusted for withincluster correlation among observations. In addition, a tailored summary method and a balance function are provided to analyze the output. We illustrated the case for within and preferential withincluster matching analyzing data on students enrolled in different schools for which it is reasonable to assume important confounding at the school level. Finally, since the analysis of clustered observational data is an active area of research, we are willing to improve on standard error calculations for matching estimators with clustered data if new theoretical results in the causal inference literature will become available.
CMatching, Matching, designmatch, optmatch, MatchIT, quickmatch, multiwayvcov
CausalInference, Econometrics, ExperimentalDesign, HighPerformanceComputing, Optimization
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Cannas & Arpino, "Matching with Clustered Data: the CMatching Package in R", The R Journal, 2019
BibTeX citation
@article{RJ2019018, author = {Cannas, Massimo and Arpino, Bruno}, title = {Matching with Clustered Data: the CMatching Package in R}, journal = {The R Journal}, year = {2019}, note = {https://rjournal.github.io/}, volume = {11}, issue = {1}, issn = {20734859}, pages = {721} }