APCI: An R and Stata Package for Visualizing and Analyzing Age-Period-Cohort Data

Social scientists have frequently attempted to assess the relative contribution of age, period, and cohort variables to the overall trend in an outcome. We develop an R package APCI (and Stata command apci) to implement the age-period-cohort-interaction (APC-I) model for estimating and testing age, period, and cohort patterns in various types of outcomes for pooled cross-sectional data and multi-cohort panel data. Package APCI also provides a set of functions for visualizing the data and modeling results. We demonstrate the usage of package APCI with empirical data from the Current Population Survey. We show that package APCI provides useful visualization and analytical tools for understanding age, period, and cohort trends in various types of outcomes.

Jiahui Xu (The Pennsylvania State University), Liying Luo (The Pennsylvania State University)
2022-10-11

1 Introduction

Researchers across disciplines have long been interested in distinguishing the relative contribution of three time-related variables — namely, age (i.e., how old a person is at the time of data collection), time periods (e.g., the Great Recession 2007-2009 and the COVID-19 pandemic beginning in December 2019), and cohort membership (e.g., the baby boom cohort born in 1945-1964 and the Millennials born in 1981-1996) — to the overall trends in various outcomes (e.g., labor force participation, attitudes, and cognitive functioning) (Clogg 1982; Alwin and McCammon 2003; Pescosolido et al. 2021). Decomposing the overall trends into age, period, and cohort variations provides insight into the ways in which biological and social factors affect these outcomes (Hobcraft et al. 1982; Heckman and Robb 1985; Fosse and Winship 2019).

To quantify the relative contribution of age, period, and cohort, Luo and Hodges (2020b) have recently developed a model called the age-period-cohort-interaction (APC-I) model. The APC-I model is qualitatively different from other age-period-cohort (APC) models in that it characterizes cohort effects as a structure of the age-by-period interaction terms to acknowledge the interdependence of age, period, and cohort effects, whereas prior methods attempt to recover the independent and additive effects of the three variables. The APC-I model has been used to understand the unique contribution of cohort membership to various outcomes including crime involvement, substance use, and cultural taste (Verdery et al. 2020; Lu and Luo 2020; Ma 2020). However, the authors of the APC-I model focused on the conceptual motivation of the method and offered relatively few technical details for implementing it. Estimating and testing cohort effects in the APC-I model may therefore be challenging for interested readers.

We developed an R package APCI (Xu and Luo 2021) and a Stata command apci for implementing the APC-I model in empirical research using pooled cross-sectional data (e.g., the General Social Survey and the Current Population Survey) and, importantly, extended the APC-I method to multi-cohort longitudinal or panel data (e.g., data from the Health and Retirement Study (HRS) and the National Longitudinal Survey of Youth (NLSY)). The purpose of this paper is threefold. First, we describe the R functions in the APCI package and the Stata command for estimating and testing age, period, and cohort effects in the APC-I model. The core function can be used for analyzing pooled cross-sectional data and multi-cohort longitudinal data. Second, we introduce a set of visualization tools to help researchers motivate an APC analysis and interpret age, period, and cohort effects from the APC-I model. Third, we clarify several important issues about characterizing cohort effects as a set of age-by-period interaction terms. We pay particular attention to the implications of coding schemes and to the interpretation of between-cohort average deviations and within-cohort life-course variations.

This paper is organized as follows. Following a description of traditional APC models and the identification problem, we introduce the APC-I model and the estimation and testing procedures. We explain how and why the age-by-period interaction terms can be used to characterize cohort effects, with particular attention to the implications of coding schemes for estimating and testing interactions. Next, we describe the visualization tools and functions in the R package APCI. We then demonstrate how to use the package with the empirical example of men’s and women’s labor force participation from 1990 to 2019 in the United States using data from the Current Population Survey (CPS, Flood et al. 2021).

2 Methodology: the APC-I model

The APC identification problem

To formally estimate and infer the independent age, period, and cohort effects, Mason et al. (1973) specified an analysis of variance (ANOVA) model that they labeled the age-period-cohort (APC) accounting model:

\[\label{eq:1} g\left(E\left(Y_{i j}\right)\right)=\mu+\alpha_{i}+\beta_{j}+\gamma_{k} \tag{1}\] for age groups \(i=1, 2,\ldots, A\), periods \(j=1, 2,\ldots, P\), and cohorts \(k=1, 2,\ldots, (A+P-1)\), where \(\sum_{i=1}^{A} \alpha_{i}=\sum_{j=1}^{P}\beta_{j}=\sum_{k=1}^{A+P-1} \gamma_{k}=0\). \(E\left(Y_{i j}\right)\) denotes the expected value of the outcome \(Y\) for the \(i\)th age group in the \(j\)th time period; \(g\) is the “link function”; \(\alpha_i\) denotes the mean difference from the global mean \(\mu\) associated with the \(i\)th age category; \(\beta_j\) denotes the mean difference from \(\mu\) associated with the \(j\)th period; \(\gamma_k\) denotes the mean difference from \(\mu\) associated with membership in the \(k\)th cohort.

Unfortunately, the APC accounting model (1) is not identified even when a coding scheme (e.g., dummy coding where one group is set as the reference group or effect coding where the sum of the coefficients for each effect is set to 0) is applied. This is because age, period, and cohort are exactly linearly related (see Fienberg and Mason 1979; Luo et al. 2016; Fosse and Winship 2019 for detailed discussions). As a result, the design matrix of model (1) has rank one less than full, so an infinite number of solutions (i.e., estimates) for the parameters fit the data equally well. That is, the data cannot distinguish different estimation results, so an additional constraint, beyond the usual reference group or sum-to-zero constraint, must be imposed in order to choose one set of estimates. Moreover, interpreting the results is difficult because the standard interpretation of regression coefficients (that is, the conditional effect of one variable after accounting for other covariates) cannot apply due to the lack of variation in the third variable (e.g., cohort) after considering the other two (e.g., age and period).
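The rank deficiency described above can be checked numerically. The following is a minimal sketch in base R rather than code from package APCI; the grid size (four age groups, three periods) and all object names are illustrative assumptions.

```r
# Toy age-period grid; the sizes A and P are arbitrary illustrative choices.
A <- 4
P <- 3
cells <- expand.grid(age = seq_len(A), period = seq_len(P))

# Cohort index implied by age and period (cohort = period - age + A),
# so cohort is an exact linear function of the other two variables.
cells$cohort <- cells$period - cells$age + A

# Treat all three variables as categorical with sum-to-zero (effect) coding.
cells[] <- lapply(cells, factor)
effect_coding <- list(age = "contr.sum", period = "contr.sum",
                      cohort = "contr.sum")

# Design matrix of the APC accounting model (1).
X_apc <- model.matrix(~ age + period + cohort, data = cells,
                      contrasts.arg = effect_coding)

n_columns <- ncol(X_apc)     # 1 + (A-1) + (P-1) + (A+P-2) parameters
rank_apc  <- qr(X_apc)$rank  # numerical rank of the design matrix

cat("columns:", n_columns, " rank:", rank_apc, "\n")
```

In this sketch the reported rank should fall one short of the number of columns, even though the sum-to-zero constraints are already built into the coding.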

The theoretical root of the identification problem in traditional APC models is the problematic assumption that age, period, and cohort effects operate independently of each other. It implies that the identification challenge is inherent in any APC model that attempts to separate independent and additive effects of age, period, and cohort and thus cannot be solved by changing the model setup [e.g., using random effects for period and cohort as in (Yang and Land 2006); see Luo and Hodges (2020a) for a critique] or by variable manipulation [e.g., using unequal interval widths for age, period, and cohort groups as in (Robertson and Boyle 1986; Sarma et al. 2012); see Luo et al. (2016) for a detailed discussion]. The identification problem is well recognized, and its consequences have been discussed extensively (Kupper et al. 1983, 1985; Fienberg and Mason 1985; Luo 2013; Grotenhuis et al. 2016; Luo et al. 2016; Fosse and Winship 2019; O’Brien 2020; Luo and Hodges 2020a; Morgan and Lee 2021). In essence, internal information derived from the data cannot help because the problem is circular: researchers do the analysis to learn precisely the kind of information needed to justify any such constraint.

The APC-I model

Luo and Hodges (2020b) proposed a new APC model called the age-period-cohort-interaction (APC-I) model. The APC-I model is qualitatively different from all estimators developed under the traditional framework in that it explicitly specifies cohort effects as a structure of the age-by-period interactions. A life-course dynamics hypothesis, which concerns whether and how cohort effects may change as cohorts age, thus corresponds to a specific structure of the age-by-period interactions. This specification is motivated by the theoretical account that “The minimal basis for expecting interdependence between inter-cohort differentiation and social change is that change has variant import for persons of unlike age” (Ryder 1965). That is, a basic notion on which cohort analysis rests is that “transformations of the social world modify people of different ages in different ways” (Ryder 1965).

The APC-I model is fully identified in the sense that it does not require additional constraints other than a regular coding scheme. It is also flexible enough to test various hypotheses about life-course changes within cohorts as cohort members age. We first describe the model specification and estimation and testing techniques. The next section demonstrates the procedure using empirical examples.

The general form of the APC-I model can be written as: \[\label{eq:2} g\left(E\left(Y_{i j}\right)\right)=\mu+\alpha_{i}+\beta_{j}+\alpha \beta_{i j(k)} \tag{2}\] where \(g\), \(Y_{ij}\), \(\mu\), \(\alpha_{i}\) and \(\beta_{j}\) are defined as in model (1) and \(\alpha \beta_{i j(k)}\) denotes the interaction of the \(i\)th age group and \(j\)th period group, corresponding to the effect of the \(k\)th cohort. Note that except for the oldest and youngest cohorts, the effect of one cohort includes multiple age-by-period interaction terms \(\alpha \beta_{i j(k)}\) that lie on the same diagonal in a table with ages in rows and periods in columns.
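To make the diagonal structure concrete, the sketch below builds an age-by-period table of cohort labels in base R. The indexing rule k = j - i + A is an assumption chosen for illustration (it numbers the oldest cohort 1), and package APCI may label cohorts differently; the dimensions match the nine age groups and six periods of the CPS example.

```r
A <- 9  # number of age groups (rows)
P <- 6  # number of period groups (columns)

# Cohort label for each age-by-period cell; cells sharing a label lie on
# the same diagonal of the table.
cohort_index <- outer(seq_len(A), seq_len(P), function(i, j) j - i + A)
dimnames(cohort_index) <- list(age = seq_len(A), period = seq_len(P))

# Number of interaction terms (cells) contributing to each cohort.
cells_per_cohort <- table(cohort_index)

print(cohort_index)
print(cells_per_cohort)
```

Under this labeling there are A + P - 1 cohorts, and only the first and last cohorts are observed in a single cell.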

Model (2) differs from model (1) in the way that cohort effects are modeled. In model (2), cohort effects are considered as a specific form of the age-by-period interaction. In statistics, the interaction between two variables describes the differential effects of one variable depending on the level of the other (Scheffé 1999). In APC research, this means that if part of the overall pattern of interest can be attributed to cohort differences, significant age-by-period interactions should be present. When cohort membership is not associated with the outcome (that is, when the effects of historical or social shifts, or period effects, are uniform across age groups), age-by-period interactions should not be present.

Luo and Hodges (2020b) described a procedure for investigating age and period main effects and inter-cohort deviations and intra-cohort dynamics. They recommended beginning with a deviance test of whether the effects of time periods vary among age groups, which is called “a global F test”. A non-significant global F statistic indicates that there are few age-by-period interaction effects and thus little cohort variation. If the model suggests significant age-by-period interaction effects, one may proceed to examine inter- and intra-cohort differences. Inter-cohort average deviations are quantified based on the arithmetic mean of the age-by-period interaction terms contained in each cohort, and a t test can be used to examine the average of that cohort-specific deviation. To investigate intra-cohort dynamics over the life course (e.g., the cumulative (dis)advantage hypothesis in Dannefer 1987; Chauvel et al. 2016; Ferraro and Morton 2018; O’Brien 2020), one may use a t test of the linear orthogonal polynomial contrast in each cohort’s age-by-period interaction effects. This intra-cohort life-course dynamics test is helpful for investigating whether the average (dis)advantages of the members of that cohort accumulate, remain stable, or diminish as they age.
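The two cohort summaries described above can be illustrated with a small calculation. This is only a sketch of the point estimates, assuming five made-up interaction estimates for one cohort ordered from its youngest to its oldest observed age; the t tests also require standard errors derived from the model's covariance matrix, which are not reproduced here.

```r
# Hypothetical age-by-period interaction estimates for a single cohort,
# ordered from the youngest to the oldest age at which it is observed.
cohort_terms <- c(0.10, 0.15, 0.22, 0.25, 0.33)

# Inter-cohort average deviation: arithmetic mean of the cohort's terms.
cohort_average <- mean(cohort_terms)

# Intra-cohort life-course trend: linear orthogonal polynomial contrast.
# contr.poly() returns orthonormal polynomial weights; column 1 is linear.
linear_weights <- contr.poly(length(cohort_terms))[, 1]
cohort_slope <- sum(linear_weights * cohort_terms)

cat("average deviation:", cohort_average, "\n")
cat("linear contrast:  ", cohort_slope, "\n")
```

A positive linear contrast in this toy example would correspond to deviations that grow as the cohort ages, whereas a value near zero would suggest a stable deviation.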

The APC-I model has three advantages. First, it is identified in that it does not require additional constraints other than the usual coding scheme. That is, it avoids the identification problem of the APC accounting model based on the theoretical account of cohort effects and allows inclusion of other important predictors such as education, sex, and employment status. Second, the interpretation of the coefficient estimates of the APC-I model is meaningful and straightforward. This is because the APC-I model recognizes the dependence of age, period, and cohort so the dilemma that analysts face using traditional APC models does not apply. Third, the APC-I model permits investigating life-course dynamics as a cohort ages, whereas extant methods usually assume that cohort effects do not change.

It is important to note that the APC-I model is never intended to "solve" the identification problem in traditional APC accounting models because it is a false problem to begin with. Given the near monopoly of the accounting model, it may be challenging not to see the APC-I method through the lens of the traditional APC accounting framework. For example, because the APC-I model quantifies cohort effects as a structure of the age-by-period interactions, some readers may take it to mean that the APC-I model cannot estimate "linear cohort main effects". However, the APC-I method, by design, does not intend to estimate any kind of "linear cohort main effects" precisely because the traditional model’s assumption that there is a linear cohort effect that is additive or independent of age and period effects lacks theoretical grounding and is thus arbitrary and questionable. Please see Luo and Hodges (2020b) for a more thorough discussion about the theoretical motivation of the APC-I model.

Because the APC-I model is relatively new, below we make additional remarks about interaction effects and coding schemes to help readers better understand and use the model.

Interaction effects

In some cases, interaction terms may be difficult to interpret besides suggesting that the effect of one variable may depend on the values of the other. However, as explained by Luo and Hodges (2020b), the age-by-period interaction terms correspond to the conceptual definition of cohort effects and thus can be modeled and interpreted in a meaningful way. Specifically, cohort effects are expected when the influence of social events and changes differ by age groups. This conceptualization of cohort effects implies that the age-by-period interactions, which represent the differential effects of time periods depending on age, can be used to measure cohort effects.

Technically, because of the linear dependency among age, period, and cohort, the effects of the third variable can be expressed as the interaction between the other two variables. The APC-I model considers age and period as main effects and cohort as their interaction, which may give the impression that it privileges age and period effects and “discriminates” against cohort effects. The theoretical motivation for this choice is that it is often desirable to estimate a general age pattern that individuals follow as they get older. Period main effects are used to represent the kind of impacts of the social environment to which everyone in the society is exposed. The decision to quantify cohort effects as a specific form of age-by-period interaction is informed by the demographic literature on how cohort effects are conceptualized in relation to age and period effects. Empirically, as the analysis of women’s labor force participation in the Examples section will illustrate, the size of the cohort effects, characterized as the age-by-period interactions, is not necessarily smaller than some of the main effects and may in fact be larger.

Coding scheme and contrast

For the unidentified APC accounting model, some estimation methods including the intrinsic estimator yield effect estimates that depend on the choice of coding scheme in that estimates under different coding schemes are not equivalent (Grotenhuis et al. 2016; see Luo et al. 2016 for a more detailed discussion). The APC-I model does not have a rank deficiency problem in the sense that it does not require more constraints than a usual ANOVA model with main effects and their interactions. For any identified model including the APC-I model, the estimates are equivalent; that is, the estimated cell means are the same for all coding schemes.

Although this equivalence holds for both main effects and interaction estimates in the APC-I model, it is less obvious for interaction terms because the interpretation of the specific parameter estimates does change with coding schemes. To illustrate, consider an example of applying the APC-I model to health data with three age categories and three periods, shown in Table 1 below. Under dummy coding (for example, with the youngest age group 20-24 and the first survey period 2000 omitted as the referent), the interaction for ages 25-29 and period 2005 in cell Y represents the difference in a health outcome between periods 2000 and 2005 for ages 25-29 or, equivalently, the health difference between ages 25-29 and 20-24 for period 2005, in each case relative to the corresponding difference in the reference group. That is, interactions under dummy coding represent a directional difference from a particular reference group.

Table 1: Hypothetical data with three age categories and three periods illustrating a shift in meaning and interpretation of interaction terms under different coding schemes. The interaction terms under different types of coding in cell Y necessarily have different numerical values because of different reference groups. For example, the interaction term in cell Y under dummy coding represents a directional difference from a particular reference group (e.g., age 20-24 in year 2000). Under effect coding, the same interaction term in cell Y represents the deviation in the outcome from the age main effect plus period main effect for the group of individuals who were age 25-29 and surveyed in period 2005. Such different numeric values do not arise from an identification problem but rather from a shift in what these quantities represent.
Age group    2000    2005              2010
20-24
25-29                \(\mathrm{Y}\)
30-34

By contrast, under effect coding (i.e., sum-to-zero coding), the same interaction term in cell \(Y\) represents the deviation in the health outcome from the age main effect plus period main effect for the group of individuals who were age 25-29 and surveyed in period 2005.

The estimated interaction terms under these two types of coding in cell \(Y\) thus necessarily have different numerical values. However, this difference does not arise from an identification problem but rather from a shift in what these quantities represent. That is, the two interaction terms can be transformed to be equivalent so that the means in \(Y\) after considering age and period main effects are the same under the two coding schemes.
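The equivalence of the two coding schemes can be verified on a toy version of Table 1. The sketch below uses base R's lm with made-up cell means (one value per cell, so the model is saturated); the numbers and object names are illustrative assumptions, not results from the paper.

```r
# Hypothetical cell means for the 3 x 3 layout of Table 1
# (age varies fastest within each period).
cells <- expand.grid(age = c("20-24", "25-29", "30-34"),
                     period = c("2000", "2005", "2010"),
                     stringsAsFactors = TRUE)
cells$y <- c(10, 11, 12,   # period 2000
             12, 16, 14,   # period 2005
             14, 15, 19)   # period 2010

# Same saturated age-by-period model under two coding schemes.
fit_dummy  <- lm(y ~ age * period, data = cells,
                 contrasts = list(age = "contr.treatment",
                                  period = "contr.treatment"))
fit_effect <- lm(y ~ age * period, data = cells,
                 contrasts = list(age = "contr.sum", period = "contr.sum"))

# Interaction for cell Y (ages 25-29, period 2005) under each scheme.
y_dummy  <- coef(fit_dummy)[["age25-29:period2005"]]
y_effect <- coef(fit_effect)[["age2:period2"]]

# The coefficients differ, but the fitted cell means are identical.
same_cell_means <- isTRUE(all.equal(unname(fitted(fit_dummy)),
                                    unname(fitted(fit_effect))))
cat("dummy:", y_dummy, " effect:", y_effect,
    " same cell means:", same_cell_means, "\n")
```

With these made-up values the dummy-coded interaction is a difference of differences relative to the reference cell (16 - 11 - 12 + 10), while the effect-coded interaction is the cell's deviation from its row and column means.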

We recommend using effect or sum-to-zero coding for estimating the APC-I model for the following reasons. When characterizing cohort effects as a set of age-by-period interactions, we are less concerned about any direction of the interactions; that is, we are not particularly interested in the difference between two cohorts at a particular age or time period. Rather, we focus on particular structures of these interactions that may represent theoretically interesting patterns during a cohort’s life span. Effect coding is helpful because all terms at a given level share the same referent, namely the next lower level in the hierarchy of main effects and interactions. That is, we choose effect coding for ease of interpretation. This is also consistent with the recommendation of coding schemes in the presence of interactions (Aiken et al. 1991; Jaccard and Turrisi 2003).

3 The R package APCI

Package installation

The R package APCI can be installed and loaded using the following R code:

# install R package APCI
> install.packages("APCI")
# load R package APCI to the current working environment 
> library(APCI)

The main routines for implementing the APC-I model using package APCI are apci.plot.raw, apci, and apci.plot (or apci.plot.hexagram and apci.plot.heatmap). These functions and their input arguments are summarized below.

Functions in R package APCI

The R package APCI contains the following functions for estimating the APC-I model and visualizing the data and the model results:

The input arguments required by these functions are summarized one by one in the next section. Package APCI also contains three empirical datasets (women9017, cpsmen, and cpswomen) and one simulated dataset (simulation). Dataset women9017 was used and described in Luo and Hodges (2020b). Applications of the APC-I model to the other two empirical datasets are given in the Examples section.

Function apci

Function apci is the core function in the R package APCI. It fits an APC-I model with or without covariates and returns a list of results including coefficients and standard error estimates for age main effects, period main effects, inter-cohort average deviations, and intra-cohort life-course trends, and covariate coefficients if any. Both pooled cross-sectional data and multi-cohort longitudinal/panel data are supported. Specifically, function apci is used as

    apci(data, outcome, age, period, cohort, weight, covariate, family, 
        dev.test=TRUE, print, gee, id, corstr,...)

and takes the following arguments:

As mentioned in section Coding scheme and contrast, we use effect coding to estimate the APC-I model. The age and period arguments in function apci accept categorical variables. Unlike the common approach of dummy coding or simple coding, where an effect is defined as the difference of each group from the reference group, function apci uses effect coding (also known as sum-to-zero coding, deviation coding, or ANOVA coding) by default. Under this coding scheme, the effect of the omitted category equals the negative sum of the effects of all other categories. Computationally, the coefficient of the omitted category is redundant because of the coding scheme. However, for the purpose of quantifying cohort effects as deviations from the main effects of age and period, it is useful to compute estimates and standard errors for all age-by-period cross-classifications. For data with \(A\) age groups and \(P\) periods, therefore, function apci returns \(A\) age and \(P\) period main effect estimates and \(A \times P\) interaction estimates along with their standard error estimates. For the main effects of age or period, each estimate can be interpreted as the deviation of that age or period group from the global mean. The age-by-period interactions represent the deviation from the expectation determined by the corresponding age and period main effects.
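How the omitted category is recovered under effect coding can be seen in a one-factor toy example. This is a base R sketch with made-up, balanced data; it is not the internal code of function apci, and the names toy and age_effects are illustrative.

```r
# Hypothetical balanced data: four age groups, two observations each.
toy <- data.frame(
  age = factor(rep(1:4, each = 2)),
  y   = c(1, 3, 4, 6, 5, 7, 8, 10)
)

# Effect (sum-to-zero) coding estimates A - 1 = 3 age coefficients.
fit <- lm(y ~ age, data = toy, contrasts = list(age = "contr.sum"))
estimated <- coef(fit)[c("age1", "age2", "age3")]

# The omitted (last) category is the negative sum of the others.
age_effects <- c(estimated, age4 = -sum(estimated))

print(age_effects)       # one deviation from the global mean per age group
print(sum(age_effects))  # zero up to rounding error
```

Because the toy data are balanced, each effect equals the group mean minus the grand mean, which makes the interpretation as a deviation from the global mean easy to check by hand.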

Also note that when age and period groups have unequal interval widths in an age-period classification table, the age-by-period interactions contained in a cohort no longer lie on the same diagonals. Because cohort effects are conceptualized and estimated as a structure of the age-by-period interactions in the APC-I model, it is technically possible to use the argument unequal_interval in package APCI to extract interaction coefficient estimates that lie on different diagonals. However, unequal age and period group intervals may complicate the issue of cohort overlapping noted by Kupper et al. (1985). For this reason, we do not recommend using unequal interval widths for age and period groups if possible.

After fitting the APC-I model, function apci will store the following components as a list for further usage:

Function tests

As mentioned earlier, the first step of implementing the APC-I model is to conduct a global F test of the age-by-period interactions. This step is a routine in function apci, but the procedure does not stop even if there is no statistically significant deviation from the age main effect and period main effect. Therefore, we recommend separately conducting the global F test. In R package APCI, the function tests can be used for this purpose. It can be used as follows:

    tests(model, A, P, C, ...)

and takes the following arguments:

Function tests will return a standard F test result including the value of the F test statistic and the associated p-value.
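The logic of the global test can be sketched with a nested-model comparison in base R. This is not the implementation of function tests in package APCI, which works with the survey-weighted fit returned by apci; it assumes a simulated Gaussian outcome and ordinary least squares purely for illustration.

```r
set.seed(42)
A <- 4           # age groups (illustrative)
P <- 3           # period groups (illustrative)
n_per_cell <- 20

# Simulated data with no age-by-period interaction by construction.
d <- expand.grid(age = factor(seq_len(A)), period = factor(seq_len(P)),
                 rep = seq_len(n_per_cell))
d$y <- rnorm(nrow(d))

effect_coding <- list(age = "contr.sum", period = "contr.sum")
m_main <- lm(y ~ age + period, data = d, contrasts = effect_coding)
m_full <- lm(y ~ age * period, data = d, contrasts = effect_coding)

# Global F test of all (A - 1) * (P - 1) interaction terms at once.
global_test <- anova(m_main, m_full)
interaction_df <- global_test$Df[2]
f_statistic    <- global_test$F[2]
p_value        <- global_test[["Pr(>F)"]][2]

print(global_test)
```

A large p-value in such a comparison would suggest that a model with age and period main effects alone summarizes the data adequately.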

Functions for visualization

In package APCI, we provide four functions to facilitate visualizing the data and model results at different stages of a research project, namely apci.plot.raw, apci.plot.heatmap, apci.plot.hexagram, and apci.plot. They take similar input arguments. A summary of these arguments is given below.

Function apci.plot.raw is designed to plot the outcome variable aggregated by age or period groups. This function may be used at the data exploration stage. Functions apci.plot.heatmap and apci.plot.hexagram are designed to plot the age-by-period interactions from the APC-I model. Both functions generate heatmaps, where one axis represents age groups and the other period groups. The cells in a diagonal represent one cohort. The difference between the two functions is the layout of the heatmap; one is a rectangular graph and the other a hexagram. Function apci.plot can be used to visualize both raw data and model results. It divides the canvas into four (\(2 \times 2\)) panels. Three of the four panels can be used to visualize the three effect estimates in the APC-I model and the remaining panel to add notes. For data exploration, users can visualize the mean values of the outcome variable across age, period, and cohort groups on the same canvas. Users can use the same function to visualize the estimated age, period, and cohort effects from the modeling results.

The visualization functions in package APCI include:

    apci.plot.raw(data, outcome_var, age, period, ...)

    apci.plot.heatmap(model, age, period, color_map = NULL, color_scale = NULL, 
        quantile = NULL, ...)

    apci.plot.hexagram(model, age, period, first_age, first_period, interval, 
        color_scale = NULL, color_map = NULL, quantile = NULL, ...)

    apci.plot(model, age, period, outcome_var, type = "model", quantile = NULL, 
        ...)

and take the following arguments:

4 Examples: Application of the R package APCI to empirical data

We now illustrate how to use package APCI’s visualization and analytical functions. We describe and analyze two empirical datasets to demonstrate how this package may be used to analyze pooled cross-sectional data. We later briefly describe how to fit an APC-I model for multi-cohort longitudinal data.

Cross-sectional data of labor force participation in the United States

Temporal trends in men’s and women’s labor force participation (LFP) in the United States have garnered much scholarly attention (see e.g. Farkas 1977; Treas 1987; Wilkie 1991; Connelly 1992; Macunovich 2012; Hollister and Smith 2014). For example, whereas men’s LFP steadily declined in the past decades (Wilkie 1991), women’s LFP continued to rise until the 1990s and the 2000s. Female LFP has since reached a plateau and even begun to decline. Researchers have debated the causes of this leveling off or decline. Some studies attributed the observed trends to period-specific factors such as labor demand (Erceg and Levin 2014), the economic shocks of the Great Recession (Boushey 2005; Hoffman 2009), social welfare and disability insurance (Duggan and Imberman 2009), and gender role attitudes (Fortin 2015).

However, the temporal trends in LFP are unlikely to be due to a pure period process. For example, because individuals may begin to leave the labor force at age 50, a decline in LFP should be expected if the proportion of the population age 50 or older has increased. That is, the recent trends may reflect a change in the age composition of the US population (Aaronson et al. 2014). Cohort succession may also contribute to the trend, a process in which older cohorts with higher or lower LFP rates begin to die out and younger cohorts with lower or higher LFP enter the labor force (Lee 2014). At the same time, critical social and demographic changes in education level, fertility, and attitudes about women working outside the home may be more cohort-specific than period-specific processes because these forces only affect individuals of certain ages (Goldin 2006; Farré and Vella 2013; Fernández 2013; Balleer et al. 2014).

Given that the observed trends in LFP are likely a mix of age, period, and/or cohort patterns, an APC analysis that decomposes the temporal trends into age-, period-, and cohort-related variations is helpful for revealing the demographic, social, and economic changes that underlie the temporal trends in Americans’ labor force participation. In the following section, we demonstrate how to use the functions in APCI to undertake an APC analysis of men’s and women’s LFP using a cross-sectional dataset.

The Current Population Survey (CPS) is the primary source of labor force statistics in the United States (Flood et al. 2021). Since the 1960s, the CPS has collected data on key demographic, economic, and education topics. We subset the 1990-2019 CPS data by gender, resulting in two datasets, namely cpsmen and cpswomen, to show how to conduct an APC-I analysis of men’s and women’s LFP in the United States using package APCI.

Datasets cpsmen and cpswomen contain a subset of men and women age 20-64 who participated in the 1990 to 2019 CPS. The following code is used to load the data into the working environment:

> data(cpsmen)
> data(cpswomen)

The first five rows of the datasets of cpsmen and cpswomen are:

> head(cpsmen, n = 5)
  asecwt year age labforce educc
 2854.84    3   5        0     1
 1576.54    4   4        1     2
 2340.55    2   6        1     3
  158.44    3   5        0     0
  347.09    6   6        1     3

> head(cpswomen, n = 5)
  asecwt year age labforce educc
 2415.67    2   3        1     1
  663.89    2   5        1     3
 1653.01    6   4        1     2
 1613.31    6   4        0     2
  177.23    4   3        1     2

where labforce indicates the respondent’s labor force participation status (1=in the labor force, 0=not in the labor force). asecwt is the person-level weight that the CPS recommends for individual-level data analyses. year indicates the survey year in which the respondent was interviewed, grouped into 6 period groups (1=1990-94, 2=1995-99, …, 6=2015-19). age indicates the respondent’s age category (1=20-24, 2=25-29, …, 9=60-64). educc is a three-level categorical education measure (1=less than high school, 2=high school graduate, 3=college degree or above).
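The group codes can be translated back into calendar and age ranges with a few lines of base R. The sketch below re-enters the year and age codes from the five cpswomen rows printed above; the cohort numbering (year - age + 9) and the derived birth-year bounds are illustrative assumptions and need not match the cohort index that function apci stores.

```r
# year and age codes of the first five cpswomen rows shown above.
head_women <- data.frame(year = c(2, 2, 6, 6, 4), age = c(3, 5, 4, 4, 3))

# Lower bounds of the six 5-year periods and nine 5-year age groups.
period_start <- seq(1990, 2015, by = 5)
age_start    <- seq(20, 60, by = 5)

period_labels <- sprintf("%d-%02d", period_start, (period_start + 4) %% 100)
age_labels    <- sprintf("%d-%d", age_start, age_start + 4)

head_women$period_label <- period_labels[head_women$year]
head_women$age_label    <- age_labels[head_women$age]

# Illustrative cohort number (1 = oldest) and the birth years it can span.
head_women$cohort     <- head_women$year - head_women$age + length(age_labels)
head_women$born_from  <- period_start[head_women$year] -
                         (age_start[head_women$age] + 4)
head_women$born_until <- period_start[head_women$year] + 4 -
                         age_start[head_women$age]

print(head_women)
```

Each cell spans roughly nine birth years because both age and period are grouped into five-year intervals, which is the cohort overlap discussed in connection with Kupper et al. (1985).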

For data exploration, function apci.plot.raw visualizes the outcome variable as follows:

> apci.plot.raw(data, outcome_var, age, period)
Figure 1: Period-specific age trajectories and age-specific period trends in labor force participation rates for men and women in the United States, the Current Population Survey 1990-2019. Age trajectories are similar across time periods for both men and women (top panel). Period trends differ by age groups, especially among women (bottom panel).

Figure 1 shows LFP rates by age groups (top panel) and period groups (bottom panel), respectively, for male (left panel) and female CPS respondents (right panel) age 20 to 64 from 1990 to 2019. Figure 1’s top panel suggests similar age patterns in LFP across time periods. The bottom panel shows distinct period trends depending on age groups. For example, the LFP rates among women in the 55-59 and 60-64 age groups seem to have gone up whereas other age groups show a relatively flat trend. Such distinct period patterns in LFP by age groups suggest potential cohort variations in women’s labor force participation. For men’s LFP, however, the visualization results suggest that a simpler model with age and period main effects may suffice for summarizing their LFP patterns.
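The age-by-period rates that such a figure displays can also be tabulated directly. The following base R sketch computes weighted participation rates on a tiny made-up data frame with the same column names as cpswomen; whether apci.plot.raw applies survey weights in the same way is not stated here, so the weighting is an assumption.

```r
# Hypothetical records with the same columns as cpsmen / cpswomen.
toy <- data.frame(
  asecwt   = c(2, 1, 1, 3, 2, 2),
  year     = c(1, 1, 2, 2, 1, 2),
  age      = c(1, 1, 1, 1, 2, 2),
  labforce = c(1, 0, 1, 1, 0, 1)
)

groups <- list(age = toy$age, period = toy$year)

# Weighted LFP rate per cell: sum of weights in the labor force
# divided by the sum of all weights in that cell.
in_force <- tapply(toy$asecwt * toy$labforce, groups, sum)
total    <- tapply(toy$asecwt, groups, sum)
lfp_rate <- in_force / total

print(round(lfp_rate, 3))  # rows: age groups, columns: period groups
```

Reading across a row of the resulting table gives an age-specific period trend, and reading down a column gives a period-specific age trajectory.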

Function apci can be used to fit an APC-I model for pooled cross-sectional data or multi-cohort longitudinal/panel data. In the simplest form of an APC-I model without covariates for pooled cross-sectional data, function apci is called as follows:

> no_cov <- APCI::apci(outcome = "labforce", 
+                      age = "age", 
+                      period = "year", 
+                      weight = "asecwt",
+                      data = cpswomen, 
+                      dev.test = FALSE, 
+                      family = "binomial")

It is often desirable to add covariates to the model, which can be done by specifying the covariate argument. For example, to add education level (“educc”) as a covariate, function apci can be called as follows:

> with_cov <- APCI::apci(outcome = "labforce",
+                        age = "age",
+                        period = "year",
+                        covariate = c("educc"),
+                        weight = "asecwt",
+                        data = cpswomen, 
+                        print = FALSE,
+                        dev.test = FALSE, 
+                        family = "binomial")

Below is a summary of the results from an APC-I model that includes education levels as a covariate:

> summary(with_cov)
               Length Class      Mode     
model          33     svyglm     list     
dev_global      0     -none-     NULL     
intercept       4     -none-     character
age_effect     45     -none-     character
period_effect  30     -none-     character
cohort_average  6     data.frame list     
cohort_slope    6     data.frame list     
int_matrix      5     data.frame list     
cohort_index   54     -none-     numeric  
data            7     data.frame list   

The returned value is a list of objects. model contains the results from a logistic regression model with age and period main effects and the unstructured interactions. dev_global displays the global F test result (it is NULL in the summary above because dev.test = FALSE was specified). A significant F statistic suggests that cohort effects may exist. intercept is the overall intercept (\(\mu\) in Equation (2)). age_effect gives the estimated age main effects. period_effect gives the estimated period main effects. cohort_average gives inter-cohort average deviations from age and period main effects. cohort_slope gives intra-cohort life-course linear slopes, which can be used for testing intra-cohort life-course dynamics. int_matrix contains the estimated coefficients for the age-by-period interactions. Note that int_matrix has A*P interaction estimates: (A-1)*(P-1) interaction parameters vary freely, and effect coding is used to compute the remaining A+P-1 estimates from them. These interaction estimates are used to generate heatmaps similar to Figure 2. data stores the data passed to function apci. Users may extract an element of the list to obtain detailed results. For example, with_cov$cohort_average and with_cov$cohort_slope return the estimated inter-cohort average deviations and intra-cohort life-course slopes.
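To illustrate the counting of interaction terms under effect coding, the sketch below fits an ordinary logistic regression with sum-to-zero contrasts to simulated data using base R's glm, and then fills in the full A-by-P interaction matrix from the freely estimated terms. This is an assumption-laden illustration of the general idea only: the toy design (A = 3, P = 4), the simulated outcome, and all object names are hypothetical, and the code is not taken from package APCI.

```r
set.seed(1)
A <- 3; P <- 4                                    # small hypothetical design
toy <- expand.grid(age = factor(1:A), period = factor(1:P), rep = 1:200)
toy$y <- rbinom(nrow(toy), size = 1, prob = 0.6)  # simulated binary outcome

# Logistic regression with effect (sum-to-zero) coding for both factors
fit <- glm(y ~ age * period, family = binomial, data = toy,
           contrasts = list(age = "contr.sum", period = "contr.sum"))

# The (A-1)*(P-1) freely estimated interaction terms (age varies fastest)
b    <- coef(fit)
free <- matrix(b[grepl(":", names(b))], A - 1, P - 1)

# Fill in the remaining A+P-1 cells from the sum-to-zero constraints
full <- matrix(NA_real_, A, P)
full[1:(A - 1), 1:(P - 1)] <- free
full[1:(A - 1), P] <- -rowSums(free)                    # last column
full[A, ] <- -colSums(full[1:(A - 1), , drop = FALSE])  # last row

n_free    <- length(free)      # (A-1)*(P-1)
n_derived <- A * P - n_free    # A+P-1
round(full, 3)
```

Every row and every column of full sums to zero, which is the property that allows the remaining cells to be derived rather than estimated separately.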

The output below shows education-adjusted inter-cohort average deviations in women’s LFP from analyzing the cpswomen data using function apci.

> with_cov$cohort_average
   c_avg_group c_avg_est c_avg_se c_avg_t c_avg_p c_avg_sig
1            1    -0.329    0.193  -1.709   0.088          
2            2    -0.155    0.142  -1.091   0.275          
3            3    -0.162    0.114  -1.422   0.155          
4            4     0.047    0.097   0.481   0.631          
5            5     0.096    0.085   1.139   0.255          
6            6     0.174    0.076   2.288   0.022       *  
7            7     0.034    0.074   0.457   0.648          
8            8    -0.036    0.074  -0.493   0.622          
9            9     0.003    0.073   0.047   0.963          
10          10    -0.072    0.081  -0.894   0.371          
11          11     0.030    0.085   0.353   0.724          
12          12    -0.029    0.102  -0.288   0.774          
13          13    -0.080    0.131  -0.609   0.543          
14          14    -0.103    0.170  -0.608   0.543          

where c_avg_group indicates cohort membership (e.g., cohort 1=the 1930 birth cohort, cohort 2=the 1935 birth cohort, ..., cohort 14=the 1995 birth cohort), c_avg_est is the inter-cohort average deviation, c_avg_se is the standard error estimate for the average deviation, c_avg_t is the t test statistic for the average deviation, and c_avg_p and c_avg_sig are the p value and significance level (*: p < .05, **: p < .01, and ***: p < .001), respectively.
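The cohort numbering can be related to the age and period codes with a short calculation. The sketch below assumes that a cell's cohort index equals A - age + period, an indexing inferred from the labels given here (cohort 1 is the oldest, cohort 14 the youngest) rather than taken from the package source, and that cohort k is labeled by the birth year 1930 + 5(k - 1).

```r
A <- 9; P <- 6                                   # age and period groups in the CPS example
cells <- expand.grid(age = 1:A, period = 1:P)    # age varies fastest

# Assumed indexing: oldest cohort (age 9, period 1) = 1, youngest (age 1, period 6) = 14
cells$cohort     <- A - cells$age + cells$period
cells$birth_year <- 1930 + 5 * (cells$cohort - 1)

# Number of age-by-period cells observed for each cohort
n_cells <- table(cells$cohort)
n_cells
```

Under this indexing the first and last cohorts are each observed in a single cell, while cohorts in the middle of the range are observed in up to six cells.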

The results from with_cov$cohort_average imply that, on average, the LFP rates of cohort 6 (the 1955 birth cohort) differ significantly from the rates expected based on the age and period main effects. Specifically, because the model is a logistic regression, the estimate corresponds to about 19% higher odds of labor force participation (exp(.174)-1 = .19, p < .05) for the 1955 cohort than expected based on the age and period main effects.
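The arithmetic behind this interpretation can be checked directly. The sketch below assumes that the estimate is on the logit scale, so that exponentiating it gives a ratio of odds, and it uses a large-sample normal approximation for the two-sided p value; the package may use a different reference distribution.

```r
est    <- 0.174    # c_avg_est for cohort 6, from the output above
t_stat <- 2.288    # c_avg_t for cohort 6, from the output above

odds_ratio <- exp(est)                  # multiplicative change in the odds
pct_change <- 100 * (odds_ratio - 1)    # percentage change in the odds

# Two-sided p value under a normal approximation (an assumption)
p_approx <- 2 * pnorm(-abs(t_stat))

round(c(odds_ratio = odds_ratio, pct_change = pct_change, p_approx = p_approx), 3)
```

The approximate p value agrees with the reported c_avg_p of 0.022 to three decimal places.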

The output below shows education-adjusted intra-cohort life-course dynamics in women’s LFP from analyzing the cpswomen data using function apci.

> with_cov$cohort_slope
   c_slp_group c_slp_est c_slp_se c_slp_t c_slp_p c_slp_sig
1            1        NA       NA      NA      NA      <NA>
2            2     0.165    0.195   0.849   0.396          
3            3    -0.215    0.187  -1.148   0.251          
4            4     0.163    0.189   0.866   0.386          
5            5     0.093    0.184   0.508   0.611          
6            6     0.007    0.169   0.039   0.969          
7            7     0.047    0.172   0.277   0.782          
8            8    -0.096    0.181  -0.530   0.596          
9            9    -0.187    0.173  -1.076   0.282          
10          10    -0.106    0.176  -0.602   0.547          
11          11    -0.279    0.159  -1.750   0.080          
12          12     0.353    0.160   2.207   0.027       *  
13          13    -0.047    0.180  -0.262   0.793          
14          14        NA       NA      NA      NA      <NA>

where c_slp_group indicates cohort membership (e.g., cohort 1=the 1930 birth cohort, cohort 2=the 1935 birth cohort, ..., cohort 14=the 1995 birth cohort), c_slp_est is the intra-cohort life-course slope, c_slp_se is the standard error estimate for the life-course slope, c_slp_t is the t test statistic for the life-course slope, and c_slp_p and c_slp_sig are the p value and significance level (*: p < .05, **: p < .01, and ***: p < .001), respectively. NAs are generated for the oldest and youngest cohorts because only one age-by-period combination is observed for each of these two cohorts, and thus intra-cohort life-course dynamics cannot be assessed.

For example, for cohort 12 (the 1985 birth cohort), the estimated intra-cohort slope is 0.353 (p < .05), meaning that this cohort’s LFP was lower than expected when its members were young but higher than expected at older ages. Interestingly, this cohort’s average deviation is not statistically significant. Such a nonsignificant inter-cohort average deviation combined with a significantly positive intra-cohort slope indicates a compensation life-course pattern; that is, this cohort’s lower-than-expected LFP at younger ages seems to be compensated by its higher-than-expected LFP at older ages.

The intra-cohort life-course dynamics are based on the age-by-period interactions as follows:

# the first six rows of the age-by-period interaction estimates
> with_cov$int_matrix 
  iaesti  iase   iap iasig cohortindex
1  0.166 0.169 0.327                 9
2 -0.048 0.207 0.818                 8
3  0.068 0.164 0.678                 7
4  0.095 0.172 0.581                 6
5  0.205 0.191 0.283                 5
6  0.227 0.193 0.239                 4
# [there are 48 rows compressed]

where “iaesti” is the age-by-period interaction estimate, “iase” is the standard error estimate for the interaction term, “iap” and “iasig” are the p value and significance level (*: p < .05, **: p < .01, and ***: p < .001), respectively, and “cohortindex” indicates cohort membership.

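The following code can be used to organize the age-by-period interaction estimates, on which the intra-cohort life-course estimates are based, in matrix form:

> matrix(with_cov$int_matrix$iaesti, A, P)[A:1, ] 
# A is the number of age groups and P is the number of period groups.
# int_matrix is a data frame, so its estimate column (iaesti) is reshaped.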

       period #1 period #2 period #3 period #4 period #5 period #6
age #9 -0.329    -0.038    -0.416*    0.334     0.267     0.182   
age #8 -0.272     0.043     0.017     0.125     0.001     0.086   
age #7 -0.112    -0.392*   -0.070     0.257     0.278     0.040   
age #6  0.227    -0.046     0.443*   -0.333*   -0.258    -0.033   
age #5  0.205     0.066     0.061    -0.150    -0.074    -0.107   
age #4  0.095     0.044     0.139     0.065    -0.109    -0.234   
age #3  0.068     0.059    -0.359*   -0.147     0.096     0.283   
age #2 -0.048     0.256    -0.006     0.067    -0.156    -0.113   
age #1  0.166     0.009     0.191    -0.216    -0.046    -0.103 
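The printed matrix can be used to see how the cohort summaries relate to the interaction estimates. The sketch below re-enters the rounded estimates by hand, assigns each cell to a cohort with the assumed index A - age + period, and then computes (a) the simple mean of each cohort's cells and (b) a linear orthogonal-polynomial contrast over each cohort's cells ordered by age. These two formulas are a reading of the printed output rather than the package's source code; with the rounded inputs they appear to reproduce the reported c_avg_est and c_slp_est values to within rounding error.

```r
# Interaction estimates copied from the matrix above (rows = age 1..9, columns = period 1..6)
est <- matrix(c(
   0.166,  0.009,  0.191, -0.216, -0.046, -0.103,
  -0.048,  0.256, -0.006,  0.067, -0.156, -0.113,
   0.068,  0.059, -0.359, -0.147,  0.096,  0.283,
   0.095,  0.044,  0.139,  0.065, -0.109, -0.234,
   0.205,  0.066,  0.061, -0.150, -0.074, -0.107,
   0.227, -0.046,  0.443, -0.333, -0.258, -0.033,
  -0.112, -0.392, -0.070,  0.257,  0.278,  0.040,
  -0.272,  0.043,  0.017,  0.125,  0.001,  0.086,
  -0.329, -0.038, -0.416,  0.334,  0.267,  0.182
), nrow = 9, ncol = 6, byrow = TRUE)

A <- nrow(est); P <- ncol(est)
cohort <- A - row(est) + col(est)     # assumed indexing: 1 = oldest, 14 = youngest

# Cells of each cohort, ordered from youngest to oldest age
by_cohort <- split(as.vector(est), as.vector(cohort))

# (a) average deviation: simple mean along the cohort diagonal
cohort_avg <- sapply(by_cohort, mean)

# (b) life-course slope: linear orthogonal-polynomial contrast over the cells
cohort_slp <- sapply(by_cohort, function(x) {
  if (length(x) < 2) return(NA_real_)  # a single cell carries no slope information
  sum(contr.poly(length(x))[, 1] * x)
})

round(cbind(cohort_avg, cohort_slp), 3)
```

For cohort 12, the three cells are -0.216, -0.156, and 0.283 from the youngest to the oldest age, which makes the below-expectation start and above-expectation finish behind its positive slope directly visible.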

Based on the R package ggplot2 (Wickham 2016), heatmaps can be generated to visualize inter- and intra-cohort patterns and motivate a subsequent formal APC analysis. For example, for the model with_cov fitted above, both inter-cohort average deviations and intra-cohort life-course dynamics may be visualized in a heatmap as follows:

> apci.plot.heatmap(model = with_cov, age = "age", period = "year",
+                   color_map = c("blue", "yellow"))