Social scientists have frequently attempted to assess the relative contribution of age, period, and cohort variables to the overall trend in an outcome. We develop an R package *APCI* (and Stata command `apci`) to implement the age-period-cohort-interaction (APC-I) model for estimating and testing age, period, and cohort patterns in various types of outcomes for pooled cross-sectional data and multi-cohort panel data. Package *APCI* also provides a set of functions for visualizing the data and modeling results. We demonstrate the usage of package *APCI* with empirical data from the Current Population Survey. We show that package *APCI* provides useful visualization and analytical tools for understanding age, period, and cohort trends in various types of outcomes.

Researchers across disciplines have long been interested in distinguishing the relative contribution of three time-related variables — namely, age (i.e., how old a person is at the time of data collection), time periods (e.g., the Great Recession 2007-2009 and the COVID-19 pandemic beginning in December 2019), and cohort membership (e.g., the baby boom cohort born in 1945-1964 and the Millennials born in 1981-1996) — to the overall trends in various outcomes (e.g., labor force participation, attitudes, and cognitive functioning) (Clogg 1982; Alwin and McCammon 2003; Pescosolido et al. 2021). Decomposing the overall trends into age, period, and cohort variations provides insight into the ways in which biological and social factors affect these outcomes (Hobcraft et al. 1982; Heckman and Robb 1985; Fosse and Winship 2019).

To quantify the relative contribution of age, period, and cohort, Luo and Hodges (2020b) have recently developed a model called the age-period-cohort-interaction (APC-I) model. The APC-I is qualitatively different from other age-period-cohort (APC) models in that it characterizes cohort effects as a structure of the age-by-period interaction terms to acknowledge the interdependence of age, period, and cohort effects, whereas prior methods attempt to recover the independent and additive effects of the three variables. The APC-I model has been used to understand the unique contribution of cohort membership in various outcomes including crime involvement, substance use, and cultural taste (Verdery et al. 2020; Lu and Luo 2020; Ma 2020). However, the authors of the APC-I model focused on the conceptual motivation of the method and offered relatively few technical details for implementing the method. Estimating and testing cohort effects in the APC-I model may therefore be challenging for interested readers.

We developed an R package *APCI* (Xu and Luo 2021) and a Stata
command `apci` for implementing the APC-I model in empirical research
using pooled cross-sectional data (e.g., the General Social Survey and
the Current Population Survey) and, importantly, extended the APC-I method
to multi-cohort longitudinal or panel data (e.g., data from
the Health and Retirement Study (HRS) and the National Longitudinal
Study of Youth (NLSY)). The purpose of this
paper is threefold. First, we describe the R functions in the
*APCI* package and Stata
command to estimate and test age, period, and cohort effects in the
APC-I model. The core function can be used for analyzing pooled
cross-sectional data and multi-cohort longitudinal data. Second, we
introduce a set of visualization tools to help researchers motivate an
APC analysis and interpret age, period, and cohort effects from the
APC-I model. Third, we clarify several important issues about
characterizing cohort effects as a set of age-by-period interaction
terms. We pay particular attention to the implications of coding schemes
and how to interpret the between-cohort average deviations and
within-cohort life-course variations.

This paper is organized as follows. Following a description of
traditional APC models and the identification problem, we introduce the
APC-I model and the estimation and testing procedures. We explain how
and why the age-by-period interaction terms can be used to characterize
cohort effects with particular attention to the implications of coding
schemes for estimating and testing interactions. Next, we describe the
visualization tools and functions in the R package
*APCI*. We then demonstrate
how to use the package with the empirical example of men’s and women’s
labor force participation from 1990 to 2018 in the United States using
data from the Current Population Survey
(CPS, Flood et al. 2021).

To formally estimate and infer the independent age, period, and cohort effects, Mason et al. (1973) specified an analysis of variance (ANOVA) model that they labeled the age-period-cohort (APC) accounting model:

\[\label{eq:1} g\left(E\left(Y_{i j}\right)\right)=\mu+\alpha_{i}+\beta_{j}+\gamma_{k} \tag{1}\] for age groups \(i=1, 2,\ldots, A\), periods \(j=1, 2,\ldots, P\), and cohorts \(k=1, 2,\ldots, (A+P-1)\), where \(\sum_{i=1}^{A} \alpha_{i}=\sum_{j=1}^{P}\beta_{j}=\sum_{k=1}^{A+P-1} \gamma_{k}=0\). \(E\left(Y_{i j}\right)\) denotes the expected value of the outcome \(Y\) for the \(i\)th age group in the \(j\)th time period; \(g\) is the “link function”; \(\alpha_i\) denotes the mean difference from the global mean \(\mu\) associated with the \(i\)th age category; \(\beta_j\) denotes the mean difference from \(\mu\) associated with the \(j\)th period; \(\gamma_k\) denotes the mean difference from \(\mu\) associated with membership in the \(k\)th cohort.

Unfortunately, the APC accounting model (1) is not identified even when a coding scheme (e.g., dummy coding where one group is set as the reference group or effect coding where the sum of the coefficients for each effect is set to 0) is applied. This is because age, period, and cohort are exactly linearly related, since birth cohort equals period minus age (see Fienberg and Mason 1979; Luo et al. 2016; Fosse and Winship 2019 for detailed discussions). As a result, the design matrix of model (1) has rank one less than full, so an infinite number of solutions (i.e., estimates) for the parameters fit the data equally well. That is, the data cannot distinguish different estimation results, so an additional constraint, beyond the usual reference group or sum-to-zero constraint, must be imposed in order to choose one set of estimates. Moreover, interpreting the results is difficult because the standard interpretation of regression coefficients (that is, the conditional effect of one variable after accounting for other covariates) cannot apply due to the lack of variation in the third variable (e.g., cohort) after considering the other two (e.g., age and period).
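The rank deficiency can be checked numerically. The following is a minimal base R sketch, not part of package *APCI*, that builds the effect-coded design matrix of the accounting model for a hypothetical \(3 \times 3\) age-by-period table. The cohort index convention \(k = j - i + A\) and all object names are assumptions made for illustration.

```r
# Hypothetical 3 x 3 age-by-period table, one row per cell
A <- 3
P <- 3
cells <- expand.grid(age = seq_len(A), period = seq_len(P))
# Assumed index convention: cohort k = j - i + A, so k runs from 1 to A + P - 1
cells$cohort <- cells$period - cells$age + A
cells[] <- lapply(cells, factor)

# Effect-coded design matrix of the accounting model
X <- model.matrix(
  ~ age + period + cohort, data = cells,
  contrasts.arg = list(age = "contr.sum", period = "contr.sum",
                       cohort = "contr.sum")
)

n_columns <- ncol(X)   # 1 + (A - 1) + (P - 1) + (A + P - 2)
rank_X <- qr(X)$rank   # numerical rank of the design matrix
c(columns = n_columns, rank = rank_X)
```

The rank falls one short of the number of columns even though the usual sum-to-zero constraints are already in place, which is the identification problem described above.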

The theoretical root of the identification problem in traditional APC models is the problematic assumption that age, period, and cohort effects operate independently of each other. It implies that the identification challenge is inherent in any APC model that attempts to separate independent and additive effects of age, period, and cohort and thus cannot be solved by changing the model setup [e.g., using random effects for period and cohort as in (Yang and Land 2006); see Luo and Hodges (2020a) for a critique] or by variable manipulation [e.g., using unequal interval widths for age, period, and cohort groups as in (Robertson and Boyle 1986; Sarma et al. 2012); see Luo et al. (2016) for a detailed discussion]. The identification problem is well recognized, and its consequences have been discussed extensively (Kupper et al. 1983, 1985; Fienberg and Mason 1985; Luo 2013; Grotenhuis et al. 2016; Luo et al. 2016; Fosse and Winship 2019; O’Brien 2020; Luo and Hodges 2020a; Morgan and Lee 2021). In essence, internal information derived from the data cannot help because the problem is circular: researchers do the analysis to learn precisely the kind of information needed to justify any such constraint.

Luo and Hodges (2020b) proposed a new APC model called the age-period-cohort-interaction (APC-I) model. The APC-I model is qualitatively different from all estimators developed under the traditional framework in that it explicitly specifies cohort effects as a structure of the age-by-period interactions. A life-course dynamics hypothesis concerning whether and how cohort effects may change as cohorts age thus corresponds to a specific structure of the age-by-period interactions. This specification is motivated by the theoretical account that “The minimal basis for expecting interdependence between inter-cohort differentiation and social change is that change has variant import for persons of unlike age” (Ryder 1965). That is, a basic notion on which cohort analysis rests is that “transformations of the social world modify people of different ages in different ways” (Ryder 1965).

The APC-I model is fully identified in the sense that it does not require additional constraints other than a regular coding scheme. It is also flexible enough to test various hypotheses about life-course changes within cohorts as cohort members age. We first describe the model specification and estimation and testing techniques. The next section demonstrates the procedure using empirical examples.

The general form of the APC-I model can be written as: \[\label{eq:2} g\left(E\left(Y_{i j}\right)\right)=\mu+\alpha_{i}+\beta_{j}+\alpha \beta_{i j(k)} \tag{2}\] where \(g\), \(Y_{ij}\), \(\mu\), \(\alpha_{i}\) and \(\beta_{j}\) are defined as in model (1) and \(\alpha \beta_{i j(k)}\) denotes the interaction of the \(i\)th age group and \(j\)th period group, corresponding to the effect of the \(k\)th cohort. Note that except for the oldest and youngest cohorts, the effect of one cohort includes multiple age-by-period interaction terms \(\alpha \beta_{i j(k)}\) that lie on the same diagonal in a table with ages in rows and periods in columns.
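To make the diagonal structure concrete, the short base R sketch below labels each cell of a hypothetical age-by-period table with a cohort index. The convention \(k = j - i + A\) is an assumption for illustration (it assigns \(k = 1\) to the oldest age group in the first period); package *APCI* may index cohorts differently.

```r
A <- 3  # number of age groups (rows)
P <- 4  # number of periods (columns)

# Assumed convention: cell (i, j) belongs to cohort k = j - i + A
cohort_index <- outer(seq_len(A), seq_len(P), function(i, j) j - i + A)
dimnames(cohort_index) <- list(age = paste0("age", seq_len(A)),
                               period = paste0("period", seq_len(P)))
cohort_index

# Number of age-by-period interaction terms contained in each cohort
cells_per_cohort <- table(cohort_index)
cells_per_cohort
```

Cells sharing an index lie on one diagonal, and only the two extreme cohorts are observed in a single cell.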

Model (2) differs from model (1) in the way that cohort effects are modeled. In model (2), cohort effects are considered as a specific form of the age-by-period interaction. In statistics, the interaction between two variables describes the differential effects of one variable depending on the level of the other (Scheffé 1999). In APC research, this means that if part of the overall pattern of interest can be attributed to cohort differences, significant age-by-period interactions should be present. When cohort membership is not associated with the outcome (that is, when the effects of historical or social shifts, or period effects, are uniform across age groups), age-by-period interactions should not be present.

Luo and Hodges (2020b) described a procedure for
investigating age and period main effects and inter-cohort deviations
and intra-cohort dynamics. They recommended beginning with a deviance
test about whether the effects of time periods vary among age groups,
which is called “a global F test”. A non-significant global F statistic
indicates that there are few age-by-period interaction effects and thus
little cohort variation. If the model suggests significant age-by-period
interaction effects, one may proceed to examine inter- and intra-cohort
differences^{1}. Inter-cohort average deviations are quantified based on
the arithmetic mean of the age-by-period interaction terms contained in
each cohort and a t test can be used to examine the average of that
cohort-specific deviation. To investigate intra-cohort dynamics over the
life course (e.g., the cumulative (dis)advantage hypothesis in Dannefer 1987; Chauvel et al. 2016; Ferraro and Morton 2018; O’Brien 2020),
one may use a t-test of the linear orthogonal polynomial contrast in
each cohort’s age-by-period interaction effects. This intra-cohort
life-course dynamics test is helpful for investigating whether the
average (dis)advantages of the members of that cohort accumulate, remain
stable, or diminish as they age.
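The two cohort summaries can be sketched in base R as follows. This is only one plausible reading of the procedure, not the code used inside package *APCI*: it starts from a hypothetical matrix of effect-coded interaction estimates (simulated here), uses the assumed cohort index \(k = j - i + A\), and computes point estimates only. The t tests described above additionally require the covariance matrix of the interaction estimates.

```r
set.seed(1)
A <- 4
P <- 4

# Hypothetical interaction estimates: double-centre random numbers so that
# rows and columns sum to zero, as effect-coded interactions do
raw <- matrix(rnorm(A * P), A, P)
ab <- sweep(sweep(raw, 1, rowMeans(raw)), 2, colMeans(raw)) + mean(raw)

# Assumed cohort index for cell (i, j): k = j - i + A
k <- as.vector(col(ab) - row(ab) + A)

# Inter-cohort average deviation: mean of the interactions on each diagonal
cohort_average <- tapply(as.vector(ab), k, mean)

# Intra-cohort dynamics: linear orthogonal polynomial contrast over the
# cohort's cells, ordered from youngest to oldest age
linear_contrast <- function(v) {
  n <- length(v)
  if (n < 2) return(NA_real_)  # a single cell carries no within-cohort trend
  sum(contr.poly(n)[, 1] * v)
}
cohort_slope <- tapply(as.vector(ab), k, linear_contrast)

round(cbind(average = cohort_average, slope = cohort_slope), 3)
```

A positive slope would suggest that a cohort's deviation grows as its members age, a negative slope that it shrinks.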

The APC-I model has three advantages. First, it is identified in that it does not require additional constraints other than the usual coding scheme. That is, it avoids the identification problem of the APC accounting model based on the theoretical account of cohort effects and allows inclusion of other important predictors such as education, sex, and employment status. Second, the interpretation of the coefficient estimates of the APC-I model is meaningful and straightforward. This is because the APC-I model recognizes the dependence of age, period, and cohort so the dilemma that analysts face using traditional APC models does not apply. Third, the APC-I model permits investigating life-course dynamics as a cohort ages, whereas extant methods usually assume that cohort effects do not change.

It is important to note that the APC-I model is never intended to "solve" the identification problem in traditional APC accounting models because it is a false problem to begin with. Given the near monopoly of the accounting model, it may be challenging not to see the APC-I method through the lens of the traditional APC accounting framework. For example, because the APC-I model quantifies cohort effects as a structure of the age-by-period interactions, some readers may take it to mean that the APC-I model cannot estimate "linear cohort main effects". However, the APC-I method, by design, does not intend to estimate any kind of "linear cohort main effects" precisely because the traditional model’s assumption that there is a linear cohort effect that is additive or independent of age and period effects lacks theoretical grounding and is thus arbitrary and questionable. Please see Luo and Hodges (2020b) for a more thorough discussion about the theoretical motivation of the APC-I model.

Because the APC-I model is relatively new, below we make additional remarks about interaction effects and coding schemes to help readers better understand and use the model.

In some cases, interaction terms may be difficult to interpret beyond suggesting that the effect of one variable may depend on the values of the other. However, as explained by Luo and Hodges (2020b), the age-by-period interaction terms correspond to the conceptual definition of cohort effects and thus can be modeled and interpreted in a meaningful way. Specifically, cohort effects are expected when the influence of social events and changes differs by age group. This conceptualization of cohort effects implies that the age-by-period interactions, which represent the differential effects of time periods depending on age, can be used to measure cohort effects.

Technically, because of the linear dependency among age, period, and cohort, the effects of the third variable can be expressed as the interaction between the other two variables. The APC-I model considers age and period as main effects and cohort their interactions, which may give the impression that it privileges age and period effects and “discriminates” against cohort effects. The theoretical motivation for this choice is that it is often desirable to estimate a general age pattern that individuals follow as they get older. Period main effects are used to represent the kind of impacts of social environment that everyone in the society is exposed to. The decision to quantify cohort effects as a specific form of age-by-period interaction is informed by the demographic literature on how cohort effects are conceptualized in relation to age and period effects. Empirically, as the analysis of women’s labor force participation in section Examples will illustrate, the size of the cohort effects, characterized as the age-by-period interactions, is not necessarily smaller—in fact may be larger—than some of the main effects.

For the unidentified APC accounting model, some estimation methods including the intrinsic estimator yield effect estimates that depend on the choice of coding scheme, in that estimates under different coding schemes are not equivalent (see Grotenhuis et al. 2016; Luo et al. 2016 for a more detailed discussion). The APC-I model does not have a rank deficiency problem in the sense that it does not require more constraints than a usual ANOVA model with main effects and their interactions. For any identified model including the APC-I model, the estimates are equivalent; that is, the estimated cell means are the same for all coding schemes.

Although this equivalence holds for both main effects and interaction estimates in the APC-I model, it is less obvious for interaction terms because the interpretation of the specific parameter estimates does change with coding schemes. To illustrate, consider an example of applying the APC-I model to health data with three age categories and three periods, shown in Table 1 below. Under dummy coding (for example, with the youngest age group 20-24 and the first survey period 2000 set to zero or omitted as the referent), the interaction for ages 25-29 and period 2005 in cell Y represents the difference in a health outcome between periods 2000 and 2005 for ages 25-29 or, equivalently, the health difference between ages 25-29 and 20-24 for the period 2005, in each case relative to the corresponding difference in the reference group. That is, interactions under dummy coding represent a directional difference from a particular reference group.

|       | 2000 | 2005 | 2010 |
|-------|------|------|------|
| 20-24 |      |      |      |
| 25-29 |      | \(\mathrm{Y}\) |      |
| 30-34 |      |      |      |

By contrast, under effect coding (i.e., sum-to-zero coding), the same interaction term in cell \(Y\) represents the deviation in the health outcome from the age main effect plus period main effect for the group of individuals who were age 25-29 and surveyed in period 2005.

The estimated interaction terms under these two types of coding in cell \(Y\) thus necessarily have different numerical values. However, this difference does not arise from an identification problem but rather from a shift in what these quantities represent. That is, the two interaction terms can be transformed to be equivalent so that the means in \(Y\) after considering age and period main effects are the same under the two coding schemes.
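This equivalence can be verified with a small base R sketch on simulated data. The data frame, the outcome name `health`, and the random values are hypothetical; only `lm` and the standard contrast functions are used.

```r
set.seed(42)
# Hypothetical balanced data: 3 age groups x 3 periods, 20 people per cell
d <- expand.grid(age = c("20-24", "25-29", "30-34"),
                 period = c("2000", "2005", "2010"),
                 rep = 1:20)
d$health <- rnorm(nrow(d))

fit_dummy <- lm(health ~ age * period, data = d,
                contrasts = list(age = "contr.treatment",
                                 period = "contr.treatment"))
fit_effect <- lm(health ~ age * period, data = d,
                 contrasts = list(age = "contr.sum", period = "contr.sum"))

# The fitted cell means agree under the two coding schemes
cells <- unique(d[, c("age", "period")])
same_means <- all.equal(predict(fit_dummy, cells), predict(fit_effect, cells))

# The coefficient for cell Y (ages 25-29, period 2005) differs in value ...
b <- coef(fit_effect)
dummy_Y <- coef(fit_dummy)[["age25-29:period2005"]]
effect_Y <- b[["age2:period2"]]

# ... but the dummy-coded term is a contrast of effect-coded interactions
rebuilt_Y <- b[["age2:period2"]] - b[["age2:period1"]] -
  b[["age1:period2"]] + b[["age1:period1"]]

c(dummy = dummy_Y, effect = effect_Y, rebuilt = rebuilt_Y)
```

The dummy-coded interaction is recovered exactly from the effect-coded ones, so the two sets of estimates describe the same fitted model in different terms.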

We recommend using effect (sum-to-zero) coding for estimating the APC-I model for the following reasons. When characterizing cohort effects as a set of age-by-period interactions, we are less concerned about any direction of the interactions; that is, we are not particularly interested in the difference between two cohorts at a particular age or time period. Rather, we focus on particular structures of these interactions that may represent theoretically interesting patterns during a cohort’s life span. Effect coding is helpful because all effects share the same kind of referent: the next lower level in the hierarchy of main effects and interactions. That is, we choose effect coding for ease of interpretation. This is also consistent with the recommendation of coding schemes in the presence of interactions (Aiken et al. 1991; Jaccard and Turrisi 2003).

The R package *APCI* ^{2} can
be installed and loaded using the following R code^{3}:

```
# install R package APCI
> install.packages("APCI")
# load R package APCI to the current working environment
> library(APCI)
```

The main routines to implement the APC-I model using package
*APCI* are `apci.plot.raw`, `apci`, and `apci.plot` (or
`apci.plot.hexagram` and `apci.plot.heatmap`). A summary of these
functions and the input arguments used in the routines is given below.

The R package *APCI* contains
the following functions for estimating the APC-I model and visualizing
the data and the model results:

- `apci`: to estimate the age, period, and cohort effects using the APC-I model.
- `temp_model`: an internal function that estimates a generalized linear model.
- `tests`: to conduct the global F test.
- `maineffect`: an internal function to extract age and period main effects.
- `cohortdeviation`: an internal function to extract between-cohort average deviations and within-cohort life-course dynamics.
- `ageperiod_group`: to return a cohort index based on how age and period are grouped.
- `apci.plot.raw`: to visualize the mean values of the outcome across age and period groups, respectively.
- `apci.plot.hexagram`: to visualize the estimated cohort effects in a hexagram style.
- `apci.plot.heatmap`: to visualize the estimated cohort effects in a heatmap style.
- `apci.plot`: to visualize the estimated age, period, and cohort effects in conventional figures.

A summary of input arguments required in these functions will be given
one by one ^{4} in the next section. Package
*APCI* also contains three
empirical datasets *women9017*, *cpsmen*, *cpswomen*, and one simulated
dataset *simulation*. Dataset *women9017* was used and described in
Luo and Hodges (2020b). Applications of the APC-I model
to the other two empirical datasets are given in section
Examples.

Function `apci` is the core function in the R package
*APCI*. It fits an APC-I
model with or without covariates and returns a list of results including
coefficients and standard error estimates for age main effects, period
main effects, inter-cohort average deviations, and intra-cohort
life-course trends, and covariate coefficients if any. Both pooled
cross-sectional data and multi-cohort longitudinal/panel data are
supported. Specifically, function `apci` is used as

```
apci(data, outcome, age, period, cohort, weight, covariate, family,
dev.test=TRUE, print, gee, id, corstr,...)
```

and takes the following arguments:

- `data`: a data frame containing an outcome variable, age group indicators, period group indicators, and covariates to be used in the model. If a variable is not found in data, there will be an error message reminding users to check the input data again. Supported data structures include pooled cross-sectional data and multi-cohort longitudinal/panel data.
- `outcome`: an object of class character containing the name of the outcome variable. The outcome variable can be a continuous, categorical, or count variable.
- `age`: an object of class character indicating the age group index taking on the number of distinct values in the data (e.g., six age groups: 20-24, 25-29, 30-34, 35-39, 40-44, and 45-49). The vector should be a factor (or “category”, or “enumerated type”).
- `period`: an object of class character indicating the time period index in the data.
- `cohort`: an optional object of class character indicating cohort membership index in the data. The cohort index can be generated from the age group index and time period index in the data because of the exact linear relationship among these three time-related indices.
- `weight`: an optional vector of sample weights to be used in the model fitting process. If non-`NULL`, user-supplied weights will be used in the first step to estimate the model. Observations with negative weights will be dropped in modeling.
- `covariate`: an optional vector of characters containing the names of user-specified covariate(s) to be used in the model. If the variables are not found in data, there will be an error message reminding the users to check the data again.
- `dev.test`: logical, specifying if the global F test (step 1) should be implemented before fitting the APC-I model. If `TRUE`, `apci` will first run the global F test and report the test results; otherwise, `apci` will skip this step and return `NULL`. The default setting is `TRUE`. However, users should be aware that the algorithm will not automatically stop even if there are no significant age-by-period interactions based on the global F test.^{5}
- `family`: a character string specifying the link function to be used in the model. The value can be “binomial”, “multinomial”, or “gaussian”. See R function `glm` for more details about link functions.
- `print`: logical, specifying if the intermediate results should be displayed in the console when fitting the model. The default setting is `TRUE` to display the results of each procedure.
- `gee`: logical, indicating if the data is cross-sectional data or longitudinal/panel data. If `TRUE`, the generalized estimating equation will be used to correct the standard error estimates. The default is `FALSE`, indicating that the data are cross-sectional.
- `id`: a character vector specifying the cluster index in longitudinal data. It is required when `gee` is `TRUE`. The length of the vector should be the same as the number of observations.
- `corstr`: a character string specifying a possible correlation structure in the error terms when `gee` is `TRUE`. The following are allowed: `independence`, `fixed`, `stat_M_dep`, `non_stat_M_dep`, `exchangeable`, `AR-M` and `unstructured`. The default value is `exchangeable`.
- `unequal_interval`: logical, indicating if age and period groups are of the same interval width. The default is set as `TRUE`.
- `age_range, period_range`: numeric vectors indicating the actual age or period range (e.g., 10 to 59 years old or from 2000 to 2019).
- `age_interval, period_interval, age_group, period_group`: numeric values or character vectors indicating how age and period are grouped. `age_interval` and `period_interval` indicate the width of age and period intervals, respectively. `age_group` and `period_group` are character vectors listing possible age and period groups. There are two ways to define age and period groups with unequal intervals: 1) defining `age_interval` and `period_interval`, or 2) defining `age_group` and `period_group`. Users must define age and period groups using one of the two options when `unequal_interval` is `TRUE`.
- `...`: further optional arguments to be passed to the model.

As mentioned in section Coding scheme and contrast, we use
effect coding to estimate the APC-I model. The `age` and `period`
arguments in function `apci` accept categorical variables. Unlike
the common approach of dummy coding or simple coding, where an
effect is defined as the difference of each group from the reference
group, function `apci` uses effect coding (i.e., sum-to-zero coding,
deviation coding, or ANOVA coding) as the default coding. Under this
coding scheme, the effect of the omitted category equals the negative
sum of the effects of all other categories. Computationally, the effect
coefficient of the omitted category is redundant because of the coding
scheme. However, for the purpose of quantifying cohort effects as
deviations from the main effects of age and period, it is useful to
compute estimates for all age-by-period cross-classifications and their
standard errors. For data with \(A\) age groups and \(P\) periods,
therefore, function `apci` returns \(A\) age and \(P\) period main effect
estimates and \(A \times P\) interaction estimates along with their
standard error estimates. For the main effects of age or period, the
estimate can be interpreted as the deviation associated with each age or
period group from the global mean. The age-by-period interactions
represent the deviation from the expectation determined by the
corresponding age and period main effects.
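How the redundant effects follow from the sum-to-zero constraint can be sketched in base R. This is an illustration on simulated data with hypothetical variable names, not the internal code of function `apci`; it recovers point estimates only and leaves standard errors aside.

```r
set.seed(7)
# Hypothetical balanced data with A = 4 age groups and P = 3 periods
d <- expand.grid(age = factor(1:4), period = factor(1:3), rep = 1:10)
d$y <- rnorm(nrow(d)) + as.integer(d$age)

fit <- lm(y ~ age * period, data = d,
          contrasts = list(age = "contr.sum", period = "contr.sum"))
b <- coef(fit)
A <- nlevels(d$age)
P <- nlevels(d$period)

# Main effects: the omitted (last) category is minus the sum of the others
age_est <- b[paste0("age", 1:(A - 1))]
age_all <- c(age_est, -sum(age_est))
period_est <- b[paste0("period", 1:(P - 1))]
period_all <- c(period_est, -sum(period_est))

# Interactions: fill the estimated block, then complete each row and column
ab <- matrix(0, A, P)
for (i in 1:(A - 1)) {
  for (j in 1:(P - 1)) {
    ab[i, j] <- b[[paste0("age", i, ":period", j)]]
  }
}
ab[1:(A - 1), P] <- -rowSums(ab[1:(A - 1), 1:(P - 1), drop = FALSE])
ab[A, ] <- -colSums(ab[1:(A - 1), , drop = FALSE])

# Intercept + main effects + interaction reproduces every observed cell mean
fitted_means <- b[["(Intercept)"]] + outer(age_all, period_all, "+") + ab
cell_means <- tapply(d$y, list(d$age, d$period), mean)
all.equal(fitted_means, cell_means, check.attributes = FALSE)
```

The completed matrix `ab` has \(A \times P\) entries whose rows and columns each sum to zero, matching the number of interaction estimates described above.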

Also note that when age and period groups have unequal interval widths
in an age-period classification table, the age-by-period interactions
contained in a cohort no longer lie on the same diagonals. Because
cohort effects are conceptualized and estimated as a structure of the
age-by-period interactions in the APC-I model, it is technically
possible to use the argument `unequal_interval` in package
*APCI* to extract interaction
coefficient estimates that lie on different diagonals. However, unequal
age and period group intervals may complicate the issue of cohort
overlapping noted by Kupper et al. (1985). For this reason, we do
not recommend using unequal interval widths for age and period groups if
possible.

After fitting the APC-I model, function `apci` will store the following
components as a list for further usage:

- `model`: a summary of the fitted generalized linear regression model.^{6} It displays the standard regression output including coefficient and standard error estimates.
- `dev_global`: the global F test results. It examines if the interaction terms are significant in a generalized linear regression model that contains age and period main effects and their interactions.
- `intercept`: the overall intercept (\(\mu\) in equation (2)).
- `age_effect`: a vector containing the estimated effect for each age group.
- `period_effect`: a vector containing the estimated effect for each time period.
- `cohort_average`: a vector containing the inter-cohort average deviations for comparing differences between cohorts.
- `cohort_slope`: a vector containing intra-cohort life-course trends.

As mentioned earlier, the first step of implementing the APC-I model is
to conduct a global F test of the age-by-period interactions. This step
is a routine in function `apci`, but the procedure does not stop even if
there is no statistically significant deviation from the age main effect
and period main effect. Therefore, we recommend separately conducting
the global F test. In R package
*APCI*, the function `tests`
can be used for this purpose. It can be used as follows:

`tests(model, A, P, C, ...) `

and takes the following arguments:

- `model`: a generalized linear regression model generated from the internal function `temp_model`.^{7}
- `A, P, C`: numbers of age groups, period groups, or cohort groups. If age and period groups are of different widths, the values will be automatically generated by the function.

Function `tests` will return a standard F test result including the
value of the F test statistic and the associated p-value.
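For readers who want to see what such a test amounts to, the sketch below compares a main-effects model with the model that adds all age-by-period interactions, using only `glm` and `anova` from base R. It is an illustration of the idea on simulated data with hypothetical variable names, and it is not claimed to reproduce the internals of function `tests`.

```r
set.seed(3)
# Hypothetical balanced data: 4 age groups x 4 periods, 25 people per cell
d <- expand.grid(age = factor(1:4), period = factor(1:4), rep = 1:25)
d$y <- rnorm(nrow(d))

effect_coding <- list(age = "contr.sum", period = "contr.sum")
fit_main <- glm(y ~ age + period, data = d, family = gaussian(),
                contrasts = effect_coding)
fit_full <- glm(y ~ age * period, data = d, family = gaussian(),
                contrasts = effect_coding)

# F test of all (A - 1)(P - 1) interaction terms jointly
global_test <- anova(fit_main, fit_full, test = "F")
global_test
```

The second row of `global_test` reports the F statistic and its p-value for the joint contribution of the interaction terms.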

In package *APCI*, we provide
four functions to facilitate visualizing the data and model results,
namely `apci.plot.raw`, `apci.plot.heatmap`, `apci.plot.hexagram`, and
`apci.plot`, for use at different stages of a research project. They take
similar input arguments. A summary of these arguments is given below.

Function `apci.plot.raw` is designed to plot the outcome variable
aggregated by age or period groups. This function may be used in the
stage of data exploration. Functions `apci.plot.heatmap` and
`apci.plot.hexagram` are designed to plot the age-by-period interactions
from the APC-I model. Both functions generate heatmaps, where one axis
represents age groups, and the other period groups. The cells in a
diagonal represent one cohort. The difference between the two functions
is the layout of the heatmap; one is a rectangular graph and the other a
hexagram. Function `apci.plot` can be used to visualize both raw data
and model results. It divides the canvas into four (\(2 \times 2\))
panels. Three of the four panels can be used to visualize the three
effect estimates in the APC-I model and the remaining panel to add notes. For
data exploration, users can visualize the mean values of the outcome
variable across age, period, and cohort groups on the same canvas. Users
can use the same function to visualize the estimated age, period, and
cohort effects from the modeling results.
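The quantities displayed at the exploration stage are simple group means. As a rough base R sketch on simulated data (the data frame and its column names are hypothetical, and this is not the plotting code of `apci.plot.raw`), they can be computed as follows.

```r
set.seed(11)
n <- 300
# Hypothetical pooled cross-sectional data with a binary outcome
d <- data.frame(
  age = factor(sample(c("20-24", "25-29", "30-34"), n, replace = TRUE)),
  period = factor(sample(c(2000, 2005, 2010), n, replace = TRUE)),
  y = rbinom(n, 1, 0.6)
)

# Mean outcome by age group, by period, and by age-by-period cell
mean_by_age <- tapply(d$y, d$age, mean)
mean_by_period <- tapply(d$y, d$period, mean)
mean_by_cell <- tapply(d$y, list(age = d$age, period = d$period), mean)

# Quick look in an interactive session, one panel per time dimension
if (interactive()) {
  op <- par(mfrow = c(1, 2))
  barplot(mean_by_age, main = "Mean outcome by age group")
  barplot(mean_by_period, main = "Mean outcome by period")
  par(op)
}
```

Inspecting `mean_by_cell` along its diagonals gives a first impression of whether cohorts differ before any model is fitted.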

The visualization functions in package
*APCI* include:

```
apci.plot.raw(data, outcome_var, age, period, ...)
apci.plot.heatmap(model, age, period, color_map = NULL, color_scale = NULL,
quantile = NULL, ...)
apci.plot.hexagram(model, age, period, first_age, first_period, interval,
color_scale = NULL, color_map = NULL, quantile = NULL, ...)
apci.plot(model, age, period, outcome_var, type = "model", quantile = NULL,
...)
```

and take the following arguments:

- `model`: a list recording the results from function `apci`.
- `outcome_var`: an object of class character indicating the name of the outcome variable used in the model. The outcome variable can be a continuous, binary, categorical, or count variable.
- `age`: a vector indicating the age group. The vector should be converted to a factor (or “category”, or “enumerated type”).
- `period`: a vector indicating the time period. The vector should be converted to a factor (or “category”, or “enumerated type”).
- `color_map`: a vector representing a color palette to be used in the figure. The default setting is greys if `color_map` is `NULL`. Alternatives include, for example, `c("blue", "yellow")` and `"blues"`.
- `color_scale`: a vector containing two values to indicate the minimum and maximum values, respectively, of the estimated cohort effects to be displayed. If `NULL`, the function will use the range from the estimation results.
- `quantile`: a number between 0 and 1, representing the desired percentiles to be used in visualizing the data or model. If `NULL`, the original scale of the outcome variable will be used.

We now illustrate how to use package
*APCI*’s visualization and
analytical functions. We describe and analyze two empirical datasets to
demonstrate how this package may be used to analyze pooled
cross-sectional data. We later briefly describe how to fit an APC-I
model for multi-cohort longitudinal data.

Temporal trends in men’s and women’s labor force participation (LFP) in the United States have attracted much scholarly attention (see, e.g., Farkas 1977; Treas 1987; Wilkie 1991; Connelly 1992; Macunovich 2012; Hollister and Smith 2014). For example, whereas men’s LFP steadily declined in the past decades (Wilkie 1991), women’s LFP continued to rise until the 1990s and the 2000s. Female LFP has since plateaued and even begun to decline. Researchers have debated the causes of this leveling off or decline. Some studies attributed the observed trends to period-specific factors such as labor demand (Erceg and Levin 2014), the economic shocks of the Great Recession (Boushey 2005; Hoffman 2009), social welfare and disability insurance (Duggan and Imberman 2009), and gender role attitudes (Fortin 2015).

However, the temporal trends in LFP are unlikely to be due to a pure period process. For example, because individuals may begin to leave the labor force at age 50, a decline in LFP should be expected if the proportion of the population age 50 or older has increased. That is, the recent trends may reflect a change in the age composition of the US population (Aaronson et al. 2014). Cohort succession may also contribute to the trend, a process in which older cohorts with higher or lower LFP rates begin to die out and younger cohorts with lower or higher LFP enter the labor force (Lee 2014). At the same time, critical social and demographic changes in education level, fertility, and attitudes about women working outside the home may be more cohort-specific than period-specific processes because these forces only affect individuals of certain ages (Goldin 2006; Farré and Vella 2013; Fernández 2013; Balleer et al. 2014).

Given that the observed trends in LFP are likely a mix of age, period,
and/or cohort patterns, an APC analysis that decomposes the temporal
trends into age-, period-, and cohort-related variations is helpful
for revealing the demographic, social, and economic changes that have
underlain the temporal trends in Americans’ labor force participation.
In the following section, we demonstrate how to use the functions in
*APCI* to undertake an APC
analysis of men’s and women’s LFP using a cross-sectional dataset.

The Current Population Survey (CPS) is the primary source of labor force
statistics in the United States (Flood et al. 2021). Since the 1960s, the
CPS has been collecting data on key demographic, economic, and education
topics. We subset the 1990-2019 CPS data by gender, resulting in two datasets,
namely *cpsmen* and *cpswomen*, to show how to conduct an APC-I analysis
of men’s and women’s LFP in the United States using package
*APCI*.

Datasets *cpsmen* and *cpswomen* contain a subset of men and women age
20-64 who participated in the 1990 to 2019 CPS. The following code is
used to load the data into the working environment:

```
> data(cpsmen)
> data(cpswomen)
```

The first five rows of the datasets of *cpsmen* and *cpswomen* are:

```
> head(cpsmen, n = 5)
  asecwt year age labforce educc
 2854.84    3   5        0     1
 1576.54    4   4        1     2
 2340.55    2   6        1     3
  158.44    3   5        0     0
  347.09    6   6        1     3
> head(cpswomen, n = 5)
  asecwt year age labforce educc
 2415.67    2   3        1     1
  663.89    2   5        1     3
 1653.01    6   4        1     2
 1613.31    6   4        0     2
  177.23    4   3        1     2
```

where `labforce` indicates the respondent’s labor force participation
status (1=in the labor force, 0=not in the labor force). `asecwt` is the
person-level weight that the CPS recommends for individual-level data
analyses. `year` indicates the survey year in which the respondent was
interviewed, grouped into 6 period groups (1=1990-94, 2=1995-99, …,
6=2015-19). `age` indicates the respondent’s age category (1=20-24,
2=25-29, …, 9=60-64). `educc` is a three-level categorical education
measure (1=less than high school, 2=high school graduate, 3=college
degree or above).
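Grouped codes of this kind can be derived from single-year values with base R's `cut()`. The following is a minimal sketch under stated assumptions: the vectors `survey_year` and `age_years` are hypothetical raw values made up for illustration, not columns of the packaged datasets, which already store the grouped codes.

```r
# Hypothetical raw values; cpsmen and cpswomen already store grouped codes.
survey_year <- c(1990, 1994, 1995, 2007, 2019)
age_years   <- c(20, 24, 25, 43, 64)

# Five-year period groups: 1 = 1990-94, 2 = 1995-99, ..., 6 = 2015-19.
period_group <- cut(survey_year, breaks = seq(1990, 2020, by = 5),
                    right = FALSE, labels = FALSE)

# Five-year age groups: 1 = 20-24, 2 = 25-29, ..., 9 = 60-64.
age_group <- cut(age_years, breaks = seq(20, 65, by = 5),
                 right = FALSE, labels = FALSE)

# Store the codes as factors so that they are treated as categories.
toy <- data.frame(year = factor(period_group, levels = 1:6),
                  age  = factor(age_group, levels = 1:9))
print(toy)
```

Using `right = FALSE` makes each interval closed on the left, so that, for example, 1995 falls in the second period group rather than the first.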

For data exploration, function `apci.plot.raw` visualizes the
outcome variable as follows:

`> apci.plot.raw(data, outcome_var, age, period)`

Figure 1 shows LFP rates by age groups (top panel) and period groups (bottom panel), respectively, for male (left panel) and female CPS respondents (right panel) age 20 to 64 from 1990 to 2019. Figure 1’s top panel suggests similar age patterns in LFP across time periods. The bottom panel shows distinct period trends depending on age groups. For example, the LFP rates among women in the 55-59 and 60-64 age groups seem to have gone up whereas other age groups show a relatively flat trend. Such distinct period patterns in LFP by age groups suggest potential cohort variations in women’s labor force participation. For men’s LFP, however, the visualization results suggest that a simpler model with age and period main effects may suffice for summarizing their LFP patterns.
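The quantities plotted in a figure of this kind are group-specific means of the outcome. As a rough illustration, weighted LFP rates by age and period group could be tabulated in base R as sketched below. The data frame `toy` is made up for illustration, and this is not the code used inside `apci.plot.raw`.

```r
# Made-up data with the same variable names as the CPS extracts.
toy <- data.frame(
  age      = factor(c(1, 1, 2, 2, 1, 2)),
  year     = factor(c(1, 1, 1, 1, 2, 2)),
  labforce = c(1, 0, 1, 1, 0, 1),
  asecwt   = c(100, 300, 200, 200, 150, 50)
)

# Weighted number in the labor force and total weight in each age-by-period cell.
num <- with(toy, tapply(asecwt * labforce, list(age = age, year = year), sum))
den <- with(toy, tapply(asecwt, list(age = age, year = year), sum))

# Weighted LFP rate per cell: rows are age groups, columns are period groups.
rate <- num / den
print(rate)
```

Reading the resulting table along a row gives the period trend for one age group, reading it down a column gives the age pattern in one period, and reading it along a diagonal follows one cohort.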

Function `apci` can be used to fit an APC-I model for pooled
cross-sectional data or multi-cohort longitudinal/panel data. In the
simplest form of an APC-I model without covariates for pooled
cross-sectional data, function `apci` is called as follows:

```
> no_cov <- APCI::apci(outcome = "labforce",
+ age = "age",
+ period = "year",
+ weight = "asecwt",
+ data = cpswomen,
+ dev.test = FALSE,
+ family = "binomial")
```
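To give a sense of the kind of model that underlies such a call, the sketch below fits a logistic regression with effect-coded (sum-to-zero) age and period main effects and their full interaction on simulated data. It is only an approximation under stated assumptions: it uses plain `glm()` rather than the survey-weighted fit returned by `apci`, and the data frame `toy` and its dimensions are made up.

```r
set.seed(42)
A <- 3; P <- 4; n <- 2000

# Simulated age group, period group, and binary outcome (no true effects).
toy <- data.frame(
  age    = factor(sample(seq_len(A), n, replace = TRUE)),
  period = factor(sample(seq_len(P), n, replace = TRUE))
)
toy$y <- rbinom(n, 1, 0.6)

# Effect coding: main effects and interactions are deviations that sum to zero.
fit <- glm(y ~ age * period, family = binomial, data = toy,
           contrasts = list(age = "contr.sum", period = "contr.sum"))

# 1 intercept + (A-1) age + (P-1) period + (A-1)*(P-1) interaction terms.
n_coef <- length(coef(fit))
print(n_coef)
```

The number of estimated coefficients equals `A * P`, one per age-by-period cell, which is why the interaction terms can absorb any cohort-specific deviation from the main effects.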

It is often desirable to add covariates to the model, which can be done
by specifying the `covariate` argument. For example, to add education
level (`educc`) as a covariate, function `apci` can be called as
follows:

```
> with_cov <- APCI::apci(outcome = "labforce",
+ age = "age",
+ period = "year",
+ covariate = c("educc"),
+ weight = "asecwt",
+ data = cpswomen,
+ print = FALSE,
+ dev.test = FALSE,
+ family = "binomial")
```

Below is a summary of the results from an APC-I model that includes education levels as a covariate:

```
> summary(with_cov)
               Length Class      Mode
model          33     svyglm     list
dev_global      0     -none-     NULL
intercept       4     -none-     character
age_effect     45     -none-     character
period_effect  30     -none-     character
cohort_average  6     data.frame list
cohort_slope    6     data.frame list
int_matrix      5     data.frame list
cohort_index   54     -none-     numeric
data            7     data.frame list
```

The returned value is a list of objects. `model` contains the results
from a logistic regression model with age and period main effects and
the unstructured interactions. `dev_global` displays the global F test
result. A significant F statistic suggests that there may exist cohort
effects. `intercept` is the overall intercept (\(\mu\) in Equation
(2)). `age_effect` gives the estimated age main effects.
`period_effect` gives the estimated period main effects.
`cohort_average` gives inter-cohort average deviations from age and
period main effects. `cohort_slope` gives intra-cohort life-course
linear slopes, which can be used for testing intra-cohort life-course
dynamics. `int_matrix` contains the estimated coefficients for
age-by-period interactions. Note that there are \(A \times P\)
interactions in `int_matrix` because effect coding is used to compute
the remaining \(A+P-1\) interaction estimates from the \((A-1)(P-1)\)
freely varying interaction parameters. Such interaction estimates are
used to generate heatmaps similar to Figure 2. `data` stores the data
passed to function `apci`. Users may call an object to obtain detailed
results. For example, by calling `with_cov$cohort_average` and
`with_cov$cohort_slope`, users can obtain estimated inter-cohort average
deviations and intra-cohort life-course slopes.
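The count of interaction estimates can be illustrated with a small sketch. Under sum-to-zero (effect) coding, every row and every column of the full interaction matrix sums to zero, so the derived cells follow from the free ones. The values in `free` are made up, and treating the last row and last column as the derived ones is an assumption made for illustration, not necessarily the package's convention.

```r
A <- 3; P <- 4

# Made-up values for the (A-1)*(P-1) freely varying interaction parameters.
free <- matrix(c(0.2, -0.1, 0.4, 0.3, -0.5, 0.1), nrow = A - 1, ncol = P - 1)

full <- matrix(NA_real_, A, P)
full[1:(A - 1), 1:(P - 1)] <- free
# Last column: each of the first A-1 rows must sum to zero.
full[1:(A - 1), P] <- -rowSums(free)
# Last row: every column must sum to zero.
full[A, ] <- -colSums(full[1:(A - 1), , drop = FALSE])

n_free    <- (A - 1) * (P - 1)  # freely estimated cells
n_derived <- A + P - 1          # cells implied by the constraints
print(full)
```

Here `n_free + n_derived` equals `A * P`. With the 9 age groups and 6 period groups of the CPS example, the same identity gives 40 + 14 = 54 interaction estimates.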

The output below shows education-adjusted inter-cohort average
deviations in women’s LFP from analyzing the *cpswomen* data using
function `apci`.

```
> with_cov$cohort_average
   c_avg_group c_avg_est c_avg_se c_avg_t c_avg_p c_avg_sig
1            1    -0.329    0.193  -1.709   0.088
2            2    -0.155    0.142  -1.091   0.275
3            3    -0.162    0.114  -1.422   0.155
4            4     0.047    0.097   0.481   0.631
5            5     0.096    0.085   1.139   0.255
6            6     0.174    0.076   2.288   0.022         *
7            7     0.034    0.074   0.457   0.648
8            8    -0.036    0.074  -0.493   0.622
9            9     0.003    0.073   0.047   0.963
10          10    -0.072    0.081  -0.894   0.371
11          11     0.030    0.085   0.353   0.724
12          12    -0.029    0.102  -0.288   0.774
13          13    -0.080    0.131  -0.609   0.543
14          14    -0.103    0.170  -0.608   0.543
```

where `c_avg_group` indicates cohort membership (e.g., cohort 1=the 1930
birth cohort, cohort 2=the 1935 birth cohort, ..., cohort 14=the 1995
birth cohort), `c_avg_est` is the inter-cohort average deviation,
`c_avg_se` is the standard error estimate for the average deviation,
`c_avg_t` is the t test statistic for the average deviation, and
`c_avg_p` and `c_avg_sig` are the p values and significance levels
(*: p < .05, **: p < .01, and ***: p < .001), respectively.

The results from `with_cov$cohort_average` imply that, on average, the
LFP of cohort 6 (the 1955 birth cohort) differs significantly from what
the age and period main effects predict. Because the model is a logistic
regression, the estimate of 0.174 is on the log-odds scale.
Specifically, the odds of labor force participation for the 1955 cohort
are about 19% higher (exp(0.174) - 1 ≈ 0.19, p < .05) than expected
based on the age and period main effects.
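The arithmetic behind this interpretation can be checked directly. The sketch below assumes the default logit link for `family = "binomial"`, so that the estimate is a difference in log-odds. The baseline participation probability of 0.70 is a hypothetical value chosen only to show how a log-odds shift translates into a probability, and the normal approximation to the p value is likewise an assumption.

```r
# Estimate and standard error for cohort 6, taken from with_cov$cohort_average.
est <- 0.174
se  <- 0.076

# Odds ratio and percentage change in the odds relative to the expectation.
odds_ratio      <- exp(est)
pct_change_odds <- 100 * (odds_ratio - 1)

# Approximate two-sided p value from the reported t statistic, treating it
# as standard normal (an assumption; the package may use a t reference).
p_approx <- 2 * pnorm(-abs(2.288))

# Hypothetical baseline probability, for illustration only.
baseline <- 0.70
shifted  <- plogis(qlogis(baseline) + est)

print(round(c(odds_ratio = odds_ratio, pct_change_odds = pct_change_odds,
              p_approx = p_approx, shifted = shifted), 3))
```

The change in the participation probability implied by a given log-odds shift depends on the baseline, which is why the estimate is best described in terms of odds.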

The output below shows education-adjusted intra-cohort life-course
dynamics in women’s LFP from analyzing the *cpswomen* data using
function `apci`.

```
> with_cov$cohort_slope
   c_slp_group c_slp_est c_slp_se c_slp_t c_slp_p c_slp_sig
1            1        NA       NA      NA      NA      <NA>
2            2     0.165    0.195   0.849   0.396
3            3    -0.215    0.187  -1.148   0.251
4            4     0.163    0.189   0.866   0.386
5            5     0.093    0.184   0.508   0.611
6            6     0.007    0.169   0.039   0.969
7            7     0.047    0.172   0.277   0.782
8            8    -0.096    0.181  -0.530   0.596
9            9    -0.187    0.173  -1.076   0.282
10          10    -0.106    0.176  -0.602   0.547
11          11    -0.279    0.159  -1.750   0.080
12          12     0.353    0.160   2.207   0.027         *
13          13    -0.047    0.180  -0.262   0.793
14          14        NA       NA      NA      NA      <NA>
```

where `c_slp_group` indicates cohort membership (e.g., cohort 1=the 1930
birth cohort, cohort 2=the 1935 birth cohort, ..., cohort 14=the 1995
birth cohort), `c_slp_est` is the intra-cohort life-course slope,
`c_slp_se` is the standard error estimate for the life-course slope,
`c_slp_t` is the t test statistic for the life-course slope, and
`c_slp_p` and `c_slp_sig` are the p values and significance levels
(*: p < .05, **: p < .01, and ***: p < .001), respectively. `NA`s are
generated for the youngest and oldest cohorts because only one
age-by-period combination is observed for each of these two cohorts and
thus intra-cohort life-course dynamics cannot be assessed.

For example, for cohort 12 (the 1985 birth cohort), the estimated intra-cohort slope is 0.353 (p < .05), meaning that this cohort’s LFP is lower than expected at younger ages but higher than expected at older ages. Interestingly, the average deviation for this cohort is not statistically significant. Such an insignificant inter-cohort average deviation combined with a significantly positive intra-cohort slope indicates a compensation life-course pattern; that is, this cohort’s lower-than-expected LFP at younger ages seems to be compensated by its higher-than-expected LFP at older ages.

The intra-cohort life-course dynamics are based on the age-by-period interactions as follows:

```
# the first six rows of the life-course dynamics
> with_cov$int_matrix
  iaesti  iase   iap iasig cohortindex
1  0.166 0.169 0.327                 9
2 -0.048 0.207 0.818                 8
3  0.068 0.164 0.678                 7
4  0.095 0.172 0.581                 6
5  0.205 0.191 0.283                 5
6  0.227 0.193 0.239                 4
# [there are 48 rows compressed]
```

where `iaesti` is the age-by-period interaction estimate, `iase` is the standard error estimate for the interaction term, `iap` and `iasig` are the p value and significance level (*: p < .05, **: p < .01, and ***: p < .001), respectively, and `cohortindex` indicates cohort membership.

The following code can be used to organize the age-by-period interaction estimates in matrix form, with age groups in rows and period groups in columns:

```
> # A is the number of age groups and P is the number of period groups.
> A <- 9; P <- 6
> matrix(with_cov$int_matrix$iaesti, A, P)[A:1, ]
       period #1 period #2 period #3 period #4 period #5 period #6
age #9    -0.329    -0.038   -0.416*     0.334     0.267     0.182
age #8    -0.272     0.043     0.017     0.125     0.001     0.086
age #7    -0.112   -0.392*    -0.070     0.257     0.278     0.040
age #6     0.227    -0.046    0.443*   -0.333*    -0.258    -0.033
age #5     0.205     0.066     0.061    -0.150    -0.074    -0.107
age #4     0.095     0.044     0.139     0.065    -0.109    -0.234
age #3     0.068     0.059   -0.359*    -0.147     0.096     0.283
age #2    -0.048     0.256    -0.006     0.067    -0.156    -0.113
age #1     0.166     0.009     0.191    -0.216    -0.046    -0.103
```
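As a rough check, the cohort summaries reported earlier can be approximated from the interaction matrix shown above. The sketch below is not the package's internal code. It assumes that the inter-cohort average deviation is the arithmetic mean of the interaction estimates along a cohort diagonal and that the intra-cohort slope is the linear orthogonal polynomial contrast (`contr.poly`) of those estimates. Using the rounded values printed above, these assumptions give results close to the reported ones, for example about 0.174 for the cohort 6 average and about 0.353 for the cohort 12 slope.

```r
A <- 9; P <- 6

# Interaction estimates copied from the matrix above; rows are age #1..#9,
# columns are period #1..#6 (significance stars dropped).
ia <- matrix(c(
   0.166,  0.009,  0.191, -0.216, -0.046, -0.103,
  -0.048,  0.256, -0.006,  0.067, -0.156, -0.113,
   0.068,  0.059, -0.359, -0.147,  0.096,  0.283,
   0.095,  0.044,  0.139,  0.065, -0.109, -0.234,
   0.205,  0.066,  0.061, -0.150, -0.074, -0.107,
   0.227, -0.046,  0.443, -0.333, -0.258, -0.033,
  -0.112, -0.392, -0.070,  0.257,  0.278,  0.040,
  -0.272,  0.043,  0.017,  0.125,  0.001,  0.086,
  -0.329, -0.038, -0.416,  0.334,  0.267,  0.182
), nrow = A, ncol = P, byrow = TRUE)

# Cohort index of each cell; gives 9, 8, ..., 4 for ages 1-6 in period 1,
# matching the cohortindex column of int_matrix.
cohort <- A - row(ia) + col(ia)

# Assumed definition: average deviation = mean along the cohort diagonal.
c_avg <- tapply(as.vector(ia), as.vector(cohort), mean)

# Assumed definition: slope = linear orthogonal polynomial contrast of the
# diagonal cells ordered from youngest to oldest age; NA for a single cell.
lin_slope <- function(x) {
  if (length(x) < 2) return(NA_real_)
  sum(contr.poly(length(x))[, 1] * x)
}
c_slp <- tapply(as.vector(ia), as.vector(cohort), lin_slope)

print(round(cbind(c_avg, c_slp), 3))
```

Small differences from the reported values in the third decimal place are expected because the inputs here are already rounded. Standard errors are not reproduced, since they require the covariance matrix of the fitted model.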

Based on the R package
*ggplot2* (Wickham 2016),
heatmaps can be generated to visualize inter- and intra-cohort patterns
and motivate a subsequent formal APC analysis. For example, for dataset
*cpswomen*, both inter-cohort average deviations and intra-cohort
life-course dynamics may be visualized in a heatmap as follows:

```
> apci.plot.heatmap(model = with_cov, age = "age", period = "year",
+ color_map = c("blue", "yellow"))
```