unival: An FA-based R Package For Assessing Essential Unidimensionality Using External Validity Information

The unival package is designed to help researchers decide between unidimensional and correlated-factors solutions in the factor analysis of psychometric measures. The novelty of the approach is its use of external information, in which multiple factor scores and general factor scores are related to relevant external variables or criteria. The unival package's implementation derives from a series of procedures put forward by Ferrando and Lorenzo-Seva (2019) and from new methodological developments proposed in this article. We assess the procedures implemented in unival by means of a simulation study extending the results obtained in the original proposal. Their usefulness is also illustrated with a real-world data example. Based on these results, we conclude unival is a valuable tool for use in applications in which the dimensionality of an item set is to be assessed.

Pere J. Ferrando (Department of Psychology, University Rovira i Virgili) , Urbano Lorenzo-Seva (Department of Psychology, University Rovira i Virgili) , David Navarro-Gonzalez (Department of Psychology, University Rovira i Virgili)

1 Introduction

Assessing the dimensionality of a set of items is one of the central purposes of psychometric factor analysis (FA) applications. At present, both the exploratory (EFA) and the confirmatory (CFA) models can be considered to be fully developed structural equation models (Ferrando and Lorenzo-Seva 2017). So, in principle, dimensionality can be rigorously assessed by using the wide array of goodness-of-fit procedures available for structural models in general. However, it is becoming increasingly clear that reliance on goodness-of-fit alone is not the way to judge the most appropriate dimensionality for studying a particular set of item scores (Rodriguez et al. 2016b,a).

The problem noted above is particularly noticeable in instruments designed to measure a single trait. In the vast majority of cases, item scores derived from these instruments fail to meet the strict unidimensionality criteria required by Spearman’s model. This failure, in turn, led to the proposal of multiple correlated-factor solutions as the most appropriate structure for them (Ferrando and Lorenzo-Seva in press, 2018; Furnham 1990; Reise et al. 2013, 2015). However, most instruments designed to be unidimensional do, in fact, yield data compatible with an essentially unidimensional solution (Floyd and Widaman 1995; Reise et al. 2013, 2015). When this is the case, treating the item scores as multidimensional has several undesirable consequences, mainly, (a) lack of clarity in the interpretation and unnecessary theoretical complexities, and (b) weakened factor score estimates that do not allow accurate individual measurement (Ferrando and Lorenzo-Seva in press, 2018; Furnham 1990; Reise et al. 2013, 2015). Indeed, treating clearly multidimensional scores as unidimensional also has such negative consequences as biased item parameter estimates, loss of information, and factor score estimates that cannot be univocally interpreted (Reise et al. 2013; see Ferrando and Lorenzo-Seva 2018).

In recent years, several indices and criteria have been proposed to assess dimensionality from different perspectives of model appropriateness. These developments, in turn, have been integrated in comprehensive proposals addressing the dimensionality issue from multi-faceted views including, but not limited to, standard goodness-of-fit results (Rodriguez et al. 2016b,a; Ferrando and Lorenzo-Seva 2018; Raykov and Marcoulides 2018). It is worth noting that these approaches generally reflect a trend in which the measurement part of the FA model is again relevant (e.g. Curran et al. 2018). Considering that the ultimate aim of most psychometric measures is individual measurement, the scoring stage of the FA should be expected to be its most important part (Ferrando and Lorenzo-Seva in press, 2018). Furthermore, if this view is adopted, a basic criterion for deciding whether a given FA solution is appropriate is the extent to which the score estimates derived from it are strong, reliable, determinate, unbiased, and clearly interpretable (Furnham 1990; Reise et al. 2013, 2015; Beauducel et al. 2016; Ferrando and Lorenzo-Seva 2018). Procedures explicitly based on the quality of the score estimates are already available in widely used programs such as FACTOR (Lorenzo-Seva and Ferrando 2013), and more sophisticated procedures based on Haberman's (2008) added-value principle have also been proposed (Ferrando and Lorenzo-Seva in press).

A common characteristic of all the proposals discussed so far is their use of internal information from the data exclusively: that is to say, the information provided by the item scores of the measure under study. In contrast, the approach implemented here is based on external sources of information: that is to say, the information provided by the relations between the factor score estimates derived from a given solution and relevant external variables or criteria. This additional information is a valuable complementary tool that can help reach a decision on whether the instrument under scrutiny is essentially unidimensional or truly multidimensional.

The present article aims to introduce unival, a new contributed R package implementing a recently proposed external procedure of the type described above (Ferrando and Lorenzo-Seva 2019). It also discusses new methodological developments allowing the procedure to be used in a wider range of situations than those considered in the original proposal. The rest of the article is organized as follows. First, we summarize the necessary theoretical bases and explain the new methodological contributions. Then, we give details about the package and how to use it. Finally, we assess the functioning of the program and the new developments proposed here with a simulation study and a real-world data example.

2 Theoretical foundations: A review

Consider two alternative FA solutions – unidimensional and multiple-correlated – which are fitted to a set of item scores. Suppose further that both solutions are found to be acceptable by internal criteria, a situation which is quite usual in applications (e.g. Ferrando and Navarro-Gonzalez 2018). The aim of the proposal summarized here is to assess which of the competing solutions is more appropriate in terms of external validity.

The null hypothesis in the proposal assumes (a) there is a general common factor running through the entire set of items, and (b) all the relations between the multiple factors and the relevant external variables are mediated by the general factor. In this case, the unidimensional solution is the most appropriate in terms of validity. At this point we note the proposal is intended to work on a variable-by-variable basis. So, it will be summarized using a single external variable.

The null hypothesis above can be described by using a second-order FA schema as follows. Assumption (a) above implies the correlated factors in the multiple solution, which we shall denote from now on as primary factors, behave as indicators of a single general factor. Assumption (b) implies the only parts of the primary factor not accounted for by the general factor are unrelated to the external variable.

The implications of the null model in terms of validity relations are considered in two facets: differential and incremental. In differential validity terms, the score estimates derived from the primary factors are expected to be related to the external variable in the same way as they are related to the general factor. As for incremental validity, the null model implies that the prediction of the external variable made from the single (general) factor score estimates cannot be improved upon by using the primary factor score estimates in a multiple regression schema.

Let \(\hat{\theta}_{ik}\) be the factor-score estimate of individual \(i\) on the \(k\)th primary factor, and let \(\theta_{ik}\) be the corresponding true factor score. We write \[\label{EQ1} \hat{\theta}_{ik} = \theta_{ik} + \varepsilon_{ik} , \tag{1}\] where \(\varepsilon_{ik}\) denotes the measurement error. The true scores \({\theta}_{k}\) are assumed to be distributed with zero expectation and unit variance. It is further assumed that \(\hat{\theta}_{ik}\) is conditionally unbiased (i.e. \(E(\hat{\theta}_{ik}|\theta_{ik})=\theta_{ik}\)), which implies the measurement errors are uncorrelated with the true trait levels. It then follows that the squared correlation between \(\hat{\theta}_{k}\) and \(\theta_{k}\) is \[\label{EQ2} \rho^2_{(\hat{\theta}_{k}, \theta_{k})} = \frac{Var(\theta_{k})}{Var(\hat{\theta}_{k})} = \frac{1}{1+Var(\varepsilon_{k})} = \frac{1}{1+E(Var(\varepsilon_{ik}|\theta_{ik}))} = \rho_{(\hat{\theta}_{k},\hat{\theta}_{k})} \tag{2}\] which is taken as the marginal reliability of the factor score estimates (see Ferrando and Lorenzo-Seva in press). Denote now by \(y\) the external variable or criterion, also assumed to be scaled with zero mean and unit variance, and by \(\rho_{(\hat{\theta}_{k},y)}\) the correlation between the \(k\)th factor score estimates and the criterion (i.e. the raw validity coefficient). From the results above it follows that the disattenuated correlation between the estimated primary factor scores and the criterion \[\label{EQ3} \hat{\rho}_{(\theta_{k},y)} = \frac{\rho_{(\hat{\theta}_{k},y)}}{\sqrt{\rho_{(\hat{\theta}_{k},\hat{\theta}_{k})}}} \tag{3}\] is an unbiased estimate of the corresponding correlation between the true primary scores and the criterion (i.e. the true validity coefficient). Now let \(\gamma_{kg}\) be the loading of the \(k\)th primary factor on the general factor (i.e. the second-order loading).
If the null model is correct, the following result should hold: \[\label{EQ4} \frac{\hat{\rho}_{(\theta_{1},y)}}{\gamma_{1g}} = \cdots = \frac{\hat{\rho}_{(\theta_{k},y)}}{\gamma_{kg}} = \cdots = \frac{\hat{\rho}_{(\theta_{q},y)}}{\gamma_{qg}} . \tag{4}\]

In words, equation (4) means the primary factors relate to the external variable in the same proportion to how they relate to the general factor. So, after correcting for this proportionality, the corrected indices should all be equal (i.e. no differential validity). To test this result, unival uses the following schema. First, it provides the Bootstrap-based confidence interval for each of the scaled coefficients in equation (4). Second, the median value of the scaled coefficients is obtained, and the most extreme scaled value is subtracted from the median. Next, a confidence interval for this difference is obtained via Bootstrap resampling, and a check is made to see whether the zero value falls within this interval or not. This second procedure provides a single difference statistic regardless of the number of primary factors.
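To make the schema concrete, the scaled-coefficient contrast and the median-difference bootstrap can be sketched in a few lines of base R. Everything below (data, loadings, reliabilities, the number of resamples) is simulated and purely illustrative; unival implements the full procedure internally.

```r
# Hypothetical data generated under the null (second-order) model
set.seed(1)
N       <- 500
theta_g <- rnorm(N)                        # general factor scores
gamma   <- c(0.8, 0.7, 0.6)                # second-order loadings
Theta   <- sapply(gamma, function(g) g * theta_g + sqrt(1 - g^2) * rnorm(N))
y       <- 0.5 * theta_g + sqrt(0.75) * rnorm(N)   # external criterion
rel     <- c(0.90, 0.85, 0.80)             # assumed marginal reliabilities

scaled_coef <- function(idx) {
  r_raw <- cor(Theta[idx, ], y[idx])       # raw validity coefficients
  r_dis <- r_raw / sqrt(rel)               # disattenuated, equation (3)
  r_dis / gamma                            # scaled by second-order loadings, eq. (4)
}
obs <- scaled_coef(seq_len(N))

# bootstrap the (most extreme scaled value - median) difference statistic
diffs <- replicate(500, {
  s <- scaled_coef(sample(N, replace = TRUE))
  d <- s - median(s)
  d[which.max(abs(d))]
})
ci <- quantile(diffs, c(0.05, 0.95))       # 90% percentile interval
```

Under the null model, the percentile interval for the difference statistic should usually cover zero.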

If the equality hypothesis is found to be untenable, then the alternative explanation (i.e. differential validity) is that the unique parts of the primary factors are still differentially related to the external variable beyond the relations mediated by the general factor. If this were so, validity information would be lost if the unidimensional model were chosen instead of the multiple model.

We turn now to incremental validity. The starting point of the proposal by Ferrando and Lorenzo-Seva (2019) was based on two results. First, the score estimates on the general factor are a linear composite of the score estimates on the primary factors, in which the weights aim to maximize the accuracy of the general scores. Second, the multiple-regression composite, which is also based on the primary factor score estimates, has weights aimed at maximizing the correlation with the external variable. In a truly unidimensional solution, both sets of weights are expected to be proportional, and the predictive power of the general score estimates and the primary score estimates is expected to be the same. More in detail, Ferrando and Lorenzo-Seva (2019) proposed correcting the primary factor score estimates for measurement error, and then obtaining single and multiple corrected correlation estimates whose expected values are the same under the null model above. Under the alternative hypothesis, on the other hand, the corrected multiple correlation (denoted by \(R_c\)) is expected to be larger than the single correlation based on the general scores (denoted by \(\hat{\rho}_{\hat{\theta}_{g}y}\)). The procedure implemented in unival for testing the null hypothesis of no incremental validity is to compute the difference \(R_c - \hat{\rho}_{\hat{\theta}_{g}y}\), obtain the Bootstrap confidence interval for this difference, and check whether the zero value falls within the interval. If the null hypothesis is rejected, the alternative explanation (i.e. incremental validity) is that the primary score estimates contain additional information allowing the multiple prediction based on them to be significantly better than the prediction based only on the general scores.
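The incremental contrast admits a similar sketch. The toy data below are simulated under an essentially unidimensional structure, and the measurement-error corrections of the full procedure are omitted for brevity, so the multiple correlation computed here is the plain in-sample value rather than the corrected estimate \(R_c\).

```r
# Hypothetical score estimates under an essentially unidimensional structure
set.seed(2)
N  <- 400
Fg <- rnorm(N)                                   # general factor score estimates
Fp <- cbind(Fg + rnorm(N, sd = 0.4),             # primary factor score estimates
            Fg + rnorm(N, sd = 0.4))
y  <- 0.5 * Fg + rnorm(N, sd = 0.8)              # external criterion

inc <- function(idx) {
  Rc  <- sqrt(summary(lm(y[idx] ~ Fp[idx, ]))$r.squared)  # multiple correlation
  rho <- cor(Fg[idx], y[idx])                             # single-factor validity
  Rc - rho
}
boot <- replicate(500, inc(sample(N, replace = TRUE)))    # bootstrap the difference
ci   <- quantile(boot, c(0.05, 0.95))                     # 90% percentile interval
```

If zero falls within the interval, the null hypothesis of no incremental validity is retained.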

3 New methodological contributions

The present article extends the original proposal by Ferrando and Lorenzo-Seva (2019) in two directions. First, the procedure can now be correctly used with types of score estimate other than those considered initially. Second, an approximate procedure is proposed for testing essential unidimensionality against a solution in only two correlated factors.

As for the first point, the original proposal is based on factor score estimates behaving according to the assumptions derived from equation (1). Appropriate scores of this type are mainly maximum-likelihood (ML) scores, which, in the linear FA model, are known as Bartlett's (1937) scores (see Ferrando and Lorenzo-Seva in press for a discussion). However, other types of scores are in common use in FA applications. In particular, Bayes Expected-A-Posteriori (EAP) scores have a series of practical advantages in nonlinear FA applications (Bock and Mislevy 1982) and are possibly the most commonly used scoring schema for this type of solution. EAP scores, however, are always inwardly biased (i.e. regressed towards the mean) and so do not fulfill the basic assumptions on which the original procedure was based.

Simple adaptations and corrections of the existing procedures can be obtained by viewing the EAP scores as the result of shrinking the ML scores towards the zero population mean, such that the shrinkage factor is the marginal reliability (Bock and Mislevy 1982). By using this result in the assessment of differential validity, it follows that the expected value of the raw correlation between the EAP score estimates for the \(k\)th factor and \(y\) is given by \[\label{EQ5} E(r_{(\hat{\theta}_{kEAP},y)}) = \frac{\rho_{(\theta_{k},y)}}{\sqrt{1+E(Var(\varepsilon_{ik}|\theta_{ik}))}} . \tag{5}\] Strictly speaking, the conditional variances in the denominator of (5) are not known, because they are based on the unbiased ML estimates. However, as the number of items increases, the posterior distribution approaches normality (Chang and Stout 1993), and the posterior standard deviation (PSD) associated with the EAP estimate becomes equivalent to an asymptotic standard error (Bock and Mislevy 1982). So, for factors defined by, say, 8 or more items, the following correction is expected to lead to appropriate disattenuated validity coefficients \[\label{EQ6} \hat{\rho}_{(\theta_{k},y)} = r_{(\hat{\theta}_{kEAP},y)}\sqrt{1+E(PSD^{2}(\theta_{ik}))} . \tag{6}\]

For very short item sets, the PSDs can be noticeably smaller than the standard errors because of the additional information contributed by the prior. The strategy proposed in this case is first to approximate the amounts of information from the PSDs by using the approximate relation (Wainer and Mislevy 2000, p. 74) \[\label{EQ7} PSD(\hat{\theta}) \cong \frac{1}{\sqrt{I(\hat{\theta})+1}} \tag{7}\] and then to use the modified correction \[\label{EQ8} \hat{\rho}_{(\theta_{k},y)} = r_{(\hat{\theta}_{kEAP},y)}\sqrt{1+E\left(\frac{1}{I(\hat{\theta}_{ik})}\right)}. \tag{8}\] Once the EAP-based disattenuated validity estimates have been obtained, they are used in the contrast (4) in the same way as those derived from the ML scores.
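The two corrections can be sketched as small helper functions, assuming a vector of PSDs is available for the factor (the function names are ours, not part of unival):

```r
# eq. (6): disattenuate an EAP-based validity coefficient via the mean squared PSD
disattenuate_eap <- function(r_raw, psd) {
  r_raw * sqrt(1 + mean(psd^2))
}

# eq. (8): variant for very short item sets, converting PSDs to information
# by inverting eq. (7): I is approximately 1/PSD^2 - 1
disattenuate_eap_short <- function(r_raw, psd) {
  info <- 1 / psd^2 - 1
  r_raw * sqrt(1 + mean(1 / info))
}
```

For instance, with a raw validity coefficient of .30 and PSDs of about .50, the first correction yields \(.30\sqrt{1.25} \approx .335\), while the short-test variant gives the slightly larger \(.30\sqrt{4/3} \approx .346\).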

We turn now to incremental validity. If EAP scores are used, the corrected estimate based on the general factor score estimates (denoted by \(\hat{\rho}_{\hat{\theta}_{g}y}\)) can be obtained as \[\label{EQ9} \hat{\rho}_{(\theta_{g},y)} = r_{(\hat{\theta}_{gEAP},y)} s_{(\hat{\theta}_{gEAP})} \left(1+E\left(\frac{1}{I(\hat{\theta}_{ig})}\right)\right) \tag{9}\] or, if the PSD approximation is used, \[\label{EQ10} \hat{\rho}_{(\theta_{g},y)} = r_{(\hat{\theta}_{gEAP},y)} s_{(\hat{\theta}_{gEAP})} \left(1+E(PSD^{2}(\theta_{ig}))\right) \tag{10}\] where \(s_{(\hat{\theta}_{gEAP})}\) is the standard deviation of the EAP score estimates. As for the multiple estimate based on the primary factor scores (denoted by \(R_c\)), only the covariances between the score estimates and the criterion must be corrected when EAP estimates are used instead of ML estimates (see Ferrando and Lorenzo-Seva 2019). EAP-based unbiased estimates of these covariances can be obtained as \[\label{EQ11} \hat{C}ov_{\theta_{k},y} = Cov_{(\hat{\theta}_{kEAP},y)}\left[1+E(PSD^{2}(\theta_{ik}))\right] \tag{11}\] or, by using the PSD-to-information transformation if the number of items is very small, \[\label{EQ12} \hat{C}ov_{\theta_{k},y} = Cov_{(\hat{\theta}_{kEAP},y)}\left[1+E\left(\frac{1}{I(\hat{\theta}_{ik})}\right)\right]. \tag{12}\] Once the vector of corrected covariances has been obtained, the rest of the procedure is the same as when it is based on ML score estimates.
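Equations (11) and (12) admit an equally compact sketch (again, hypothetical helper names; `psd` is the vector of PSDs for the corresponding primary factor):

```r
# eq. (11): correct a raw covariance using the mean squared PSD
correct_cov_psd <- function(cov_raw, psd) {
  cov_raw * (1 + mean(psd^2))
}

# eq. (12): variant using the PSD-to-information transformation of eq. (7)
correct_cov_info <- function(cov_raw, psd) {
  info <- 1 / psd^2 - 1
  cov_raw * (1 + mean(1 / info))
}
```

Both corrections inflate the raw covariance, and the information-based variant inflates it more when the prior contributes substantially to the PSDs.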

Overall, the basis of the proposal discussed so far is to: (a) transform the EAP scores so that they (approximately) behave as ML scores; (b) transform the PSDs so that they are equivalent to standard errors; and (c) use the transformed results as input in the standard procedure. The transformations are very simple, and the proposal is expected to work well in practical applications, as the simulation study below suggests. However, unstable or biased results might be obtained if the marginal reliability estimate used to correct for shrinkage were itself unstable or biased, or if the PSDs were used directly as if they were standard errors when the contribution of the prior was substantial.

The second development, an approximate procedure for the two-factor case, is expected to be useful in practice, because in many applications a decision must be made between one and two common factors. The problem in this case is that a second-order solution can only be identified with three or more primary factors, so the initial proposal cannot be used in the bidimensional case. An approximate approach, however, can be used with the same rationale as the original procedure.

Consider two matrices of factor score estimates (either ML or EAP): an \(N\times2\) matrix containing the estimates obtained by fitting the correlated two-factor solution, and an \(N\times1\) matrix containing the score estimates obtained by fitting the unidimensional (Spearman's) model to the item scores. Next, consider the following regression schemas, in which the primary factor score estimates in the \(N\times2\) matrix are corrected for measurement error. The first is the regression of the unidimensional score estimates on the corrected primary factor score estimates. The second is the regression of the criterion on the same corrected factor score estimates. Now, if the unidimensional solution is essentially correct in terms of validity, then the profile of weights for predicting the general scores and that for predicting the criterion are expected to be the same except for a proportionality constant. Denoting by \(\beta_{g1}\) and \(\beta_{g2}\) the weights for predicting the general scores from the corrected primary estimates, and by \(\beta_{y1}\) and \(\beta_{y2}\) the corresponding weights for predicting the criterion, the contrast we propose for testing the null hypothesis of no differential validity is \[\label{EQ13} \frac{\beta_{g1}}{\beta_{y1}}=\frac{\beta_{g2}}{\beta_{y2}} \tag{13}\] and is tested by using the same procedure as in equation (4).

With regard to incremental validity, the null hypothesis of essential unidimensionality implies that both linear composites will predict the criterion equally well. So, if we denote by \(y'_g\) the composite based on the \(\beta_{g1}\) and \(\beta_{g2}\) weights, and by \(y'_y\) the composite based on the \(\beta_{y1}\) and \(\beta_{y2}\) weights, the test of no incremental validity is based on the contrast \(r(y'_y,y)-r(y'_g,y)\), and is tested in the same way as the standard contrast above.
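Both two-factor contrasts can be sketched as follows. All data are simulated, variable names are illustrative, and for brevity the measurement-error correction of the primary score estimates is omitted.

```r
# Toy data under an essentially unidimensional structure
set.seed(3)
N  <- 400
g  <- rnorm(N)                                  # true general factor
Fp <- cbind(0.8 * g + 0.6 * rnorm(N),           # primary factor score estimates
            0.7 * g + 0.7 * rnorm(N))
fg <- g + 0.3 * rnorm(N)                        # unidimensional score estimates
y  <- 0.4 * g + 0.9 * rnorm(N)                  # external criterion

bg <- coef(lm(fg ~ Fp))[-1]   # beta_g1, beta_g2: weights predicting the general scores
by <- coef(lm(y ~ Fp))[-1]    # beta_y1, beta_y2: weights predicting the criterion
ratios <- bg / by             # equal under the null, equation (13)

yg <- Fp %*% bg               # composite y'_g
yy <- Fp %*% by               # composite y'_y
increment <- cor(yy, y) - cor(yg, y)   # near zero when there is no incremental validity
```

In practice both contrasts would be assessed with the same bootstrap schema used for the standard contrasts.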

4 The unival package details

The current version of the unival package, which is available through CRAN, contains one main function (and additional internal functions) implementing the procedures described in the sections above. Further details on the theoretical bases of unival are provided in Ferrando and Lorenzo-Seva (2019). The function usage is as follows.

unival(y, FP, fg, PHI, FA_model = 'Linear', type, SEP, SEG, relip, relig, 
percent = 90, display = TRUE)

The data provided should be a data frame or a numerical matrix for input vectors and matrices, and numerical values for the arguments containing a single element, like relig. The package imports three additional packages: stats (R Core Team 2018), optimbase (Bihorel and Baudin 2014) and psych (Revelle 2018), for internal calculations (e.g. using the fa function from the psych package for performing the FA calibration).

Since the function requires the factor score estimates as input, these estimates must be obtained from the raw data (i.e. the raw item scores) before unival is used. We recommend the non-commercial FACTOR program (Lorenzo-Seva and Ferrando 2013) to obtain EAP estimates under the linear and the graded FA model, or the mirt R package (Chalmers 2012) to obtain ML and EAP estimates for both models. FACTOR also provides PSDs for the EAP scores. Finally, both programs provide marginal reliability estimates for the chosen factor scores.
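As a sketch of this pre-processing step, the following code fits a correlated three-factor graded model to toy simulated data with mirt and extracts the EAP score estimates and their PSDs; the data, model specification, and variable names are illustrative only.

```r
library(mirt)
set.seed(4)
# simulate 12 graded items, four per factor (toy data)
a <- matrix(0, 12, 3)
a[1:4, 1] <- a[5:8, 2] <- a[9:12, 3] <- 1.2          # simple-structure slopes
d <- matrix(c(1.5, 0, -1.5), 12, 3, byrow = TRUE)    # graded-response intercepts
dat <- simdata(a, d, 300, itemtype = "graded")

spec <- mirt.model("F1 = 1-4
                    F2 = 5-8
                    F3 = 9-12
                    COV = F1*F2*F3")
mod    <- mirt(dat, spec, itemtype = "graded", method = "QMCEM", verbose = FALSE)
scores <- fscores(mod, method = "EAP", full.scores.SE = TRUE)
FP     <- scores[, 1:3]    # primary factor score estimates (unival's FP argument)
PSD    <- scores[, 4:6]    # posterior standard deviations for the corrections
```

A general factor score vector (fg) would be obtained analogously by fitting the unidimensional model to the same data, or by a second-order analysis.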

5 Simulation studies

The sensitivity of the procedures proposed in unival, for both differential and incremental validity, depends on two main factors. The first is the relative strength of (a) the relations between the general factor scores and the external variable and (b) the relations between the primary factor scores and the external variable. The second is the extent to which the relations between the unique parts of the primary factors and the external variable agree with the relations between the primary factors and the general factor. In summary, differential and incremental validity are expected to be clearly detected when (a) the primary factor scores are more strongly related to the external variable than the general scores are, and (b) the unique parts of the primary scores relate to the external variable in the opposite way to how the corresponding factors relate to the general factor. The opposite condition, in which (a) a dominant general factor relates more strongly to the external variable than the primary factors do, and (b) the profiles of relations are similar, so that the primary factors relate to the external variable in the same way as they do to the general factor, is very difficult to distinguish from the null hypothesis on which the procedures are based.

Ferrando and Lorenzo-Seva (2019) undertook a general simulation study in which the determinants above were manipulated as independent variables together with sample and model size. The study was based on the linear FA model and Bartlett's ML score estimates. In this article we replicated that study, but discretized the continuous item responses into five response categories (i.e. a typical Likert scoring) and fitted the data using the non-linear FA model, thus treating the item scores as ordered-categorical variables. In addition, the factor score estimates were Bayes EAP scores. The present study, then, considers the second potential FA model that can be used in unival, and assesses the behavior of some of the new developments proposed in this article (the use of Bayes scores instead of ML scores). Because the design and conditions of the study were the same as those in Ferrando and Lorenzo-Seva (2019), the results are only summarized here. Details and tables of results can be obtained from the authors. The results generally agreed quite well with those obtained in the original study, except for the (unavoidable) loss of power due to categorization. More in detail, in the study under the null model, neither spurious differential nor incremental validity was detected in any of the conditions.

In the studies in which the alternative model was correct, the following results were obtained. Differential validity was correctly detected except in the least favorable cells: dominant general-factor relations and profile agreement. As for incremental validity, the loss of power was more evident, and the procedure was less sensitive than in the continuous case: when the profiles of relations agreed (i.e. when the primary factors related to the external variable in the same way as they related to the general factor), unival failed to detect the increments in predictive power. This result, which, to a lesser extent, had already been obtained in the original study, suggests the unique relations have already been taken into account by the general factor score estimates. So, the multiple-regression linear composite, with weights very similar to those of the general factor score composite, does not substantially add to the prediction of the external variable. Overall, then, the results of the study suggest that in low-sensitivity conditions the unival outcome leads to the unidimensional model being chosen even when unique relations with the criterion do in fact exist. This choice, however, is probably not a practical limitation, as in these conditions the unidimensional model is more parsimonious and can explain the validity relations well. Finally, as for the differences with the previous study, the results suggest the unival procedures also work well with the non-linear FA model and Bayes scores. However, as expected, the categorization of the responses leads to a loss of information which, in turn, results in a loss of sensitivity and power. The most reasonable way to compensate for this loss would probably be to use a larger number of items.

6 Illustration with real data

The unival package contains an example dataset – SAS3f – which is a matrix containing a criterion (marks on a final statistics exam), the primary factor score estimates and the general factor score estimates in a sample of 238 respondents. Both the primary and general scores were EAP estimates obtained with the FACTOR (Lorenzo-Seva and Ferrando 2013) program.

The instrument under scrutiny is the Statistical Anxiety Scale (SAS; Vigil-Colet et al. 2008), a 24-item instrument initially designed to assess three related dimensions of anxiety: examination anxiety (EA), asking-for-help anxiety (AHA) and interpretation anxiety (IA). Previous studies have obtained a clear solution in three highly related factors, but have also found that an essentially unidimensional solution is tenable. So, the problem is to decide whether it is more appropriate to use only single-factor scores measuring an overall dimension of statistical anxiety, or whether it is preferable (and more informative) to use the factor score estimates for each of the three dimensions.

The only remaining argument needed to run unival with minimal input is the inter-factor correlation matrix between the primary factors. The example is specified as follows:

> PHI = cbind(c(1,0.408,0.504),c(0.408,1,0.436),c(0.504,0.436,1))
> y = SAS3f[,1]
> FP = as.matrix(SAS3f[,2:4])
> fg = SAS3f[,5]
> unival(y = y, FP = FP, fg = fg, PHI = PHI, type = 'EAP')

The output from the above command is:

Unival: Assessing essential unidimensionality using external validity information

Differential validity assessment:

0.6012 (0.4615 - 0.7311) 
0.2362 (0.0280 - 0.4172) 
0.3635 (0.2390 - 0.5035) 

Maximum difference

0.2377 (0.0891 - 0.3587) *

Incremental validity assessment:

0.3164 (0.2328 - 0.3944) 
0.4107 (0.3362 - 0.4720)

Incremental value estimate 

0.0943 (0.0203 - 0.1492) **

* Some factors are more strongly or weakly related to the criterion than can be
 predicted from their relations to the general factor
** There is a significant increase in accuracy between the prediction based on the
 primary factor score estimates and that based on the general factor score estimates.

Overall, the results seem to be clear. In differential validity terms, the confidence intervals for the first and second factors do not overlap, and the zero value falls outside the maximum-difference confidence interval. The interpretation is the primary factors relate to the criterion in ways that cannot be predicted from their relations with the general factor. More specifically, the first factor (AHA) seems to be more strongly related, and the second factor (IA) more weakly related to the criterion than could be predicted by their relations with the general factor.

Incremental-validity results are also clear: the prediction of the criterion based on the primary factor estimates clearly outperforms the prediction that can be made from the general factor score estimates when the regressions are corrected for measurement error. Note in particular the zero value falls well outside the confidence interval of the incremental validity estimate. To sum up, it is clear both information and predictive power will be lost in this example if the single or general factor score estimates are used as a summary of the estimates based on the three anxiety factors. So, in terms of validity, the FA solution in three correlated factors seems to be preferable.

7 Concluding remarks

In the FA literature, several authors (e.g. Goldberg 1972; Mershon and Gorsuch 1988; Carmines and Zeller 1991; Floyd and Widaman 1995) have pointed out that the dimensionality of a set of item scores cannot be decided solely in internal terms. Rather, the ultimate criterion for judging which solution is most appropriate should be how the scores derived from that solution relate to relevant external variables. In spite of this, however, external information is rarely used in FA-based dimensionality assessments. One explanation for this state of affairs is, indeed, the difficulty of collecting additional relevant external measures. Apart from this, however, clear and rigorous procedures for carrying out this assessment have only been proposed recently and, so far, have not been implemented in non-commercial software. For this reason, we believe unival is a useful additional tool for researchers who use FA in psychometric applications.

unival has been designed to work with scores derived from an FA solution rather than from raw item scores, and this has both shortcomings and advantages. Thus, at the minimal-input level, potential users of the program have to be able to carry out factor analyses with other programs, and, particularly, to obtain factor score estimates. Furthermore, they need to know what types of score have been computed by the program. More advanced unival usages require users to know how to obtain marginal reliability estimates for the factor scores or how to perform second-order factor analysis. To sum up, the program is designed for practitioners with some level of proficiency in FA. In principle, this is a potential shortcoming but does not restrict the usefulness of the program. As described above, all the input required by unival can be obtained from non-commercial FA packages, some of which are also quite user friendly.

The choice of the factor scores as input, on the other hand, makes the program extremely flexible and versatile. unival can work with scores derived from standard linear FA solutions or from non-linear solutions (which include the multidimensional versions of the graded-response and the two-parameter IRT models). Furthermore, users can choose to provide the minimal input options, or can tailor the input by choosing the type of marginal reliability estimate to be used in the error corrections or the general factor score estimates on which the analyses are based (second-order factor scores or scores derived from directly fitting the unidimensional model). No matter how complex the model or input choices are, however, the output provided by unival is extremely simple and clear to interpret, as the illustrative example shows.

8 Acknowledgments

This project has been made possible by the support of the Ministerio de Economía, Industria y Competitividad, the Agencia Estatal de Investigación (AEI) and the European Regional Development Fund (ERDF) (PSI2017-82307-P).

CRAN packages used

unival, stats, optimbase, psych, mirt

CRAN Task Views implied by cited packages

MissingData, Psychometrics


This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.

M. S. Bartlett. The statistical conception of mental factors. British Journal of Psychology, 28: 97–104, 1937. URL https://doi.org/10.1111/j.2044-8295.1937.tb00863.x.
A. Beauducel, C. Harms and N. Hilger. Reliability estimates for three factor score estimators. International Journal of Statistics and Probability, 5(6): 94–107, 2016. URL https://doi.org/10.5539/ijsp.v5n6p94.
S. Bihorel and M. Baudin. optimbase: R port of the Scilab optimbase module. 2014. URL https://CRAN.R-project.org/package=optimbase. R package version 1.0-9.
R. D. Bock and R. J. Mislevy. Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6(4): 431–444, 1982. URL https://doi.org/10.1177/014662168200600405.
E. G. Carmines and R. A. Zeller. Reliability and validity assessment. SAGE, 1991.
R. P. Chalmers. mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6): 1–29, 2012. DOI 10.18637/jss.v048.i06.
H. Chang and W. Stout. The asymptotic posterior normality of the latent trait in an IRT model. Psychometrika, 58(1): 37–52, 1993. URL https://doi.org/10.1007/BF02294469.
P. J. Curran, V. T. Cole, D. J. Bauer, W. A. Rothenberg and A. M. Hussong. Recovering predictor–criterion relations using covariate-informed factor score estimates. Structural Equation Modeling: A Multidisciplinary Journal, 25(6): 860–875, 2018. URL https://doi.org/10.1080/10705511.2018.1473773.
P. J. Ferrando and U. Lorenzo-Seva. An external validity approach for assessing essential unidimensionality in correlated-factor models. Educational and Psychological Measurement, 2019. URL https://doi.org/10.1177/0013164418824755.
P. J. Ferrando and U. Lorenzo-Seva. Assessing the quality and appropriateness of factor solutions and factor score estimates in exploratory item factor analysis. Educational and Psychological Measurement, 78(5): 762–780, 2018. URL https://doi.org/10.1177/0013164417719308.
P. J. Ferrando and U. Lorenzo-Seva. On the added value of multiple factor score estimates in essentially unidimensional models. Educational and Psychological Measurement, in press. URL https://doi.org/10.1177/0013164418773851.
P. J. Ferrando and U. Lorenzo-Seva. Program FACTOR at 10: Origins, development and future directions. Psicothema, 29: 236–241, 2017. URL https://doi.org/10.7334/psicothema2016.304.
P. J. Ferrando and D. Navarro-Gonzalez. Assessing the quality and usefulness of factor-analytic applications to personality measures: A study with the statistical anxiety scale. Personality and Individual Differences, 123(1): 81–86, 2018. URL https://doi.org/10.1016/j.paid.2017.11.014.
F. J. Floyd and K. F. Widaman. Factor analysis in the development and refinement of clinical assessment instruments. Psychological Assessment, 7(3): 286–299, 1995. URL https://doi.org/10.1037/1040-3590.7.3.286.
A. Furnham. The development of single trait personality theories. Personality and Individual Differences, 11(9): 923–929, 1990. URL https://doi.org/10.1016/0191-8869(90)90273-T.
L. R. Goldberg. Parameters of personality inventory construction and utilization: A comparison of prediction strategies and tactics. Multivariate Behavioral Research Monographs, 72(2): 59, 1972.
S. J. Haberman. When can subscores have value? Journal of Educational and Behavioral Statistics, 33(2): 204–229, 2008. URL https://doi.org/10.3102/1076998607302636.
U. Lorenzo-Seva and P. J. Ferrando. FACTOR 9.2: A comprehensive program for fitting exploratory and semiconfirmatory factor analysis and IRT models. Applied Psychological Measurement, 37(6): 497–498, 2013. URL https://doi.org/10.1177/0146621613487794.
B. Mershon and R. L. Gorsuch. Number of factors in the personality sphere: Does increase in factors increase predictability of real-life criteria? Journal of Personality and Social Psychology, 55(4): 675–680, 1988. URL https://doi.org/10.1037/0022-3514.55.4.675.
R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2018. URL https://www.R-project.org/.
T. Raykov and G. A. Marcoulides. On studying common factor dominance and approximate unidimensionality in multicomponent measuring instruments with discrete items. Educational and Psychological Measurement, 78(3): 504–516, 2018. URL https://doi.org/10.1177/0013164416678650.
S. P. Reise, W. E. Bonifay and M. G. Haviland. Scoring and modeling psychological measures in the presence of multidimensionality. Journal of Personality Assessment, 95(2): 129–140, 2013. URL https://doi.org/10.1080/00223891.2012.725437.
S. P. Reise, K. F. Cook and T. M. Moore. Evaluating the impact of multidimensionality on unidimensional item response theory model parameters. In Handbook of item response theory modeling, pages 13–40. Routledge, 2015.
W. Revelle. Psych: Procedures for psychological, psychometric, and personality research. Evanston, Illinois: Northwestern University, 2018. URL https://CRAN.R-project.org/package=psych. R package version 1.8.10.
A. Rodriguez, S. P. Reise and M. G. Haviland. Applying bifactor statistical indices in the evaluation of psychological measures. Journal of Personality Assessment, 98(3): 223–237, 2016a. URL https://doi.org/10.1080/00223891.2015.1089249.
A. Rodriguez, S. P. Reise and M. G. Haviland. Evaluating bifactor models: Calculating and interpreting statistical indices. Psychological Methods, 21(3): 137–150, 2016b. URL https://doi.org/10.1037/met0000045.
A. Vigil-Colet, U. Lorenzo-Seva and L. Condon. Development and validation of the statistical anxiety scale. Psicothema, 20(1): 174–180, 2008. URL https://doi.org/10.1037/t62688-000.
H. Wainer and R. J. Mislevy. Item response theory, item calibration and proficiency estimations. In Computerized adaptive testing: A primer, Ed. H. Wainer, pages 61–101. LEA, 2000.



Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".


For attribution, please cite this work as

Ferrando, et al., "unival: An FA-based R Package For Assessing Essential Unidimensionality Using External Validity Information", The R Journal, 2019

BibTeX citation

  author = {Ferrando, Pere J. and Lorenzo-Seva, Urbano and Navarro-Gonzalez, David},
  title = {unival: An FA-based R Package For Assessing Essential Unidimensionality Using External Validity Information},
  journal = {The R Journal},
  year = {2019},
  note = {https://rjournal.github.io/},
  volume = {11},
  issue = {1},
  issn = {2073-4859},
  pages = {427-436}