The *TensorTest2D* package provides tools for fitting generalized linear models to second-order tensor (matrix) data. Its functions support parameter estimation (e.g., estimating regression coefficients and their standard deviations) and hypothesis testing. We use two examples to illustrate the utility of the package in analyzing data from different disciplines. In the first example, a tensor regression model is used to study the effect of multi-omics predictors on a continuous outcome variable measuring drug sensitivity. In the second example, we draw a subset of the MNIST handwritten images and fit a logistic tensor regression model to them. Significance testing then characterizes the image pattern that distinguishes two handwritten digits. We also provide a function to visualize the image areas that act as effective classifiers under a tensor regression model. This visualization tool can be used alongside other variable selection techniques, such as the LASSO, to inform the selection results.

Tensors are multidimensional arrays and are increasingly encountered in
practice due to the burgeoning development of high-throughput
technology, e.g., brain image data (Zhou et al. 2013) and multi-omics
data (Chang et al. 2021). Within the framework of regression
analysis, tensor-structured data can play a role in the response
variable, the explanatory variable, or both. Some available R
packages, such as *TRES* and *MultiwayRegression*, consider tensor
regression with general tensor structure. The package
*TRES* (Wang et al. 2020) provides tools to perform regression analysis
with a tensor envelope structure in the tensor regression model, and its
output includes \(p\)-values for the regression coefficients; *TRES*
thus aims at variable selection via significance tests. The package
*MultiwayRegression* (Lock 2018, 2019) performs \(L_2\)-penalized tensor
regression, which is useful for predictive modeling but not for the
identification of important variables. Both *TRES* and
*MultiwayRegression* consider regression models with continuous outcome
variables only. Compared to these existing R packages, the proposed
package *TensorTest2D* (Chen et al. 2021) considers a generalized linear
model (GLM) with matrix-structured predictors and a scalar outcome, and
it can be used for outcome prediction or significance testing.

There are four main functions in *TensorTest2D*. The function
`tensorReg2D()` provides estimates of the regression coefficients and
their standard deviations, as well as the \(p\)-value for testing
whether a regression coefficient is significantly different from zero.
The function `summary()` organizes this information into an output
table. The function `plot()` visualizes the locations in the predictor
matrix of the predictors that significantly affect the response
variable. Finally, the function `predict()` predicts the response
values using the conditional mean given a specific predictor matrix.

The rest of this paper is arranged as follows. First, we describe a
regression model with tensor predictors under the GLM framework. Next,
we illustrate the main functions of *TensorTest2D* using two examples
and demonstrate the relevance of low-rank tensor regression. The first
example focuses on association testing, where we apply tensor
regression to identify genomic variables that affect drug sensitivity
in lung cancer treatment. The second example is a classification
problem, where we apply logistic tensor regression to model the
relationship between a binary response variable and images of
handwritten digits in the MNIST database. We also use significance
testing of the image predictors to identify the locations of an image
that play important roles in distinguishing between two handwritten
digits. The datasets used in these two examples are included in the
package *TensorTest2D*: users can call `data(omics)` to load the dataset
of the first (association) example and `data(mnist_mp2c2)` to load the
dataset of the second (classification) example.

In this work, we consider tensor regression under the GLM framework and
extend the inference procedure of tensor parameters in
Chang et al. (2021) from continuous responses to binary and
count responses. Suppose that there exists a dataset consisting of \(n\)
independent triplets, \(\{y_i, \mathbf{w}_i, \mathbf{X}_i\}\),
\(i=1,\dots,n\), where \(y_i\) is a scalar outcome, \(\mathbf{w}_i\) is a
\(d\)-dimensional covariate vector and \(\mathbf{X}_i\) is a \(P\times Q\)
matrix predictor. Without loss of generality, we assume hereafter
\(P\leq Q\). For two matrices, say \(\mathbf{X}_1\) and \(\mathbf{X}_2\), of
the same size, define the dot product of these two matrices as
\(\mathbf{X}_1 \circ \mathbf{X}_2 = \sum_{p=1}^P\sum_{q=1}^Q X_{1,pq}X_{2, pq}\)
where \(X_{j,pq}\) is the \((p,q)\)th entry of the matrix \(\mathbf{X}_j\),
\(j=1,2\). The GLM with an order-2 tensor predictor can therefore be
defined as
\[\label{eq:glm}
g\left(E(y_i)\right) = \mathbf{w}_i^{\top}\boldsymbol{\beta} + \mathbf{X}_i \circ \mathbf{B}, \tag{1}\]
where \(g(\cdot)\) is the link function,
\(\boldsymbol{\beta} \in \mathbb{R}^d\) and
\(\mathbf{B}\in\mathbb{R}^{P\times Q}\). If the \(P\times Q\)
parameters in \(\mathbf{B}\) are unconstrained, the above representation
is equivalent to a GLM with \(d + PQ\) covariates, including the intercept.
In *TensorTest2D*, we
implement linear regression with identity link, Poisson regression with
log link, and logistic regression with logit link.

When \(PQ\) is relatively small, one can vectorize the matrix \(\mathbf X_i\) so that (1) can be expressed as a conventional GLM with \(d + PQ\) covariates. When \(PQ\) is large, one can exploit the matrix structure of the predictors and consider a low-rank tensor GLM so as to reduce the number of parameters of interest while retaining variable-specific resolution. The main idea of the tensor GLM is to impose a low-rank constraint on \(\mathbf{B}\) so that \(\mathbf{B}\) can be fully specified by a smaller number of unconstrained parameters; see Hung and Wang (2012) and Chang et al. (2021) for more detail. Take the MNIST handwritten image classification as an example, where handwritten images are recorded in \(10 \times 10\) matrices. When there is no constraint on \(\mathbf{B}\), the number of parameters of interest is \(PQ=100\). On the other hand, if we restrict the rank of \(\mathbf B\) to be \(r\), the number of unconstrained parameters is \((P + Q)\times r - r^2\), which takes the values 19, 36, 51, and 64 for \(r = 1\), 2, 3, and 4, respectively. The reduction in the number of parameters is substantial when \(r\) is small.
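
To make the two regimes concrete, here is a small sketch (ours, not from the original article; the simulated `X` and `y` are hypothetical) showing the vectorized GLM for small \(PQ\), followed by the unconstrained parameter counts quoted above for the \(10\times 10\) case:

```
# A sketch: with PQ small, model (1) with an unconstrained B is an
# ordinary GLM on the vectorized predictor matrices.
set.seed(1)
n <- 100; P <- 3; Q <- 10; d <- 1
X <- array(rnorm(n * P * Q), dim = c(P, Q, n))   # simulated matrix predictors
y <- rnorm(n)                                    # simulated continuous response
Xvec <- t(apply(X, 3, c))                        # n x (P*Q) design matrix
fit_vec <- glm(y ~ Xvec, family = gaussian())
length(coef(fit_vec))                            # d + P*Q = 31 parameters

# Unconstrained parameter counts for the 10 x 10 MNIST example
sapply(1:4, function(r) (10 + 10) * r - r^2)     # 19 36 51 64
```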

Given a pre-specified rank \(r = R\), one can model \(\mathbf{B}\) with a low-rank constraint by setting \(\mathbf{B} = \mathbf{B}_1\mathbf{B}_2^{\top}\), where \(\mathbf{B}_1\in\mathbb{R}^{P\times R}\), \(\mathbf{B}_2\in\mathbb{R}^{Q\times R}\) and \(R\leq P\) (Zhou et al. 2013; Chang et al. 2021). The low-rank tensor regression model is therefore \[\label{eq:lrtr} g\left(E(y_i)\right) = \mathbf{w}_i^{\top}\boldsymbol{\beta} + \mathbf{X}_i \circ \left(\mathbf{B}_1\mathbf{B}_2^{\top}\right). \tag{2}\] Additional constraints on \(\mathbf{B}_1\mathbf{B}_2^{\top}\) are needed to ensure parameter identifiability; see Zhou et al. (2013) for an example. In this package we adopt the constraints considered by Chang et al. (2021), which leave the total number of unconstrained parameters at \(d + (P + Q)R - R^2\). Denote by \(\boldsymbol \eta\) the vector collecting all unconstrained parameters in (2). According to the theory of GLM, the score function and the Fisher information matrix of the model are \[\sum_{i=1}^n \frac{\partial \mu_i}{\partial \boldsymbol \eta}(y_i - \mu_i) \quad \text{and} \quad \sum_{i=1}^n \frac{\partial \mu_i}{\partial \boldsymbol \eta}V_i \left(\frac{\partial \mu_i}{\partial \boldsymbol \eta}\right)^{\top},\] respectively, where \(\mu_i = E(y_i)\) and \(V_i = \mathrm{Var}(y_i)\). However, we are interested in estimating \(\mathbf{B}=\mathbf{B}_1\mathbf{B}_2^{\top}\) and testing whether entries of \(\mathbf B\) are nonzero. We omit the derivation of the sampling distribution of \(\hat{\mathbf{B}}\), as it closely parallels that in Chang et al. (2021). As the true rank of \(\mathbf{B}\) is unknown, following Chang et al. (2021), we use the Akaike information criterion (AIC) to determine the optimal rank.
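
For concreteness, given the parameter count above, the rank-selection criterion takes the usual form (our sketch; we assume the effective number of parameters equals the number of unconstrained parameters):
\[\mathrm{AIC}(R) = -2\,\ell\big(\hat{\boldsymbol\beta}, \hat{\mathbf B}_1, \hat{\mathbf B}_2\big) + 2\left\{d + (P+Q)R - R^2\right\},\]
where \(\ell\) denotes the maximized log-likelihood of model (2) with rank \(R\), and the rank minimizing \(\mathrm{AIC}(R)\) is selected.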

Before briefly describing how Chang et al. (2021)
place constraints on the tensor regression parameterization, we
emphasize that estimating
\(\partial \mu_i / \partial \boldsymbol{\eta}\) is typically not
straightforward. It is known that the matrix factorization
(decomposition)
\(\mathbf{B}=\mathbf{B}_1\mathbf{B}_2^{\top}\) is not unique because, for
every invertible matrix \(\mathbf{O} \in \mathbb{R}^{R\times R}\),
\(\mathbf{B} = \left(\mathbf{B}_1\mathbf{O}^{-1}\right)\left(\mathbf{O}\mathbf{B}_2^{\top}\right)\).
Write
\[\mathbf{B}_1=\left[\begin{array}{c}
\mathbf{B}_{11} \\ \mathbf{B}_{21}
\end{array}\right],\]
where \(\mathbf{B}_{11} \in \mathbb{R}^{R\times R}\) and
\(\mathbf{B}_{21} \in \mathbb{R}^{(P-R)\times R}\), and assume
\(\mathbf{B}_{11}\) is invertible. One way to ensure the uniqueness of the
matrix factorization is to force \(\mathbf{O}=\mathbf{B}_{11}\), and thus
\[\mathbf{B}_1\mathbf{B}_2^\top=\left[\begin{array}{c}
\mathbf{B}_{11} \\ \mathbf{B}_{21}
\end{array}\right] \mathbf{O}^{-1}\mathbf{O}\mathbf{B}_2^{\top}
=\left[\begin{array}{c}
\mathbf{B}_{11}\mathbf{O}^{-1} \\ \mathbf{B}_{21}\mathbf{O}^{-1}
\end{array}\right] \tilde{\mathbf{B}}_2^{\top}
= \left[\begin{array}{c}
\mathbf{I}_R \\ \tilde{\mathbf{B}}_{21}
\end{array}\right] \tilde{\mathbf{B}}_2^{\top},\]
where \(\tilde{\mathbf{B}}_{21}={\mathbf{B}}_{21}\mathbf{O}^{-1}\) and
\(\tilde{\mathbf{B}}_{2}={\mathbf{B}}_{2}\mathbf{O}^{\top}\).
Consequently, the unknown parameter matrices are
\(\tilde{\mathbf{B}}_{21} \in \mathbb{R}^{(P-R)\times R}\) and
\(\tilde{\mathbf{B}}_2 \in \mathbb{R}^{Q\times R}\) with a total of
\((P-R)\times R + Q\times R\) unknown parameters. To our knowledge, an
exact formula for \(\partial \mu_i / \partial \tilde{\boldsymbol{\eta}}\),
with
\(\tilde{\boldsymbol{\eta}} = (\mathrm{vec}(\tilde{\mathbf{B}}_{21})^{\top}, \mathrm{vec}(\tilde{\mathbf{B}}_2)^{\top})^{\top}\),
was not available prior to the work of Chang et al. (2021);
*TensorTest2D* uses their formula.
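
The following minimal sketch (ours, not from the package) illustrates both the non-uniqueness of the factorization and the normalization described above:

```
# A sketch: B = B1 B2^T is not unique, and forcing O = B11 normalizes it.
set.seed(1)
P <- 5; Q <- 7; R <- 2
B1 <- matrix(rnorm(P * R), P, R)
B2 <- matrix(rnorm(Q * R), Q, R)
B  <- B1 %*% t(B2)
O  <- matrix(rnorm(R * R), R, R)                   # almost surely invertible
max(abs(B - (B1 %*% solve(O)) %*% (O %*% t(B2))))  # ~ 0: the same B
# Fixing O = B11 (the top R x R block of B1) removes the ambiguity:
O   <- B1[1:R, , drop = FALSE]                     # B11, assumed invertible
B1t <- B1 %*% solve(O)                             # top R x R block becomes I_R
B2t <- B2 %*% t(O)                                 # tilde-B2 = B2 O^T
max(abs(B - B1t %*% t(B2t)))                       # ~ 0: B is unchanged
round(B1t[1:R, ], 10)                              # the identity block I_R
```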

In this section, we present examples of real data analysis using the
package *TensorTest2D*. The main function, `tensorReg2D()`, is used in
the following analyses. Its inputs are the response vector `y`; the
array `X`, collecting the matrix (tensor) predictors; and the matrix
`W`, collecting adjustment covariates such as age and gender. The key
configurable parameters are the rank of \(\mathbf{B}\), `n_R`, and the
type of response variable, `family`. The function `tensorReg2D()`
handles three types of generalized regression problems. For a
continuous response, set `family = "gaussian"` to fit the linear
regression model with the identity link function. If the responses are
binary, setting `family = "binomial"` runs logistic regression with the
logit link. When the response variable is a non-negative integer
(count), set `family = "poisson"` to use the log link corresponding to
Poisson regression.
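
As a quick interface illustration (our sketch, not part of the original article; the simulated objects `X`, `y`, `B`, `eta`, and `poisFit` are hypothetical), the call below fits a rank-1 Poisson tensor regression to simulated count data:

```
# A sketch: rank-1 Poisson tensor regression on simulated data
library(TensorTest2D)
set.seed(1)
n <- 200; P <- 3; Q <- 4
X <- array(rnorm(n * P * Q), dim = c(P, Q, n))   # simulated matrix predictors
B <- outer(c(1, 0.5, 0), c(1, -0.5, 0, 0))       # a rank-1 true effect
eta <- 0.5 + apply(X, 3, function(Xi) sum(Xi * B))
y <- rpois(n, lambda = exp(eta))                 # counts with log link
poisFit <- tensorReg2D(y = y, X = X, W = matrix(1, n, 1),
                       n_R = 1, family = "poisson",
                       opt = 1, max_ite = 1000, tol = 10^(-7))
```

The call mirrors those used in the real-data examples below; only `family` and the simulated inputs differ.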

The function `tensorReg2D()` returns a list object that includes the
following components: `b_EST`, the coefficient vector
\(\hat{\boldsymbol\beta}\); `b_SD`, the corresponding standard
deviation vector, and `b_PV`, the \(p\)-value vector; `B_EST`, the
coefficient matrix \(\hat{\mathbf{B}}\) for the image effect; `B_SD`,
the standard deviations of the coefficient estimates, and `B_PV`, the
matrix of \(p\)-values; and `IC`, which contains the AIC and BIC values
for the purpose of model selection.

See `?tensorReg2D` for more details on the configuration and the output
values.
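
Continuing the sketch above, these components can be extracted directly from the fitted object:

```
# Components of the hypothetical poisFit object from the sketch above
poisFit$B_EST   # P x Q matrix of coefficient estimates
poisFit$B_PV    # P x Q matrix of p-values
poisFit$IC      # AIC and BIC values
```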

The package *TensorTest2D* includes a data set, `omics`, which consists
of a continuous response variable and 30 omics predictors that can be
organized into a \(3\times 10\) matrix. The response variable is the
drug sensitivity of vandetanib measured by the log-transformed activity
area. Vandetanib is a drug targeting the gene EGFR for lung cancer
treatment. The 30 omics predictors are the genomic information of 10
genes measured from 3 platforms: copy number variation (CNV),
methylation, and mRNA expression. Among the 10 genes, 7 (EGFR, EREG,
HRAS, KRAS, PTPN11, STAT3, and TGFA) are involved in the
protein-protein interaction network of EGFR (https://string-db.org) and
the remaining 3 (ACTB, GAPDH, and PPIA) are arbitrarily chosen
housekeeping genes with permuted entries that serve as negative
controls. The included data, omics.RData, is a subset of the data set
provided by the Cancer Cell Line Encyclopedia (CCLE) project
(Barretina et al. 2012; https://sites.broadinstitute.org/ccle/). The
detailed pre-processing procedure for `omics` is available in
Chang et al. (2021). The data set `omics` can be loaded via the
following syntax:

```
library(TensorTest2D)
data(omics)
# The size of the data P, Q, n
print(dim(omics$omics))
```

`#> [1] 3 10 68`

In the omics example, \(\mathbf w_i\) consists only of an intercept,
and \(\mathbf X_i\) is a \(P\times Q\) matrix with \(P=3\) and \(Q=10\).
As described above, this matrix consists of the expression values of 10
genes evaluated on three different platforms. Because
\(R\leq\min{\{P, Q\}}\) (see Chang et al. (2021)), there are three
possible tensor models, namely the rank-1, rank-2, and rank-3 models,
to describe the relationship between the outcome and the matrix
predictors. The model with the smallest AIC value is selected as the
optimal one; here, the rank-1 model has the smallest AIC value. The
rank-1 model identifies two important variables: EGFR under the
methylation platform (coefficient = -0.2416; \(p\)-value = 0.0022) and
EGFR under the CNV platform (coefficient = 0.2508; \(p\)-value =
0.0061). The code below this paragraph performs the model fitting and
prints its results. The utility function `summary(omicsMdl)` shows the
model structure, summary statistics of the residuals, and the table of
significance tests for the coefficients. On top of the result table,
the model structure of this case, `y ~ I + X`, is shown, where `I` is
the intercept term and `X` is the matrix covariate. The names of the
coefficients appear in the first column of the table. In addition to
the `(Intercept)` and the terms of \(\mathbf{w}\), `Xi.j` is the
coefficient of the \(i\)th row and \(j\)th column of \(\mathbf{X}\). If
the row and column names of \(\mathbf{X}\) are specified, then the
names of the coefficients in \(\mathbf{X}\) are shown as `ROWi:COLUMNj`.
The values in the summary table can also be returned separately. Here,
we print the attributes of the `tsglm` object separately for the
estimated coefficients and their standard deviations, along with the
\(p\)-values from the Wald test (see Wald (1943)).

```
set.seed(100) # Set seed for reproducibility
# Try from rank-1 to rank-3 models
omicsAIC <- numeric(3)
for (k in 1:3) {
  # Temporary storage for the rank-k model for withdrawing its AIC value
  omicsTmp <- tensorReg2D(y = omics$Y, X = omics$omics,
                          W = matrix(1, length(omics$Y), 1),
                          n_R = k, family = "gaussian",
                          opt = 1, max_ite = 1000, tol = 10^(-7))
  omicsAIC[k] <- omicsTmp$IC[1] # AIC
}
sprintf('Rank-%d model is the best with smallest AIC = %4.4f',
        which.min(omicsAIC), min(omicsAIC))
```

`#> [1] "Rank-1 model is the best with smallest AIC = -62.3135"`

```
# Train the tensor regression model of rank 1
omicsMdl <- tensorReg2D(y = omics$Y, X = omics$omics,
                        W = matrix(1, length(omics$Y), 1),
                        n_R = which.min(omicsAIC), family = "gaussian",
                        opt = 1, max_ite = 1000, tol = 10^(-7))
# Return the results of significance tests for all coefficients
summary(omicsMdl)
```

```
#> Call:
#> formula = y ~ I + X
#>
#> Residuals:
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> -1.31835 -0.29160 0.03354 0.00000 0.40356 1.06511
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.06078 0.07044 -0.86285 0.3919672
#> cnv:EGFR 0.25078 0.08796 2.85098 0.0061255 ***
#> meth:EGFR -0.24162 0.07511 -3.21673 0.0021740 ***
#> rna.rpkm:EGFR 0.08933 0.07857 1.13696 0.2604852
#> cnv:EREG -0.00751 0.05094 -0.14743 0.8833301
#> meth:EREG 0.00724 0.0494 0.14648 0.8840812
#> rna.rpkm:EREG -0.00268 0.01849 -0.14468 0.8854906
#> cnv:HRAS -0.04866 0.05983 -0.81334 0.4195282
#> meth:HRAS 0.04689 0.06056 0.77425 0.4421012
#> rna.rpkm:HRAS -0.01733 0.02956 -0.5865 0.5599385
#> cnv:KRAS -0.0267 0.05067 -0.52699 0.6003203
#> meth:KRAS 0.02573 0.04913 0.52367 0.6026098
#> rna.rpkm:KRAS -0.00951 0.01754 -0.54244 0.5897064
#> cnv:PTPN11 0.09193 0.06183 1.48669 0.1428065
#> meth:PTPN11 -0.08857 0.05093 -1.73886 0.0876539 *
#> rna.rpkm:PTPN11 0.03275 0.03289 0.99558 0.3238137
#> cnv:STAT3 -0.05747 0.05517 -1.0417 0.3021072
#> meth:STAT3 0.05537 0.04953 1.11792 0.2684585
#> rna.rpkm:STAT3 -0.02047 0.0257 -0.79672 0.4290423
#> cnv:TGFA 0.05049 0.06361 0.79367 0.4307973
#> meth:TGFA -0.04864 0.05903 -0.82399 0.4135018
#> rna.rpkm:TGFA 0.01798 0.02625 0.68526 0.4960557
#> cnv:ACTB -0.03107 0.05212 -0.5961 0.5535539
#> meth:ACTB 0.02993 0.05128 0.58371 0.5618033
#> rna.rpkm:ACTB -0.01107 0.02339 -0.47307 0.6380374
#> cnv:GAPDH -0.04123 0.04963 -0.83059 0.4097953
#> meth:GAPDH 0.03972 0.04737 0.83856 0.4053450
#> rna.rpkm:GAPDH -0.01469 0.02221 -0.66128 0.5111932
#> cnv:PPIA -0.04299 0.06373 -0.67454 0.5027957
#> meth:PPIA 0.04142 0.05989 0.69153 0.4921444
#> rna.rpkm:PPIA -0.01531 0.02711 -0.56477 0.5745259
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```
# Estimated coefficients
print(round(omicsMdl$B_EST, 3))
```

```
#> EGFR EREG HRAS KRAS PTPN11 STAT3 TGFA ACTB GAPDH PPIA
#> cnv 0.251 -0.008 -0.049 -0.027 0.092 -0.057 0.050 -0.031 -0.041 -0.043
#> meth -0.242 0.007 0.047 0.026 -0.089 0.055 -0.049 0.030 0.040 0.041
#> rna.rpkm 0.089 -0.003 -0.017 -0.010 0.033 -0.020 0.018 -0.011 -0.015 -0.015
```

```
# The standard deviation of the coefficients
print(round(omicsMdl$B_SD, 3))
```

```
#> EGFR EREG HRAS KRAS PTPN11 STAT3 TGFA ACTB GAPDH PPIA
#> cnv 0.088 0.051 0.060 0.051 0.062 0.055 0.064 0.052 0.050 0.064
#> meth 0.075 0.049 0.061 0.049 0.051 0.050 0.059 0.051 0.047 0.060
#> rna.rpkm 0.079 0.018 0.030 0.018 0.033 0.026 0.026 0.023 0.022 0.027
```

```
# The p-values of the coefficients by the Wald test
print(round(omicsMdl$B_PV, 3))
```

```
#> EGFR EREG HRAS KRAS PTPN11 STAT3 TGFA ACTB GAPDH PPIA
#> cnv 0.006 0.883 0.420 0.600 0.143 0.302 0.431 0.554 0.410 0.503
#> meth 0.002 0.884 0.442 0.603 0.088 0.268 0.414 0.562 0.405 0.492
#> rna.rpkm 0.260 0.885 0.560 0.590 0.324 0.429 0.496 0.638 0.511 0.575
```
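
As a side check (our sketch, not part of the original article), the Wald statistics in the summary table can be reproduced from `B_EST` and `B_SD`; here we assume the residual degrees of freedom equal \(n\) minus the number of unconstrained parameters, \(d + (P+Q)R - R^2 = 1 + 13\times 1 - 1^2 = 13\):

```
# A sketch: reproduce the Wald test from the returned components
# (assumes df = n - 13 = 55 residual degrees of freedom)
tstat <- omicsMdl$B_EST / omicsMdl$B_SD
pval  <- 2 * pt(-abs(tstat), df = length(omics$Y) - 13)
round(pval["cnv", "EGFR"], 4)  # close to the 0.006 reported above
```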

In our package *TensorTest2D*, the function `plot()` can be used to
visualize the importance of the matrix predictor. The output is a
\(P\times Q\) heat map whose plotted values are controlled by the
option `type`. If the unit of measurement varies across the rows or
columns of \(X\), it is suggested to plot the t-statistics of the
coefficients (`type = "tval"`) instead of their values
(`type = "coef"`). In addition, the function `plot()` also marks the
pixels with \(p\)-values smaller than a pre-determined significance
level. Users can select the \(p\)-value adjustment method (see
`help("p.adjust")`) via the option `method` and specify the
significance level through the option `alpha`. We plot in Figure 1 the
t-statistics of the coefficients in \(\hat{\mathbf{B}}\), where red and
blue colors represent the pixels of positive and negative values,
respectively. The pixels of coefficients with \(p\)-values less than
`alpha` are marked with the symbol \(\boxtimes\); in this example,
these are cnv:EGFR and meth:EGFR.

```
plot(x = omicsMdl, method = "none", alpha = 0.05, type = "tval",
     showlabels = TRUE, plot.legend = TRUE)
```

MNIST is a well-known benchmark database for image recognition in
machine learning. It consists of 60,000 training images and 10,000
testing images. R users can obtain the image data from the
*dslabs* package (Irizarry and Gill 2019). For the purpose of
demonstration, we reduce the size of the MNIST image data by a
max-pooling step with \(2 \times 2\) blocks, as shown in the left and
middle images of Figure 2. The original \(28 \times 28\) images are
thus pooled into \(14 \times 14\) max-pooled images. Because pixels at
the edges and corners of the max-pooled images carry no information
across almost all images, we removed those pixels and ended up with
\(P\times Q = 10 \times 10\) sub-images, as illustrated in the right
image of Figure 2. The mean image of the pre-processed training set for
each label in the MNIST database is shown in Figure 3. These
pre-processed images of \(10 \times 10\) pixels are included in the
package *TensorTest2D* and can be imported using the following
commands:

```
library(TensorTest2D)
data(mnist_mp2c2)
mnist_train <- mnist_mp2c2$train
mnist_test <- mnist_mp2c2$test
```
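
For readers who wish to reproduce the pre-processing from the raw \(28\times 28\) images, the following is a minimal sketch (ours, not the package's code) of \(2\times 2\) max pooling:

```
# A sketch of 2 x 2 max pooling: each output pixel is the maximum of a
# non-overlapping 2 x 2 block of the input image.
max_pool_2x2 <- function(img) {
  P <- nrow(img) %/% 2; Q <- ncol(img) %/% 2
  out <- matrix(0, P, Q)
  for (p in 1:P) for (q in 1:Q)
    out[p, q] <- max(img[(2 * p - 1):(2 * p), (2 * q - 1):(2 * q)])
  out
}
dim(max_pool_2x2(matrix(runif(28 * 28), 28, 28)))  # 14 x 14
```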

The aim of this data analysis is to recognize the digit in a given
handwritten image using logistic regression. Here, we choose images of
‘2’ and ‘5’ for demonstration. Let \(Y_i = 1\) if the \(i\)th image
represents the digit ‘5’, and \(Y_i = 0\) if the \(i\)th image
represents a ‘2’. The predictor matrix \(X_i\) here is a
\(10 \times 10\) matrix whose entries are the grayscale pixel values of
the handwritten image. In the following, we first describe the data
processing steps and provide the code used to obtain the training data
in the analysis. Our training data, `train_X`, is a
\(P \times Q \times n = 10 \times 10 \times 2000\) array, which
contains 1,000 images of the digit ‘2’ and 1,000 images of the digit
‘5’ randomly chosen from the MNIST training set `mnist_train`. In this
MNIST example, some pixels in the corners and on the edges take the
value zero across all handwritten images, which yields singularity when
the alternating maximum likelihood algorithm is applied to the training
data. To solve this problem, we could simply drop those zero-valued
pixels; however, doing so breaks the matrix form, and hence the
low-rank model would no longer be applicable. Alternatively, we add
independent mean-zero normal noise with a small standard deviation to
the images in our training data set, which does little harm to the
predictive power because the signal-to-noise ratio is high and the
training set is sufficiently large. Hereafter, we call the images with
random error the contaminated images.

```
library(abind)
# Draw image data of labels 2 and 5
x0_all <- mnist_train$image[,,which(mnist_train$label == 2)]
x1_all <- mnist_train$image[,,which(mnist_train$label == 5)]
# Random sampling from MNIST training set for each label
nSampleEach <- 1000
n0 <- dim(x0_all)[3]; n1 <- dim(x1_all)[3]
set.seed(2021)
s0 <- sample(1:n0, nSampleEach, replace = FALSE)
s1 <- sample(1:n1, nSampleEach, replace = FALSE)
# Normalizing image values into [-0.5, 0.5]
x0 <- x0_all[,,s0]/255 - 0.5
x1 <- x1_all[,,s1]/255 - 0.5
# Combine training data
train_X <- abind(x0, x1, along = 3)
# Add negligible noise to the images
# (so no constant zero values in one pixel over all covariate matrices)
set.seed(2021) # Set seed for reproducibility
train_n <- array(rnorm(prod(dim(train_X)), 0, 0.1), dim(train_X))
train_Xn <- train_X + train_n # Contaminated images
# Define Y = 0 for label 2, and Y = 1 for label 5
train_y <- c(rep(0, dim(x0)[3]), rep(1, dim(x1)[3]))
```

In the package *TensorTest2D*, the function `tensorReg2D()` is also
used for fitting the logistic tensor regression model
\[\log{\frac{Pr(Y_i = 1\mid\mathbf{X}_i)}{Pr(Y_i = 0\mid\mathbf{X}_i)}} = \beta + \mathbf{X}_i \circ \mathbf{B}.\]
Thus, a prediction of the digit presented in image \(\mathbf{X}_i\) is
\[\hat Y_i = \underset{k \in \{0,1\}}{\arg\max}\,{Pr(Y_i = k\mid\mathbf{X}_i)} = I\{Pr(Y_i = 1\mid\mathbf{X}_i) > 0.5\},\]
where \(I\{E\} = 1\) if \(E\) is true and \(I\{E\} = 0\) otherwise. To
analyze the sampled data set, we feed the response variable, `train_y`,
and the contaminated images, `train_Xn`, as inputs for model training.
Since no auxiliary information is available to further stratify the
\(y_i\)'s, we specify a constant matrix
`W = matrix(1, length(train_y), 1)` with \(n\) rows; to fit a rank
\(R = 4\) model, we set `n_R = 4`. (In fact, the rank-4 model is the
best model for this logistic tensor regression.)

```
# Train the logistic tensor regression model
lgMdl <- tensorReg2D(y = train_y, X = train_Xn,
                     W = matrix(1, length(train_y), 1),
                     n_R = 4, family = "binomial",
                     opt = 1, max_ite = 100, tol = 10^(-7))
# Print model summary (not run)
# summary(lgMdl)
# Print the p-values of the estimates
cat("FDR-adjusted p-values of B_pq:\n")
```

`#> FDR-adjusted p-values of B_pq:`

```
round(matrix(p.adjust(as.vector(lgMdl$B_PV), method = "fdr"), 10, 10), 3)
```

```
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] 0.263 0.830 0.114 0.243 0.019 0.480 0.595 0.629 0.491 0.830
#> [2,] 0.558 0.552 0.098 0.491 0.263 0.948 0.137 0.816 0.491 0.953
#> [3,] 0.923 0.927 0.029 0.012 0.999 0.017 0.204 0.050 0.471 0.008
#> [4,] 0.648 0.541 0.004 0.019 0.029 0.204 0.004 0.655 0.541 0.156
#> [5,] 0.293 0.491 0.055 0.004 0.381 0.004 0.101 0.024 0.081 0.006
#> [6,] 0.110 0.491 0.954 0.491 0.648 0.948 0.954 0.865 0.491 0.825
#> [7,] 0.023 0.652 0.889 0.019 0.489 0.491 0.110 0.188 0.042 0.029
#> [8,] 0.137 0.706 0.830 0.491 0.244 0.145 0.491 0.491 0.889 0.889
#> [9,] 0.655 0.034 0.655 0.977 0.083 0.114 0.019 0.629 0.706 0.471
#> [10,] 0.137 0.055 0.491 0.764 0.491 0.602 0.019 0.471 0.454 0.025
```

For binary classification problems, we can apply the function `plot()`
of our package *TensorTest2D* in two ways. Similar to Figure 1, we can
first plot the t-statistics for the pixels by using the `plot()`
function as shown below this paragraph. Here, we adjust the
\(p\)-values according to the approach of Benjamini and Hochberg (1995)
by setting the parameter `method = "fdr"`. The resulting plot is shown
in Figure 4; most of the effective pixels are found on the left half of
the plot.

```
plot(x = lgMdl, method = "fdr", alpha = 0.05, type = "tval",
     showlabels = TRUE, plot.legend = TRUE)
```
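
Finally, the fitted model can also be used for out-of-sample prediction. The sketch below (ours, not part of the original analysis; it assumes `mnist_mp2c2$test` has the same `image`/`label` structure as the training set) computes the linear predictor directly from the returned components `b_EST` and `B_EST` and evaluates the classification accuracy on the test images of ‘2’ and ‘5’:

```
# A sketch: classify test images of digits 2 and 5 with the fitted model.
# The linear predictor is beta-hat + <X_i, B-hat>; see the model above.
keep <- mnist_test$label %in% c(2, 5)
test_X <- mnist_test$image[,,keep]/255 - 0.5   # same normalization as training
test_y <- as.numeric(mnist_test$label[keep] == 5)
eta <- apply(test_X, 3, function(Xi) lgMdl$b_EST[1] + sum(Xi * lgMdl$B_EST))
yhat <- as.numeric(1/(1 + exp(-eta)) > 0.5)    # predicted probability > 0.5
mean(yhat == test_y)                           # classification accuracy
```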