Methods based on Received Signal Strength Indicator (RSSI) fingerprinting are in the forefront among several techniques being proposed for indoor positioning. This paper introduces the R package *ipft*, which provides algorithms and utility functions for indoor positioning using fingerprinting techniques. These functions are designed for manipulation of RSSI fingerprint data sets, estimation of positions, comparison of the performance of different positioning models, and graphical visualization of data. Well-known machine learning algorithms are implemented in this package to perform analysis and estimations over RSSI data sets. The paper provides a description of these algorithms and functions, as well as examples of its use with real data. The *ipft* package provides a base that we hope to grow into a comprehensive library of fingerprinting-based indoor positioning methodologies.

Intelligent spaces, as a particularity of the concept known as Ambient Intelligence (AmI) (Werner et al. 2005; Aarts and Wichert 2009), where agents communicate and use technology in a non-intrusive way, have an interest in both open and closed environments. Since people spend 90% of time indoors (Klepeis et al. 2001), one of the most relevant aspects of AmI is indoor localization, due to the large number of potential applications: industrial and hospital applications, passenger transport, residences, assistance to emergency services and rescue, localization and support guide for the disabled, leisure applications, etc. It is expected that the global market for this type of location will grow from USD 7.11 billion in 2017 to USD 40.99 billion by 2022 (Research and markets 2017), being among the key technologies in the future. This is a technology that has already awakened but that in a short period of time will suffer a big explosion, as happened with the systems of positioning by satellite in exteriors and its applications.

This paper introduces the R package
*ipft* (Sansano 2017), a
collection of algorithms and utility functions to create models, make
estimations, analyze and manipulate RSSI fingerprint data sets for
indoor positioning. Given the abundance of potential applications for
indoor positioning, the package may have a broad relevance in fields
such as pervasive computing, Internet of Things (IoT) or healthcare,
among many others.

The main progress in indoor location systems has been made during the
last years. Therefore, both the research and commercial products in this
area are new, and researchers and industry are currently involved in the
investigation, development and improvement of these systems. We believe
that the R language is a good environment for machine learning and data
analysis related research, as its popularity is constantly growing ^{1},
researchers related to indoor positioning have explicitly selected R as
developing framework for their experiments
(Popleteev et al. 2011; Harbicht et al. 2017; Quan et al. 2017), it is well maintained by an
active community, and provides an ecosystem of good-quality packages
that leverage its potential to become a standard programming platform
for researchers. There are some open source applications and frameworks
to build indoor positioning services, such as FIND ^{2}, Anyplace ^{3}
or RedPIN ^{4}, based on fingerprinting techniques but, as far as we
know, there is not any public framework or package that provides
functions and algorithms to manipulate fingerprinting datasets and
experiment with positioning algorithms.

RSSI (Received Signal Strength Indicator) positioning systems are based on measuring the intensities of the received radio signals of the emitting devices (beacons) that are available at a particular position, and comparing them with a previously built RSSI data set (Lee et al. 2013). RSSI is used to measure the relative quality of a received signal to a client device, and each chipset manufacturer is free to define their own scale for this term. The value read by a device is given on a logarithmic scale and can correspond to an instant reading or a mean of some consecutive readings.

In this scenario, a fingerprint is an RSSI feature vector composed of received signal values from different emitting devices or beacons, associated to a precise position. In the last years, this technique is becoming increasingly important for indoor localization (Liu et al. 2007; He and Chan 2016), since Wi-Fi is generally available in indoor environments where GPS signals cannot penetrate, and the wireless access points (WAPs) can be used as emitting devices (Li et al. 2006). Other types of indoor localization RF emitters, such as Bluetooth (Wang et al. 2013), RFID (Liu et al. 2011), or Ultra Wide Band (UWB) (Gigl et al. 2007), can be also used in combination with Wi-Fi access points or as a standalone positioning system.

The RSSI fingerprinting localization approach requires two phases of operation: a training phase, also known as off-line or survey phase, and a positioning phase, sometimes referred as on-line, runtime or tracking phase. In the training phase, multidimensional vectors of RSSI values (the fingerprints) are generated and associated with known locations. These measurements are used to build a data set (also known as radio map) that covers the area of interest. This data set can include, along with the collected RSSI values and the location coordinates, many other useful parameters, as the device type used in the measurements or its orientation. Later, during the positioning phase, an RSSI vector collected by a device is compared with the stored data to generate an estimation of its position (Figure 1).

Despite the increasing interest in RSSI positioning (Xiao et al. 2016), this topic has not been explicitly covered yet by any publicly available R package. The proposed package has been developed to provide users with a collection of fundamental algorithms and tools to manipulate RSSI radio maps and perform fingerprinting analysis. While fundamental algorithms and similarity measurement functions are implemented to provide a main framework for research and comparison purposes, these are highly customizable, to allow researchers to tailor those methods with their own parameters and functions.

This paper describes these algorithms and their implementation, and provides examples of how to use them. The remainder of the paper is structured as follows: Section 2 defines the fingerprinting problem statement and the nomenclature that will be used in the rest of the paper. An overview of the implemented algorithms is given in Section 3. Section 4 outlines some data wrangling techniques included in the package. Section 5 describes the implemented positioning algorithms. Section 6 presents the included methods for access point position estimation. Then, Section 7 discuses some tools and functions included to create clusters or groups of fingerprints. Section 8 illustrates the use of the plotting functions also included in the package. In all these sections, functions are described and explored using practical examples, and particular emphasis is placed on how to use them with real world examples and data sets. Finally, the paper is summarized in Section 9.

This section provides a brief and general introduction to the principles
of fingerprinting positioning, as well as a description of the notation
and terminology that will be used in the next sections. The terms
described here are related to general concepts of fingerprinting
techniques, while the remaining of the paper describes the particular
implementation of these concepts in the `ipft`

package.

The main goal of the indoor localization techniques is to determine the position of a user in an indoor environment, where the GPS signal might not be received. This objective might require the use of an existing infrastructure, the deployment of a new one, the use of the so-called signals-of-opportunity (Yang et al. 2014), or even a combination of some of these techniques. Many of these techniques take advantage of the radio-frequency signals emitted by devices, whose position can be known or not, to estimate the user’s position from the perceived strength of these signals. There are many kinds of devices that can be used for this purpose, such as Wi-Fi access points, bluetooth beacons, RFID or UWB devices, but for all of them, the information provided for a given position, the fingerprint, can be stored as a vector of received signal strength intensities (RSSI), whose length is determined by the number of detected emitters.

A radio map, or a fingerprinting data set, is composed of a set of collected fingerprints and the associated positions where the measurements were taken, and may contain some additional variables, such as the the type of device used or a time stamp of the observation, among any other useful data. Let \(\mathcal{D}\) be a fingerprinting data set. Then:

\[\mathcal{D} = \left\lbrace\mathcal{F}, \mathcal{L}\right\rbrace\]

where \(\mathcal{F}\) is the set of collected fingerprints and \(\mathcal{L}\) is the set of associated locations.

For research purposes, a fingerprinting data set is usually divided into training and test sets. The training data set is used to store the fingerprints and location data to create models of the environment that can be used to estimate the position of a new fingerprint. The test data set is used to test the models obtained from the training data, and to compute the errors from the results of the position estimation.

Let \(\mathcal{D}_{train}\) be a training data set:

\[\mathcal{D}_{train} = \left\lbrace\mathcal{F}_{train}, \mathcal{L}_{train}\right\rbrace\]

where

\[\mathcal{F}_{train} = \left\lbrace \lambda_{1}^{tr}, \lambda_{2}^{tr}, ..., \lambda_{n}^{tr} \right\rbrace\]

\[\mathcal{L}_{train} = \left\lbrace \tau_{1}^{tr}, \tau_{2}^{tr}, ..., \tau_{n}^{tr} \right\rbrace\]

\(\mathcal{D}_{train}\) is composed of \(n\) fingerprints, stored as \(n\) vectors of RSSI measurements (\(\lambda_{i}^{tr}, \ i \in [1, 2, ..., n]\)), and \(n\) locations (\(\tau_{i}^{tr}, \ i \in [1, 2, ..., n]\)), stored as vectors, representing the position associated with its correspondent fingerprint. Each fingerprint consists of \(q\) RSSI values (\(\rho_{h,i}^{tr}\), \(h \in [1,...,q]\)), where \(q\) is the number of beacons considered when building the training set:

\[\lambda_{i}^{tr} = \left\lbrace \rho_{1,i}^{tr}, \rho_{2,i}^{tr}, ..., \rho_{q,i}^{tr} \right\rbrace, \ i \in [1,...,n]\]

and each associated position is composed of one or more values, depending on the number of dimensions to be considered and the coordinate system used. The position can be given as a vector of values representing its coordinates, although on multi-floor and multi-building environments labels can be used to represent buildings, floors, offices, etc. Let \(l\) be the number of dimensions of a position vector. Then:

\[\tau_{i}^{tr} = \left\lbrace \nu_{1,i}^{tr}, \nu_{2,i}^{tr}, ..., \nu_{l,i}^{tr} \right\rbrace, \ i \in [1,...,n]\]

The test data set is also composed of a collection of fingerprints associated to known positions. This data set is used for testing purposes, during research or during model building adjustments, to assess the model’s performance by comparing its estimation of the positions with the ground truth.

The situation is different in real applications, where the goal is to estimate the unknown position of the receiver given the RSSI values detected at a particular location, using a previously built model. In this case, the test data set is just composed of a unique fingerprint, and the objective is to estimate the actual location of the receiver. Therefore, no information about its location is provided.

The test data set is composed of \(m\) observations:

\[\mathcal{D}_{test} = \left\lbrace\mathcal{F}_{test}, \mathcal{L}_{test}\right\rbrace\]

where

\[\mathcal{F}_{test} = \left\lbrace \lambda_{1}^{ts}, \lambda_{2}^{ts}, ..., \lambda_{m}^{ts} \right\rbrace\]

\[\mathcal{L}_{test} = \left\lbrace \tau_{1}^{ts}, \tau_{2}^{ts}, ..., \tau_{m}^{ts} \right\rbrace\]

To be able to compare the test observations with the training fingerprints, the number of RSSI values of its respective fingerprints has to be the same, and the position in the RSSI vector must represent the same beacon in both data sets. Therefore, each one of the \(m\) observations of the test data set is composed of a fingerprint with \(q\) RSSI values:

\[\lambda_{j}^{ts} = \left\lbrace \rho_{1,j}^{ts}, \rho_{2,j}^{ts}, ..., \rho_{q,j}^{ts} \right\rbrace, \ j \in [1,...,m]\]

and a location vector with the same spatial dimensions as the training location vectors:

\[\tau_{j}^{ts} = \left\lbrace \nu_{1,j}^{ts}, \nu_{2,j}^{ts}, ..., \nu_{l,j}^{ts} \right\rbrace, \ j \in [1,...,m]\]

The notation depicted above will be used in the remaining of the paper to represent the fingerprinting data. Symbols \(i\) and \(j\) will be used to represent iterations over the training and test data sets, respectively, while \(h\) will be used to iterate over the beacons present in each fingerprint.

This section presents an introduction to the main functions, included in
the *ipft*^{5} package, that
implement fingerprinting-based indoor localization methods. The package
also provides two data sets for training and validation purposes that
are briefly described in this section.

The `ipft`

package implements three algorithms to build models to
estimate the position of a receiver in an indoor environment. Two of
these implementations are based on the well known k-Nearest Neighbors
algorithm (*knn*) (Cover and Hart 1967) to, given an RSSI vector, select the \(k\)
most similar training examples from the radio map. The similarity
between the RSSI value vectors can be measured, for example, as the
\(euclidean\) distance between them, but other distance functions may be
used (Torres-Sospedra et al. 2015b). The selection of a method to compute
this measure can be provided to the function in two ways, either
choosing one of the already implemented distance measurements
(\(euclidean\), \(manhattan\), etc.), or by way of a reference to a function
implemented by the user that returns the distance (the lower, the more
similar or ‘closer’) between two matrices or vectors. Once the \(k\)
neighbors are selected, the location of the user is estimated as the
weighted average of the neighbors positions.

The first implementation, corresponding to the function `ipfKnn`

, may
behave in a deterministic way, finding the \(k\) more similar neighbors
using a deterministic similarity function such as the euclidean or
manhattan distances, or in a probabilistic way, using similarity
functions such as LDG (Logarithmic Gaussian Distance) or PLGD (Penalized
Logarithmic Gaussian Distance) (Cramariuc et al. 2016a), that are based upon
statistical assumptions on the RSSI measurement error. The similarity
function can be chosen from the set of implemented options or provided
by the user via a custom function. This implementation is discussed in
the Section 5.1.

The other implementation of the knn algorithm assumes a probabilistic nature for the received signal distribution (Roos et al. 2002) and uses collections of many fingerprints at each particular position, acquired during the training phase. Therefore, the radio map is composed of several groups, where a group is a set of fingerprints (vectors of RSSI values) that share the same location. Assuming that the RSSI value for a specific beacon can be modeled as a random variable following a normal distribution (Haeberlen et al. 2004), any of these collections, or groups, of fingerprints can be represented by the statistical parameters of this distribution, in this case, the mean and the standard deviation. This implies that the original data set can be transformed into a new type of data structure by storing the mean and the standard deviation of every detected beacon for every group. All the original data for a group is transformed into two vectors, one storing the means and the other the standard deviations. The trustworthiness of the data in the new data set will depend on the number of measurements for every location of the original data. It is assumed that the more measurements for a particular location, the more reliable will be their inferred statistical parameters.

The implementation of this probabilistic-based method takes the original
radio map and a set of group indices, and fits these groups of
measurements to a normal (Gaussian) distribution for every beacon and
every location, so that the signal intensity distribution is determined
by the mean and the standard deviation of the Gaussian fit. Then, given
a test fingerprint, the algorithm estimates its position by selecting
the *k* most probable locations, making explicit use of the statistical
parameters of the data stored in the radio map to optimize the
probabilities in the assignment of the estimated position by computing a
similarity function based on a summatory of probabilities. This approach
is implemented through the `ipfProbabilistic`

function and is described
in the Section 5.2.

Finally, the third implemented algorithm is based on a scenario where
the location of the beacons is known, and an estimation of the
fingerprint position can be made using the log-distance path loss model
(Seybold, J.S. 2005). The strength of the received signal at a
particular point can be modeled as a function of the logarithmic
distance between the receiver and the emitter and some parameters
related to the environment properties and the devices characteristics.
Therefore, as this method uses an analytical model to evaluate the
position, no radio map is needed to train a model to compare
fingerprints with, since the position might be estimated from the
fingerprint data and the position of the beacons. This method is
implemented by the function `ipfProximity`

and is described in Section
5.3.

The previous functions `ipfKnn`

, `ipfProbabilistic`

and `ipfProximity`

create models based on the training data and parameters provided. These
models can then be evaluated using the `ipfEstimate`

function, that
internally detects the algorithm to apply based on the model that
receives as parameter.

The package also includes data from the IPIN2016^{6} Tutorial data set.
In the `ipftrain`

data frame there are \(n\) = 927 observations, including
the RSSI values for \(q\) = 168 wireless access points, the location,
expressed in Cartesian coordinates, for the observation (x, y), and some
other variables, as timestamps for the measurements or an identifier for
the user who took the survey. The `ipftest`

data frame contains \(m\) =
702 observations with the same structure, for testing and validation
purposes. The fingerprints included in both data sets where taken in the
same building and the same floor. The `ipfpwap`

data frame contains the
position of 39 of the WAPs included in the `ipftrain`

and `ipftest`

data
sets. The unknown positions of the remaining WAPs are stored as `NA`

.
The characteristics of these data sets attributes are:

`RSSI values`

: Columns from 1 to 168. The values represent the strength of the received signal expressed in decibels, on a scale that ranges from \(-30\)dBm to \(-97\)dBm in the training set, and from \(-31\)dBm to \(-99\)dBm in the test set. The closer the value to zero, the stronger the signal.`position`

: Columns 169 (X) and 170 (Y). The position given in Cartesian coordinates, with its origin in the same corridor where the data was acquired.`user id`

: A numeric value from 1 to 8 to represent each of the 8 users that acquired the train data set. The test dataset was acquired by a different user, represented by the value 0.`timestamp`

: The UNIX time stamp of the observation, in seconds.

There are some other publicly available indoor location data sets that have been used to develop and test this package and that are not included for size reasons, as the UJIIndoorLoc Data Set (Torres-Sospedra et al. 2015a) or the Tampere University data set (Cramariuc et al. 2016b).

The theoretical foundations of the algorithms and its uses are discussed
in detail in Section 5. A description of
the functions `ipfKnn`

, `ipfProximity`

, `ipfProbabilistic`

and
`ipfEstimate`

is given while presenting some simulations to show how
these algorithms can be useful in practice.

An RSSI fingerprint is a vector composed of signal strength measurements from all the emitters received by a client device at a particular point, and can be measured in any unit of power. It is often expressed in decibels (dBm), or as percentage values between 1-100, and can be a negative or a positive value. Typically this values are stored as negative figures, where the strongest signals are closer to zero.

Some algorithms are sensitive to the scale of the data. For example, Neural Networks generally work better (LeCun et al. 1998) with data scaled to a range between [0, 1] or [\(-1\), 1], since unscaled data may slow down the learning process and the convergence of the network parameters and, in some cases, prevent the network from effectively learning the problem. Thus, the first step before the data can be fed to a positioning algorithm may involve some kind of transformation, depending on the characteristics of the original data.

The data sets included in this package represent the RSSI data from a
set of wireless access points as negative integer numbers from \(-99\)
(weakest detected signal) to \(-30\) (strongest detected signal). When the
RSSI of a WAP is not available, the value used is `NA`

. This convention
may be inconvenient for some calculations. For example, a similarity
measure between two fingerprints as the euclidean distance will only
take into account those WAPs that have been detected in both
observations, causing a loss of information that otherwise could be
utilized.

The *ipft* package contains
some functions to manipulate and wrangle raw fingerprint data. The
`ipfTransform`

function mutates the given fingerprint data into a new
data set with a specified range for the RSSI signals. The signature of
the function is:

```
<- function(data, outRange = c(0, 1), outNoRSSI = 0, inRange = NULL,
ipfTransform inNoRSSI = 0, trans = "scale", alpha = 24)
```

where:

`data`

: The input data set with the original RSSI fingerprints.`outRange`

: A numeric vector with two values indicating the desired range of the output data.`outNoRSSI`

: The desired value for not detected beacons in the output data.`inRange`

: A numeric vector with two values indicating the range of signal strength values in the input data. If this parameter is not provided, the function will infer it from the provided data.`inNoRSSI`

: The value given to a not detected beacon in the original data.`trans`

: The transformation to perform over the RSSI data, either ‘scale’ or ‘exponential’.`alpha`

: The \(\alpha\) parameter for the exponential transformation.

The *scale* transformation scales the input data values to a range
specified by the user. The feature scaling is performed according to
Equation (1):

\[\label{eq:scaleT} \rho_{h,i}^{out} =\begin{cases} a+b\cdot\rho_{h,i}^{in}, &\mathrm{if}\,\rho_{h,i}^{in}\neq inNoRSSI\\ outNoRSSI,&otherwise \end{cases} \tag{1}\]

\[b = \frac{outMin-outMax}{inMin-inMax}\]

\[a = outMin - inMin \cdot b\]

where:

- \(\rho_{h,i}^{out}\) and \(\rho_{h,i}^{in}\) are the output and input RSSI values, respectively, for the \(h^{th}\) beacon from the \(i^{th}\) observation
- \(outMax\) and \(outMin\) are the maximum and minimum values,
respectively, specified for the output by the
`outRange`

parameter. - \(inMax\) and \(inMin\) are the maximum and minimum values, respectively, of the input data.
- \(outNoRSSI\) and \(inNoRSSI\) are the values assigned in the
fingerprint to represent a not detected beacon for the output and
input data, respectively, specified by the parameters
`outNoRSSI`

and`inNoRSSI`

.

The *exponential* transformation (Torres-Sospedra et al. 2015b) changes the
data according to the next equation:

\[\rho_{h,i}^{out} =\begin{cases} \exp(\frac{pos(\rho_{h,i}^{in})}{\alpha}) , &\mathrm{if}\,\rho_{h,i}^{in}\neq inNoRSSI\\ outNoRSSI,&otherwise \end{cases}\]

\[pos(\rho_{h,i}^{in}) =\begin{cases} \rho_{h,i}^{in} - inMin , &\mathrm{if}\,\rho_{h,i}^{in}\neq inNoRSSI\\ 0,&otherwise \end{cases}\]

where \(\alpha\) is a parameter for the exponential transformation. The authors establish \(\alpha\) as a case-based parameter, and find that 24 is a good value for RSSI fingerprinting data, but they did not study the effects of \(\alpha\) in the transformed data.

The following code scales the `ipftrain`

and `ipftest`

data sets RSSI
data, stored in the columns 1:168, to a positive range of values, from 0
to 1, with NA representing a not detected WAP. As a not detected WAP is
represented by a `NA`

value in the original data, this has to be
indicated to the function so it can transform these values to the
desired output:

```
<- ipfTransform(ipftrain[, 1:168], outRange = c(0.1, 1), inNoRSSI = NA,
trainRSSI outNoRSSI = NA)
<- ipfTransform(ipftest[, 1:168], outRange = c(0.1, 1), inNoRSSI = NA,
testRSSI outNoRSSI = NA)
```

The `ipfTransform`

function returns a new data set with the same
structure (vector, matrix or data frame) as the input.

This section describes three positioning algorithms implemented in the
`ipft`

package. The examples illustrating each description are based on
the data previously scaled in Section 4.

`ipfKnn`

function.The `ipfKnn`

and `ipfEstimate`

functions implement a version of the knn
algorithm to select the \(k\) nearest neighbors (the \(k\) more similar
vectors from the training set) to a given RSSI vector. Many different
distance metrics (Torres-Sospedra et al. 2015b) can be used to compare two
RSSI vectors and measure how ‘near’ or similar they are.

The distance metrics implemented in the package include some typical
functions, as the *\(L^{1}\)* norm, or manhattan distance, or the
*\(L^{2}\)*, or euclidean distance. The *\(L^{u}\)* norm between two
fingerprints with indices \(a\) and \(b\) is defined as follows:

\[L^{u} = \left(\sum_{h=1}^{q} \lvert(\rho_{h,a} - \rho_{h,b}\rvert^{u} \right)^{1/u}\]

The package also implements some fingerprinting specific distance estimation functions such as LDG and PLGD. The LGD between two RSSI vectors \(\lambda_{i}^{tr}\) and \(\lambda_{j}^{ts}\) of longitude \(q\) is given by:

\[LGD(\lambda_{i}^{tr}, \lambda_{j}^{ts}) = - \sum_{h = 1} ^ {q} \log \ \max(G(\rho_{h, i}^{tr},\ \rho_{h, j}^{ts}),\ \epsilon)\]

where \(\epsilon\) is a parameter to avoid logarithm of zero, as well as having one beacon RSSI value influence the LGD only above a certain threshold. \(G(\rho_{h, i}^{tr},\ \rho_{h, j}^{ts})\) represents the Gaussian similarity between \(\rho_{h, i}^{tr}\) and \(\rho_{h, j}^{ts}\), defined as

\[\mathrm{G}(\rho_{h, i}^{tr}, \rho_{h, j}^{ts})= \begin{cases} \frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{(\rho_{h, i}^{tr} - \rho_{h, j}^{ts})^{2}}{2\sigma^{2}}\right), &\mathrm{if}\,\rho_{h, i}^{tr}\neq 0\,\mathrm{and}\,\rho_{h, j}^{ts}\neq 0\\ 0,&otherwise \end{cases}\]

The \(\sigma^2\) parameter represents the shadowing variance (Shrestha et al. 2013). Values for \(\sigma\) in the range between 4 and 10 dBm are usually good for indoor scenarios (Lohan et al. 2014).

The PLGD between two RSSI vectors \(\lambda_{i}^{tr}\) and \(\lambda_{j}^{ts}\) of longitude \(q\) is given as:

\[PLGD(\lambda_{i}^{tr}, \lambda_{j}^{ts}) = LGD(\lambda_{i}^{tr}, \lambda_{j}^{ts}) + \alpha(\phi(\lambda_{i}^{tr}, \lambda_{j}^{ts}) + \phi(\lambda_{j}^{ts}, \lambda_{i}^{tr}))\]

where \(\phi(\lambda_{i}^{tr}, \lambda_{j}^{ts})\) is a penalty function for the beacons that are visible in the \(i^{th}\) training fingerprint but not in the \(j^{th}\) test fingerprint, \(\phi(\lambda_{j}^{ts}, \lambda_{i}^{tr})\) is a penalty function for the beacons that are visible in the \(j^{th}\) test fingerprint but not in the \(i^{th}\) training fingerprint, and are defined as follows:

\[\phi(\lambda_{i}^{tr}, \lambda_{j}^{ts}) = \sum_{h=1} ^ {q} T_{max} - \rho_{h, i}^{tr}, \ \mathrm{for} \ \ 0 < \rho_{h, i}^{tr} \leq T_{max} \ \mathrm{and} \ r_i = 0)\]

\[\phi(\lambda_{j}^{ts}, \lambda_{i}^{tr}) = \sum_{h=1} ^ {q} T_{max} - \rho_{h, j}^{ts}, \ \mathrm{for} \ \ 0 < \rho_{h, j}^{ts} \leq T_{max} \ \mathrm{and} \ r_j = 0)\]

\(T_{max}\) is an upper threshold for the strength of the signal, and \(\alpha\) is a scaling factor.

The similarity measurement method can be chosen by means of the
parameter `method`

, or by providing a custom function (parameters `FUN`

and `...`

). The signature of the `ipfKnn`

function is:

```
<- function(train_fgp, train_pos, k = 3, method = 'euclidean',
ipfKnn weights = 'distance', norm = 2, sd = 5, epsilon = 1e-3,
alpha = 1, threshold = 20, FUN = NULL, ...)
```

where:

`train_fgp`

: A data frame of \(n\) rows and \(q\) columns containing the fingerprint vectors of the training set.`train_pos`

: A data frame of \(n\) rows and \(l\) columns containing the positions of the training observations.`k`

: The \(k\) parameter of the \(knn\) algorithm, the number of nearest neighbors to consider.`method`

: The distance metric to be used by the algorithm. The implemented options are ‘euclidean’, ‘manhatan’, ‘norm’, ‘LGD’ and ‘PLGD’`weights`

: The weight function to be used by the algorithm. The implemented options are ‘distance’ and ‘uniform’. The default ‘distance’ function calculate the weights from the distances as: \[w_{j,t} = \frac{1}{(1 + d_{j,t})\mathcal{W}_{j}}\] where \(w_{j,t}\) is the weight assigned to the \(t^{th} \ (t \in{[1 .. k])}\) neighbor of the \(j^{th} \ (j \in{[1 .. m])}\) test observation, \(d_{j,t}\) is the distance in the feature (RSSI) space between the \(t^{th}\) neighbor and the \(j^{th}\) test fingerprint, and \(\mathcal{W}_{j}\) is a term used to normalize the values so that the total sum of the \(k\) weights is 1. The ‘uniform’ function assigns the same weight value to each neighbor: \[w_{j,t} = \frac{1}{k}\]`norm, sd, epsilon, alpha, threshold`

: Parameters for the ‘norm’, ‘LGD’ and ‘PLGD’ methods.`FUN`

: An alternative function provided by the user to compute the distance.`...`

: Additional parameters for the function FUN.

For a training data set of \(n\) RSSI vectors (a data frame or a matrix
named `tr_fingerprints`

) and a data set of \(n\) position vectors (a data
frame or a matrix named `tr_positions`

), the code for fitting a knn
model with a \(k\) value of 4 and the manhattan distance as the distance
measurement method is:

`<- ipfKnn(tr_fingerprints, tr_positions, k = 4, method = 'manhattan') knnModel `

This function returns an S3 object of class `ipftModel`

containing the
following properties:

`params`

: A list with the parameters passed to the function.`data`

: A list with the fingerprints and the location data of the radio map.

To estimate the position of a new fingerprint, the `ipfEstimate`

function makes use of the previously obtained model. An `ipfModel`

object holds the data model needed by the `ipfEstimate`

function to
apply the selected algorithm and returns an estimation of the test
fingerprints positions. The signature of `ipfEstimate`

is:

`<- function(ipfmodel, test_fgp, test_pos = NULL) ipfEstimate `

where:

`ipfmodel`

: An S3 object of class`ipfModel`

.`test_fgp`

: A data frame of \(m\) rows and \(q\) columns containing the fingerprints of the test set.`test_pos`

: An optional parameter containing a data frame of \(m\) rows and \(l\) columns with the position of the test observations.

The `ipfEstimate`

function returns an S3 object of the class
`ipfEstimation`

with the following elements:

`location`

: A \(m \times l\) matrix with the predicted position for each observation in the`test`

data set.`errors`

: If the actual location of the test observations is passed in parameter`test_pos`

, and the data that represents the position is numeric, this property returns a numeric vector of length \(n\) with the errors, calculated as the \(euclidean\) distances between the actual and the predicted locations.`confusion`

: If the actual location of the test observations is passed in parameter`test_pos`

, and the data that represents the position is a factor, the estimation of the actual position is performed as a classification task, and this property returns a confusion matrix summarizing the results of this classification.`neighbors`

: A \(m \times k\) matrix with the indices of the \(k\) selected neighbors for each observation in the`test`

data set.`weights`

: A \(m \times k\) matrix containing the weights assigned by the algorithm to the selected neighbors.

The following `R`

code shows an example of the usage of the `ipfKnn`

function with the data set included in the package. This example takes
the data previously scaled and generates a positioning model from the
input data `trainRSSI`

(the radio map) that is stored in `knnModel`

.
Then, the model is passed to the `ipfEstimate`

function, along with the
test data, to get an estimation of the position of the 702 test
observations:

```
<- trainRSSI[, 1:168]
tr_fingerprints <- ipftrain[, 169:170]
tr_positions <- ipfKnn(tr_fingerprints, tr_positions, k = 7, method = "euclidean")
knnModel <- testRSSI[, 1:168]
ts_fingerprints <- ipftest[, 169:170]
ts_positions <- ipfEstimate(knnModel, ts_fingerprints, ts_positions) knnEstimation
```

Since the position of the test observations is known, the mean error for the 702 test observations can be calculated as follows:

```
> mean(knnEstimation$errors)
1] 3.302739 [
```

The mean positioning error is one of the most common evaluation metrics
used in indoor positioning (Liu et al. 2007) to assess the system’s accuracy.
This metric corresponds to the average Euclidean distance between the
estimated locations and the true locations. As positions in the
`ipftrain`

and `ipftest`

are expressed in meters, this metric represents
the average error in meters for this scenario.

The neighbors selected from the training data set for the 6 first test fingerprints are:

```
> head(knnEstimation$neighbors)
1] [,2] [,3] [,4] [,5] [,6] [,7]
[,1,] 71 176 126 125 127 771 130
[2,] 71 176 126 125 127 771 130
[3,] 465 914 915 913 217 77 218
[4,] 465 914 915 176 913 461 217
[5,] 176 126 125 771 130 127 914
[6,] 77 914 915 217 176 465 218 [
```

where each row of the output corresponds to the indices of the \(k\) = 7
more similar vectors from the training data set to the \(i^{th}\) vector
of the `test`

data set.

As an example of how to use `ipfKnn`

with a custom function, the next
code shows the definition of a *C++* function that implements a modified
version of the manhattan distance. The function needs at least two
parameters, the two matrices representing the training and test data
sets. A third parameter is here introduced to represent a penalization
value. This function penalizes the computed distance between two RSSI
measurements when one of the beacons is not detected (represented by the
value \(\emptyset\)), by multiplying the resulting distance by a factor
\(F\). Given two fingerprints \(\lambda_{i}^{tr}\) and \(\lambda_{j}^{ts}\) of
length \(q\), the \(myD\) distance is:

\[\mathrm{myD}(\lambda_{i}^{tr}, \lambda_{j}^{ts})= \sum_{h=1}^{q}{\mathrm{myd}(\rho_{h,i}^{tr}, \rho_{h,j}^{ts})},\]

where

\[\mathrm{myd}(\rho_{h,i}^{tr}, \rho_{h,j}^{ts})= \begin{cases} \mathrm{\lvert}\rho_{h,i}^{tr} - \rho_{h,j}^{ts}\rvert , &\mathrm{if}\,\rho_{h,i}^{tr}\neq \emptyset\,\mathrm{and}\,\rho_{h,j}^{ts}\neq \emptyset\\ \mathrm{\lvert}\rho_{h,i}^{tr} - \rho_{h,j}^{ts}\lvert F , &otherwise \end{cases}\]

The following code implements the `myD`

function and shows an example of
its usage with `ipfKnn`

, as well as the results obtained. The function
is coded in *C++* to improve its performance when using large data sets,
although the method also accepts custom plain R functions. The `myD`

function assumes that the fingerprints are in a positive range:

```
library('ipft')
library('Rcpp')
cppFunction('
NumericMatrix myD(NumericMatrix train, NumericMatrix test, double F = 2.0) {
NumericMatrix distanceMatrix(test.nrow(), train.nrow());
double d = 0, pv = 0, rssi1 = 0, rssi2 = 0;
for (int itrain = 0; itrain < train.nrow(); itrain++) {
for (int itest = 0; itest < test.nrow(); itest++) {
d = 0;
for (int i = 0; i < train.ncol(); i++) {
rssi1 = R_IsNA(train(itrain, i))? 0 : train(itrain, i);
rssi2 = R_IsNA(test(itest, i))? 0 : test(itest, i);
pv = (rssi1 != 0 && rssi2 != 0)? 1 : F;
d = d + std::abs(rssi1 - rssi2) * pv;
}
distanceMatrix(itest, itrain) = d;
}
}
return distanceMatrix;
}'
)<- ipfKnn(tr_fingerprints, tr_positions, k = 1, FUN = myD, F = 0.25)
customModel <- ipfEstimate(customModel, ts_fingerprints, ts_positions)
customEstimation
> head(customEstimation$neighbors)
1]
[,1,] 773
[2,] 773
[3,] 776
[4,] 773
[5,] 130
[6,] 130 [
```

The previous code outputs the selected neighbors for the first 6
observations in the test data set. As the `ts_positions`

data frame
contains the actual location of the observations, the absolute error
committed by the model is returned in the `ipfEstimation`

object:

```
> head(customEstimation$errors)
1] 5.708275 5.708275 5.708275 5.708275 3.380000 3.380000
[
function is:
And the mean error with this custom similarity
> mean(customEstimation$errors)
1] 3.297342 [
```

An `ipfEstimation`

object can be used directly to plot the Empirical
cumulative distribution function of the error (function `ipfPlotEcdf()`

)
and the Probability density function (function `ipfPlotPdf()`

). Figures
1 and 2 show the plots obtained from the following code:

```
> ipfPlotEcdf(customEstimation)
> ipfPlotPdf(customEstimation)
```

The plotting functions included in the package are described in detail in Section 8.