Indoor Positioning and Fingerprinting: The R Package ipft

Methods based on Received Signal Strength Indicator (RSSI) fingerprinting are in the forefront among several techniques being proposed for indoor positioning. This paper introduces the R package ipft, which provides algorithms and utility functions for indoor positioning using fingerprinting techniques. These functions are designed for manipulation of RSSI fingerprint data sets, estimation of positions, comparison of the performance of different positioning models, and graphical visualization of data. Well-known machine learning algorithms are implemented in this package to perform analysis and estimations over RSSI data sets. The paper provides a description of these algorithms and functions, as well as examples of its use with real data. The ipft package provides a base that we hope to grow into a comprehensive library of fingerprinting-based indoor positioning methodologies.

Emilio Sansano (Institute of New Imaging Technologies, Universitat Jaume I) , Raúl Montoliu (Institute of New Imaging Technologies, Universitat Jaume I) , Óscar Belmonte (Institute of New Imaging Technologies, Universitat Jaume I) , Joaquín Torres-Sospedra (Institute of New Imaging Technologies, Universitat Jaume I)
2019-08-15

1 Introduction

Intelligent spaces, as a particularity of the concept known as Ambient Intelligence (AmI) (Werner et al. 2005; Aarts and Wichert 2009), where agents communicate and use technology in a non-intrusive way, have an interest in both open and closed environments. Since people spend 90% of time indoors (Klepeis et al. 2001), one of the most relevant aspects of AmI is indoor localization, due to the large number of potential applications: industrial and hospital applications, passenger transport, residences, assistance to emergency services and rescue, localization and support guide for the disabled, leisure applications, etc. It is expected that the global market for this type of location will grow from USD 7.11 billion in 2017 to USD 40.99 billion by 2022 (Research and markets 2017), being among the key technologies in the future. This is a technology that has already awakened but that in a short period of time will suffer a big explosion, as happened with the systems of positioning by satellite in exteriors and its applications.

This paper introduces the R package ipft (Sansano 2017), a collection of algorithms and utility functions to create models, make estimations, analyze and manipulate RSSI fingerprint data sets for indoor positioning. Given the abundance of potential applications for indoor positioning, the package may have a broad relevance in fields such as pervasive computing, Internet of Things (IoT) or healthcare, among many others.

The main progress in indoor location systems has been made during the last years. Therefore, both the research and commercial products in this area are new, and researchers and industry are currently involved in the investigation, development and improvement of these systems. We believe that the R language is a good environment for machine learning and data analysis related research, as its popularity is constantly growing 1, researchers related to indoor positioning have explicitly selected R as developing framework for their experiments (Popleteev et al. 2011; Harbicht et al. 2017; Quan et al. 2017), it is well maintained by an active community, and provides an ecosystem of good-quality packages that leverage its potential to become a standard programming platform for researchers. There are some open source applications and frameworks to build indoor positioning services, such as FIND 2, Anyplace 3 or RedPIN 4, based on fingerprinting techniques but, as far as we know, there is not any public framework or package that provides functions and algorithms to manipulate fingerprinting datasets and experiment with positioning algorithms.

RSSI (Received Signal Strength Indicator) positioning systems are based on measuring the intensities of the received radio signals of the emitting devices (beacons) that are available at a particular position, and comparing them with a previously built RSSI data set (Lee et al. 2013). RSSI is used to measure the relative quality of a received signal to a client device, and each chipset manufacturer is free to define their own scale for this term. The value read by a device is given on a logarithmic scale and can correspond to an instant reading or a mean of some consecutive readings.

In this scenario, a fingerprint is an RSSI feature vector composed of received signal values from different emitting devices or beacons, associated to a precise position. In the last years, this technique is becoming increasingly important for indoor localization (Liu et al. 2007; He and Chan 2016), since Wi-Fi is generally available in indoor environments where GPS signals cannot penetrate, and the wireless access points (WAPs) can be used as emitting devices (Li et al. 2006). Other types of indoor localization RF emitters, such as Bluetooth (Wang et al. 2013), RFID (Liu et al. 2011), or Ultra Wide Band (UWB) (Gigl et al. 2007), can be also used in combination with Wi-Fi access points or as a standalone positioning system.

The RSSI fingerprinting localization approach requires two phases of operation: a training phase, also known as off-line or survey phase, and a positioning phase, sometimes referred as on-line, runtime or tracking phase. In the training phase, multidimensional vectors of RSSI values (the fingerprints) are generated and associated with known locations. These measurements are used to build a data set (also known as radio map) that covers the area of interest. This data set can include, along with the collected RSSI values and the location coordinates, many other useful parameters, as the device type used in the measurements or its orientation. Later, during the positioning phase, an RSSI vector collected by a device is compared with the stored data to generate an estimation of its position (Figure 1).

graphic without alt text

Figure 1: During the on-line phase, once the radio map has been built, the fingerprinting algorithm uses it to estimate the device’s position by comparing the RSSI values heard by the device with the ones stored in the radio map.

Despite the increasing interest in RSSI positioning (Xiao et al. 2016), this topic has not been explicitly covered yet by any publicly available R package. The proposed package has been developed to provide users with a collection of fundamental algorithms and tools to manipulate RSSI radio maps and perform fingerprinting analysis. While fundamental algorithms and similarity measurement functions are implemented to provide a main framework for research and comparison purposes, these are highly customizable, to allow researchers to tailor those methods with their own parameters and functions.

This paper describes these algorithms and their implementation, and provides examples of how to use them. The remainder of the paper is structured as follows: Section 2 defines the fingerprinting problem statement and the nomenclature that will be used in the rest of the paper. An overview of the implemented algorithms is given in Section 3. Section 4 outlines some data wrangling techniques included in the package. Section 5 describes the implemented positioning algorithms. Section 6 presents the included methods for access point position estimation. Then, Section 7 discuses some tools and functions included to create clusters or groups of fingerprints. Section 8 illustrates the use of the plotting functions also included in the package. In all these sections, functions are described and explored using practical examples, and particular emphasis is placed on how to use them with real world examples and data sets. Finally, the paper is summarized in Section 9.

2 Problem statement. Terminology and notation

This section provides a brief and general introduction to the principles of fingerprinting positioning, as well as a description of the notation and terminology that will be used in the next sections. The terms described here are related to general concepts of fingerprinting techniques, while the remaining of the paper describes the particular implementation of these concepts in the ipft package.

The main goal of the indoor localization techniques is to determine the position of a user in an indoor environment, where the GPS signal might not be received. This objective might require the use of an existing infrastructure, the deployment of a new one, the use of the so-called signals-of-opportunity (Yang et al. 2014), or even a combination of some of these techniques. Many of these techniques take advantage of the radio-frequency signals emitted by devices, whose position can be known or not, to estimate the user’s position from the perceived strength of these signals. There are many kinds of devices that can be used for this purpose, such as Wi-Fi access points, bluetooth beacons, RFID or UWB devices, but for all of them, the information provided for a given position, the fingerprint, can be stored as a vector of received signal strength intensities (RSSI), whose length is determined by the number of detected emitters.

A radio map, or a fingerprinting data set, is composed of a set of collected fingerprints and the associated positions where the measurements were taken, and may contain some additional variables, such as the the type of device used or a time stamp of the observation, among any other useful data. Let \(\mathcal{D}\) be a fingerprinting data set. Then:

\[\mathcal{D} = \left\lbrace\mathcal{F}, \mathcal{L}\right\rbrace\]

where \(\mathcal{F}\) is the set of collected fingerprints and \(\mathcal{L}\) is the set of associated locations.

For research purposes, a fingerprinting data set is usually divided into training and test sets. The training data set is used to store the fingerprints and location data to create models of the environment that can be used to estimate the position of a new fingerprint. The test data set is used to test the models obtained from the training data, and to compute the errors from the results of the position estimation.

Let \(\mathcal{D}_{train}\) be a training data set:

\[\mathcal{D}_{train} = \left\lbrace\mathcal{F}_{train}, \mathcal{L}_{train}\right\rbrace\]

where

\[\mathcal{F}_{train} = \left\lbrace \lambda_{1}^{tr}, \lambda_{2}^{tr}, ..., \lambda_{n}^{tr} \right\rbrace\]

\[\mathcal{L}_{train} = \left\lbrace \tau_{1}^{tr}, \tau_{2}^{tr}, ..., \tau_{n}^{tr} \right\rbrace\]

\(\mathcal{D}_{train}\) is composed of \(n\) fingerprints, stored as \(n\) vectors of RSSI measurements (\(\lambda_{i}^{tr}, \ i \in [1, 2, ..., n]\)), and \(n\) locations (\(\tau_{i}^{tr}, \ i \in [1, 2, ..., n]\)), stored as vectors, representing the position associated with its correspondent fingerprint. Each fingerprint consists of \(q\) RSSI values (\(\rho_{h,i}^{tr}\), \(h \in [1,...,q]\)), where \(q\) is the number of beacons considered when building the training set:

\[\lambda_{i}^{tr} = \left\lbrace \rho_{1,i}^{tr}, \rho_{2,i}^{tr}, ..., \rho_{q,i}^{tr} \right\rbrace, \ i \in [1,...,n]\]

and each associated position is composed of one or more values, depending on the number of dimensions to be considered and the coordinate system used. The position can be given as a vector of values representing its coordinates, although on multi-floor and multi-building environments labels can be used to represent buildings, floors, offices, etc. Let \(l\) be the number of dimensions of a position vector. Then:

\[\tau_{i}^{tr} = \left\lbrace \nu_{1,i}^{tr}, \nu_{2,i}^{tr}, ..., \nu_{l,i}^{tr} \right\rbrace, \ i \in [1,...,n]\]

The test data set is also composed of a collection of fingerprints associated to known positions. This data set is used for testing purposes, during research or during model building adjustments, to assess the model’s performance by comparing its estimation of the positions with the ground truth.

The situation is different in real applications, where the goal is to estimate the unknown position of the receiver given the RSSI values detected at a particular location, using a previously built model. In this case, the test data set is just composed of a unique fingerprint, and the objective is to estimate the actual location of the receiver. Therefore, no information about its location is provided.

The test data set is composed of \(m\) observations:

\[\mathcal{D}_{test} = \left\lbrace\mathcal{F}_{test}, \mathcal{L}_{test}\right\rbrace\]

where

\[\mathcal{F}_{test} = \left\lbrace \lambda_{1}^{ts}, \lambda_{2}^{ts}, ..., \lambda_{m}^{ts} \right\rbrace\]

\[\mathcal{L}_{test} = \left\lbrace \tau_{1}^{ts}, \tau_{2}^{ts}, ..., \tau_{m}^{ts} \right\rbrace\]

To be able to compare the test observations with the training fingerprints, the number of RSSI values of its respective fingerprints has to be the same, and the position in the RSSI vector must represent the same beacon in both data sets. Therefore, each one of the \(m\) observations of the test data set is composed of a fingerprint with \(q\) RSSI values:

\[\lambda_{j}^{ts} = \left\lbrace \rho_{1,j}^{ts}, \rho_{2,j}^{ts}, ..., \rho_{q,j}^{ts} \right\rbrace, \ j \in [1,...,m]\]

and a location vector with the same spatial dimensions as the training location vectors:

\[\tau_{j}^{ts} = \left\lbrace \nu_{1,j}^{ts}, \nu_{2,j}^{ts}, ..., \nu_{l,j}^{ts} \right\rbrace, \ j \in [1,...,m]\]

The notation depicted above will be used in the remaining of the paper to represent the fingerprinting data. Symbols \(i\) and \(j\) will be used to represent iterations over the training and test data sets, respectively, while \(h\) will be used to iterate over the beacons present in each fingerprint.

3 An overview of the implemented algorithms

This section presents an introduction to the main functions, included in the ipft5 package, that implement fingerprinting-based indoor localization methods. The package also provides two data sets for training and validation purposes that are briefly described in this section.

The ipft package implements three algorithms to build models to estimate the position of a receiver in an indoor environment. Two of these implementations are based on the well known k-Nearest Neighbors algorithm (knn) (Cover and Hart 1967) to, given an RSSI vector, select the \(k\) most similar training examples from the radio map. The similarity between the RSSI value vectors can be measured, for example, as the \(euclidean\) distance between them, but other distance functions may be used (Torres-Sospedra et al. 2015b). The selection of a method to compute this measure can be provided to the function in two ways, either choosing one of the already implemented distance measurements (\(euclidean\), \(manhattan\), etc.), or by way of a reference to a function implemented by the user that returns the distance (the lower, the more similar or ‘closer’) between two matrices or vectors. Once the \(k\) neighbors are selected, the location of the user is estimated as the weighted average of the neighbors positions.

The first implementation, corresponding to the function ipfKnn, may behave in a deterministic way, finding the \(k\) more similar neighbors using a deterministic similarity function such as the euclidean or manhattan distances, or in a probabilistic way, using similarity functions such as LDG (Logarithmic Gaussian Distance) or PLGD (Penalized Logarithmic Gaussian Distance) (Cramariuc et al. 2016a), that are based upon statistical assumptions on the RSSI measurement error. The similarity function can be chosen from the set of implemented options or provided by the user via a custom function. This implementation is discussed in the Section 5.1.

The other implementation of the knn algorithm assumes a probabilistic nature for the received signal distribution (Roos et al. 2002) and uses collections of many fingerprints at each particular position, acquired during the training phase. Therefore, the radio map is composed of several groups, where a group is a set of fingerprints (vectors of RSSI values) that share the same location. Assuming that the RSSI value for a specific beacon can be modeled as a random variable following a normal distribution (Haeberlen et al. 2004), any of these collections, or groups, of fingerprints can be represented by the statistical parameters of this distribution, in this case, the mean and the standard deviation. This implies that the original data set can be transformed into a new type of data structure by storing the mean and the standard deviation of every detected beacon for every group. All the original data for a group is transformed into two vectors, one storing the means and the other the standard deviations. The trustworthiness of the data in the new data set will depend on the number of measurements for every location of the original data. It is assumed that the more measurements for a particular location, the more reliable will be their inferred statistical parameters.

The implementation of this probabilistic-based method takes the original radio map and a set of group indices, and fits these groups of measurements to a normal (Gaussian) distribution for every beacon and every location, so that the signal intensity distribution is determined by the mean and the standard deviation of the Gaussian fit. Then, given a test fingerprint, the algorithm estimates its position by selecting the k most probable locations, making explicit use of the statistical parameters of the data stored in the radio map to optimize the probabilities in the assignment of the estimated position by computing a similarity function based on a summatory of probabilities. This approach is implemented through the ipfProbabilistic function and is described in the Section 5.2.

Finally, the third implemented algorithm is based on a scenario where the location of the beacons is known, and an estimation of the fingerprint position can be made using the log-distance path loss model (Seybold, J.S. 2005). The strength of the received signal at a particular point can be modeled as a function of the logarithmic distance between the receiver and the emitter and some parameters related to the environment properties and the devices characteristics. Therefore, as this method uses an analytical model to evaluate the position, no radio map is needed to train a model to compare fingerprints with, since the position might be estimated from the fingerprint data and the position of the beacons. This method is implemented by the function ipfProximity and is described in Section 5.3.

The previous functions ipfKnn, ipfProbabilistic and ipfProximity create models based on the training data and parameters provided. These models can then be evaluated using the ipfEstimate function, that internally detects the algorithm to apply based on the model that receives as parameter.

The package also includes data from the IPIN20166 Tutorial data set. In the ipftrain data frame there are \(n\) = 927 observations, including the RSSI values for \(q\) = 168 wireless access points, the location, expressed in Cartesian coordinates, for the observation (x, y), and some other variables, as timestamps for the measurements or an identifier for the user who took the survey. The ipftest data frame contains \(m\) = 702 observations with the same structure, for testing and validation purposes. The fingerprints included in both data sets where taken in the same building and the same floor. The ipfpwap data frame contains the position of 39 of the WAPs included in the ipftrain and ipftest data sets. The unknown positions of the remaining WAPs are stored as NA. The characteristics of these data sets attributes are:

There are some other publicly available indoor location data sets that have been used to develop and test this package and that are not included for size reasons, as the UJIIndoorLoc Data Set (Torres-Sospedra et al. 2015a) or the Tampere University data set (Cramariuc et al. 2016b).

The theoretical foundations of the algorithms and its uses are discussed in detail in Section 5. A description of the functions ipfKnn, ipfProximity, ipfProbabilistic and ipfEstimate is given while presenting some simulations to show how these algorithms can be useful in practice.

4 Data wrangling

An RSSI fingerprint is a vector composed of signal strength measurements from all the emitters received by a client device at a particular point, and can be measured in any unit of power. It is often expressed in decibels (dBm), or as percentage values between 1-100, and can be a negative or a positive value. Typically this values are stored as negative figures, where the strongest signals are closer to zero.

Some algorithms are sensitive to the scale of the data. For example, Neural Networks generally work better (LeCun et al. 1998) with data scaled to a range between [0, 1] or [\(-1\), 1], since unscaled data may slow down the learning process and the convergence of the network parameters and, in some cases, prevent the network from effectively learning the problem. Thus, the first step before the data can be fed to a positioning algorithm may involve some kind of transformation, depending on the characteristics of the original data.

The data sets included in this package represent the RSSI data from a set of wireless access points as negative integer numbers from \(-99\) (weakest detected signal) to \(-30\) (strongest detected signal). When the RSSI of a WAP is not available, the value used is NA. This convention may be inconvenient for some calculations. For example, a similarity measure between two fingerprints as the euclidean distance will only take into account those WAPs that have been detected in both observations, causing a loss of information that otherwise could be utilized.

The ipft package contains some functions to manipulate and wrangle raw fingerprint data. The ipfTransform function mutates the given fingerprint data into a new data set with a specified range for the RSSI signals. The signature of the function is:

  ipfTransform <- function(data, outRange = c(0, 1), outNoRSSI = 0, inRange = NULL, 
                           inNoRSSI = 0, trans = "scale", alpha = 24)

where:

The scale transformation scales the input data values to a range specified by the user. The feature scaling is performed according to Equation (1):

\[\label{eq:scaleT} \rho_{h,i}^{out} =\begin{cases} a+b\cdot\rho_{h,i}^{in}, &\mathrm{if}\,\rho_{h,i}^{in}\neq inNoRSSI\\ outNoRSSI,&otherwise \end{cases} \tag{1}\]

\[b = \frac{outMin-outMax}{inMin-inMax}\]

\[a = outMin - inMin \cdot b\]

where:

The exponential transformation (Torres-Sospedra et al. 2015b) changes the data according to the next equation:

\[\rho_{h,i}^{out} =\begin{cases} \exp(\frac{pos(\rho_{h,i}^{in})}{\alpha}) , &\mathrm{if}\,\rho_{h,i}^{in}\neq inNoRSSI\\ outNoRSSI,&otherwise \end{cases}\]

\[pos(\rho_{h,i}^{in}) =\begin{cases} \rho_{h,i}^{in} - inMin , &\mathrm{if}\,\rho_{h,i}^{in}\neq inNoRSSI\\ 0,&otherwise \end{cases}\]

where \(\alpha\) is a parameter for the exponential transformation. The authors establish \(\alpha\) as a case-based parameter, and find that 24 is a good value for RSSI fingerprinting data, but they did not study the effects of \(\alpha\) in the transformed data.

The following code scales the ipftrain and ipftest data sets RSSI data, stored in the columns 1:168, to a positive range of values, from 0 to 1, with NA representing a not detected WAP. As a not detected WAP is represented by a NA value in the original data, this has to be indicated to the function so it can transform these values to the desired output:

  trainRSSI <- ipfTransform(ipftrain[, 1:168], outRange = c(0.1, 1), inNoRSSI = NA, 
                            outNoRSSI = NA)
  testRSSI <- ipfTransform(ipftest[, 1:168], outRange = c(0.1, 1), inNoRSSI = NA, 
                           outNoRSSI = NA)

The ipfTransform function returns a new data set with the same structure (vector, matrix or data frame) as the input.

5 Positioning algorithms

This section describes three positioning algorithms implemented in the ipft package. The examples illustrating each description are based on the data previously scaled in Section 4.

The ipfKnn function.

The ipfKnn and ipfEstimate functions implement a version of the knn algorithm to select the \(k\) nearest neighbors (the \(k\) more similar vectors from the training set) to a given RSSI vector. Many different distance metrics (Torres-Sospedra et al. 2015b) can be used to compare two RSSI vectors and measure how ‘near’ or similar they are.

The distance metrics implemented in the package include some typical functions, as the \(L^{1}\) norm, or manhattan distance, or the \(L^{2}\), or euclidean distance. The \(L^{u}\) norm between two fingerprints with indices \(a\) and \(b\) is defined as follows:

\[L^{u} = \left(\sum_{h=1}^{q} \lvert(\rho_{h,a} - \rho_{h,b}\rvert^{u} \right)^{1/u}\]

The package also implements some fingerprinting specific distance estimation functions such as LDG and PLGD. The LGD between two RSSI vectors \(\lambda_{i}^{tr}\) and \(\lambda_{j}^{ts}\) of longitude \(q\) is given by:

\[LGD(\lambda_{i}^{tr}, \lambda_{j}^{ts}) = - \sum_{h = 1} ^ {q} \log \ \max(G(\rho_{h, i}^{tr},\ \rho_{h, j}^{ts}),\ \epsilon)\]

where \(\epsilon\) is a parameter to avoid logarithm of zero, as well as having one beacon RSSI value influence the LGD only above a certain threshold. \(G(\rho_{h, i}^{tr},\ \rho_{h, j}^{ts})\) represents the Gaussian similarity between \(\rho_{h, i}^{tr}\) and \(\rho_{h, j}^{ts}\), defined as

\[\mathrm{G}(\rho_{h, i}^{tr}, \rho_{h, j}^{ts})= \begin{cases} \frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{(\rho_{h, i}^{tr} - \rho_{h, j}^{ts})^{2}}{2\sigma^{2}}\right), &\mathrm{if}\,\rho_{h, i}^{tr}\neq 0\,\mathrm{and}\,\rho_{h, j}^{ts}\neq 0\\ 0,&otherwise \end{cases}\]

The \(\sigma^2\) parameter represents the shadowing variance (Shrestha et al. 2013). Values for \(\sigma\) in the range between 4 and 10 dBm are usually good for indoor scenarios (Lohan et al. 2014).

The PLGD between two RSSI vectors \(\lambda_{i}^{tr}\) and \(\lambda_{j}^{ts}\) of longitude \(q\) is given as:

\[PLGD(\lambda_{i}^{tr}, \lambda_{j}^{ts}) = LGD(\lambda_{i}^{tr}, \lambda_{j}^{ts}) + \alpha(\phi(\lambda_{i}^{tr}, \lambda_{j}^{ts}) + \phi(\lambda_{j}^{ts}, \lambda_{i}^{tr}))\]

where \(\phi(\lambda_{i}^{tr}, \lambda_{j}^{ts})\) is a penalty function for the beacons that are visible in the \(i^{th}\) training fingerprint but not in the \(j^{th}\) test fingerprint, \(\phi(\lambda_{j}^{ts}, \lambda_{i}^{tr})\) is a penalty function for the beacons that are visible in the \(j^{th}\) test fingerprint but not in the \(i^{th}\) training fingerprint, and are defined as follows:

\[\phi(\lambda_{i}^{tr}, \lambda_{j}^{ts}) = \sum_{h=1} ^ {q} T_{max} - \rho_{h, i}^{tr}, \ \mathrm{for} \ \ 0 < \rho_{h, i}^{tr} \leq T_{max} \ \mathrm{and} \ r_i = 0)\]

\[\phi(\lambda_{j}^{ts}, \lambda_{i}^{tr}) = \sum_{h=1} ^ {q} T_{max} - \rho_{h, j}^{ts}, \ \mathrm{for} \ \ 0 < \rho_{h, j}^{ts} \leq T_{max} \ \mathrm{and} \ r_j = 0)\]

\(T_{max}\) is an upper threshold for the strength of the signal, and \(\alpha\) is a scaling factor.

The similarity measurement method can be chosen by means of the parameter method, or by providing a custom function (parameters FUN and ...). The signature of the ipfKnn function is:

  ipfKnn <- function(train_fgp, train_pos, k = 3, method = 'euclidean', 
                     weights = 'distance', norm = 2, sd = 5, epsilon = 1e-3, 
                     alpha = 1, threshold = 20, FUN = NULL, ...)

where:

For a training data set of \(n\) RSSI vectors (a data frame or a matrix named tr_fingerprints) and a data set of \(n\) position vectors (a data frame or a matrix named tr_positions), the code for fitting a knn model with a \(k\) value of 4 and the manhattan distance as the distance measurement method is:

  knnModel <- ipfKnn(tr_fingerprints, tr_positions, k = 4, method = 'manhattan')

This function returns an S3 object of class ipftModel containing the following properties:

To estimate the position of a new fingerprint, the ipfEstimate function makes use of the previously obtained model. An ipfModel object holds the data model needed by the ipfEstimate function to apply the selected algorithm and returns an estimation of the test fingerprints positions. The signature of ipfEstimate is:

  ipfEstimate <- function(ipfmodel, test_fgp, test_pos = NULL)

where:

The ipfEstimate function returns an S3 object of the class ipfEstimation with the following elements:

The following R code shows an example of the usage of the ipfKnn function with the data set included in the package. This example takes the data previously scaled and generates a positioning model from the input data trainRSSI (the radio map) that is stored in knnModel. Then, the model is passed to the ipfEstimate function, along with the test data, to get an estimation of the position of the 702 test observations:

  tr_fingerprints <- trainRSSI[, 1:168]
  tr_positions    <- ipftrain[, 169:170]
  knnModel        <- ipfKnn(tr_fingerprints, tr_positions, k = 7, method = "euclidean")
  ts_fingerprints <- testRSSI[, 1:168]
  ts_positions    <- ipftest[, 169:170]
  knnEstimation   <- ipfEstimate(knnModel, ts_fingerprints, ts_positions)

Since the position of the test observations is known, the mean error for the 702 test observations can be calculated as follows:

  > mean(knnEstimation$errors)
  [1] 3.302739

The mean positioning error is one of the most common evaluation metrics used in indoor positioning (Liu et al. 2007) to assess the system’s accuracy. This metric corresponds to the average Euclidean distance between the estimated locations and the true locations. As positions in the ipftrain and ipftest are expressed in meters, this metric represents the average error in meters for this scenario.

The neighbors selected from the training data set for the 6 first test fingerprints are:

  > head(knnEstimation$neighbors)
       [,1] [,2] [,3] [,4] [,5] [,6] [,7]
  [1,]   71  176  126  125  127  771  130
  [2,]   71  176  126  125  127  771  130
  [3,]  465  914  915  913  217   77  218
  [4,]  465  914  915  176  913  461  217
  [5,]  176  126  125  771  130  127  914
  [6,]   77  914  915  217  176  465  218

where each row of the output corresponds to the indices of the \(k\) = 7 more similar vectors from the training data set to the \(i^{th}\) vector of the test data set.

As an example of how to use ipfKnn with a custom function, the next code shows the definition of a C++ function that implements a modified version of the manhattan distance. The function needs at least two parameters, the two matrices representing the training and test data sets. A third parameter is here introduced to represent a penalization value. This function penalizes the computed distance between two RSSI measurements when one of the beacons is not detected (represented by the value \(\emptyset\)), by multiplying the resulting distance by a factor \(F\). Given two fingerprints \(\lambda_{i}^{tr}\) and \(\lambda_{j}^{ts}\) of length \(q\), the \(myD\) distance is:

\[\mathrm{myD}(\lambda_{i}^{tr}, \lambda_{j}^{ts})= \sum_{h=1}^{q}{\mathrm{myd}(\rho_{h,i}^{tr}, \rho_{h,j}^{ts})},\]

where

\[\mathrm{myd}(\rho_{h,i}^{tr}, \rho_{h,j}^{ts})= \begin{cases} \mathrm{\lvert}\rho_{h,i}^{tr} - \rho_{h,j}^{ts}\rvert , &\mathrm{if}\,\rho_{h,i}^{tr}\neq \emptyset\,\mathrm{and}\,\rho_{h,j}^{ts}\neq \emptyset\\ \mathrm{\lvert}\rho_{h,i}^{tr} - \rho_{h,j}^{ts}\lvert F , &otherwise \end{cases}\]

The following code implements the myD function and shows an example of its usage with ipfKnn, as well as the results obtained. The function is coded in C++ to improve its performance when using large data sets, although the method also accepts custom plain R functions. The myD function assumes that the fingerprints are in a positive range:

  library('ipft')
  library('Rcpp')
  cppFunction('
    NumericMatrix myD(NumericMatrix train, NumericMatrix test, double F = 2.0) {
      NumericMatrix distanceMatrix(test.nrow(), train.nrow());
      double d = 0, pv = 0, rssi1 = 0, rssi2 = 0;
      for (int itrain = 0; itrain < train.nrow(); itrain++) {
        for (int itest = 0; itest < test.nrow(); itest++) {
          d = 0;
          for (int i = 0; i < train.ncol(); i++) {
            rssi1 = R_IsNA(train(itrain, i))? 0 : train(itrain, i);
            rssi2 = R_IsNA(test(itest, i))? 0 : test(itest, i);
            pv = (rssi1 != 0 && rssi2 != 0)? 1 : F;
            d = d + std::abs(rssi1 - rssi2) * pv;
          }
          distanceMatrix(itest, itrain) = d;
        }
      }
      return distanceMatrix;
    }'
  )
  customModel      <- ipfKnn(tr_fingerprints, tr_positions, k = 1, FUN = myD, F = 0.25)
  customEstimation <- ipfEstimate(customModel, ts_fingerprints, ts_positions)

  > head(customEstimation$neighbors)
       [,1]
  [1,]  773
  [2,]  773
  [3,]  776
  [4,]  773
  [5,]  130
  [6,]  130

The previous code outputs the selected neighbors for the first 6 observations in the test data set. As the ts_positions data frame contains the actual location of the observations, the absolute error committed by the model is returned in the ipfEstimation object:

  > head(customEstimation$errors)
  [1] 5.708275 5.708275 5.708275 5.708275 3.380000 3.380000

And the mean error with this custom similarity function is:

  > mean(customEstimation$errors)
  [1] 3.297342

An ipfEstimation object can be used directly to plot the Empirical cumulative distribution function of the error (function ipfPlotEcdf()) and the Probability density function (function ipfPlotPdf()). Figures 1 and 2 show the plots obtained from the following code:

  > ipfPlotEcdf(customEstimation)
  > ipfPlotPdf(customEstimation)

The plotting functions included in the package are described in detail in Section 8.

graphic without alt text
Figure 2: Funtion ipfPlotEcdf. Empirical cumulative distribution function of the error. The plot also shows the mean (red dotted line) and the median (blue dashed line) of the errors.