kknn installed but library(kknn) fails: there is no package called ‘kknn’

I'm trying to install the kknn package in R. The download succeeds, but whenever I call library(kknn) I get:

Error in library(kknn) : there is no package called ‘kknn’

install.packages("kknn") Installing package into ‘C:/Users/Ed/Documents/R/win-library/3.2’ (as ‘lib’ is unspecified) trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.2/kknn_1.3.0.zip' Content type 'application/zip' length 321517 bytes (313 KB) downloaded 313 KB

package ‘kknn’ successfully unpacked and MD5 sums checked
Warning in install.packages :
  cannot remove prior installation of package ‘kknn’

The downloaded binary packages are in C:\Users\Ed\AppData\Local\Temp\RtmpWClNdR\downloaded_packages

library(kknn)
Error in library(kknn) : there is no package called ‘kknn’

Any suggestions?

Many thanks

Ed
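One common cause of the "cannot remove prior installation" warning is that the old kknn files are locked, typically by another open R session or by a stale 00LOCK directory in the library. A sketch of the usual fix, reusing the library path from the output above (adjust if yours differs):

# close all other R/RStudio sessions first, then:
lib <- "C:/Users/Ed/Documents/R/win-library/3.2"   # library path from the output above
unlink(file.path(lib, "00LOCK"), recursive = TRUE) # remove a stale lock directory, if any
unlink(file.path(lib, "kknn"), recursive = TRUE)   # delete the half-installed package
install.packages("kknn", lib = lib)                # reinstall into the same library
library(kknn)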


This report aims to present the capabilities of the package kknn.

The document is part of the paper “Landscape of R packages for eXplainable Machine Learning”, S. Maksymiuk, A. Gosiewska, and P. Biecek (https://arxiv.org/abs/2009.13248). It contains a real-life use case built on the titanic_imputed data set described in the Example gallery for XAI packages section of the article.

We did our best to show the entire range of the implemented explanations. Please note that the examples may be incomplete. If you think something is missing, feel free to make a pull request at the GitHub repository MI2DataLab/XAI-tools.

The list of use-cases for all packages included in the article is here.

Load the titanic_imputed data set.

data(titanic_imputed, package = "DALEX")
titanic_imputed$survived <- as.factor(titanic_imputed$survived)
head(titanic_imputed)
##   gender age class    embarked  fare sibsp parch survived
## 1   male  42   3rd Southampton  7.11     0     0        0
## 2   male  13   3rd Southampton 20.05     0     2        0
## 3   male  16   3rd Southampton 20.05     1     1        0
## 4 female  39   3rd Southampton 20.05     1     1        1
## 5 female  16   3rd Southampton  7.13     0     0        1
## 6   male  25   3rd Southampton  7.13     0     0        1

library("kknn")

Fit a k-nearest-neighbors model to the titanic imputed data.

model <- kknn(survived~., titanic_imputed, titanic_imputed)
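As a quick look at what the fitted object contains (an illustrative check, not part of the original report; the components used here are listed in the reference section below):

head(fitted(model))   # predicted classes
head(model$prob)      # per-class probability estimates
# in-sample confusion table; optimistic, since train and test are the same data
table(observed = titanic_imputed$survived, predicted = fitted(model))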

model$C[1:15, ]
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7]
##  [1,]    1  576 1125 1090  336  460 1060
##  [2,]    2  458  297  620  907  830  316
##  [3,]    3 1036  675 1225  276 1226  614
##  [4,]    4  461 1239 1172  299  668  552
##  [5,]    5  532  881  974 1244   80 1117
##  [6,]    6   53 1192  626   17  320 1211
##  [7,]    7  322  863  767  928  986  989
##  [8,]    8  363  364  321  864  606  764
##  [9,]    9  495   70 1317 1056  290 1101
## [10,]   10  590   42  602 1243  750  742
## [11,]   11  327  535  637 1167 1030  271
## [12,]   12 1170  269  994  982  302  627
## [13,]   13  324  501  313  734 1289 1202
## [14,]   14  830  907  949  275  614 1226
## [15,]   15  281  831  552 1171  551  668

# we round the values so they will print in a nice way
round(model$D[1:15, ], 6)
##       [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]
##  [1,]    0 0.000462 0.001410 0.001410 0.023551 0.082288 0.082288
##  [2,]    0 0.329107 1.051237 1.263226 1.531643 1.536469 1.548023
##  [3,]    0 0.164553 0.339802 0.512407 0.586802 0.592083 1.009073
##  [4,]    0 0.658211 0.746164 0.854278 0.915470 1.070538 1.410831
##  [5,]    0 0.001168 0.164553 0.164553 0.164555 0.165802 0.166291
##  [6,]    0 0.000476 0.000487 0.000926 0.000944 0.000949 0.000949
##  [7,]    0 0.109655 0.215234 1.209828 1.221000 1.223321 1.235177
##  [8,]    0 0.263171 0.299278 0.336994 1.160193 1.285896 1.510510
##  [9,]    0 0.256452 0.256454 0.256454 0.256454 0.269327 0.269329
## [10,]    0 0.000219 0.000242 0.000242 0.000242 0.001630 0.003015
## [11,]    0 0.002795 0.002795 0.002795 0.003015 0.022159 0.022159
## [12,]    0 0.002777 0.019364 0.019364 0.019364 0.019364 0.019387
## [13,]    0 0.377252 0.426828 0.518923 0.518923 0.577952 0.597732
## [14,]    0 0.512139 0.686232 1.196193 1.213776 1.221266 1.373515
## [15,]    0 0.229811 0.907721 1.238800 1.316631 1.318218 1.394238
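The two matrices line up row by row: model$C[i, ] holds the indices of the k = 7 training cases nearest to test row i, and model$D[i, ] the corresponding Minkowski distances in increasing order. Because train and test are the same data here, every row's nearest neighbor is itself, which is why column 1 of C is the row index and column 1 of D is 0. For example, to pull the actual neighbor rows (illustrative):

titanic_imputed[model$C[1, ], ]   # the seven training rows nearest to observation 1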


kknn (version 1.3.1)

Performs k-nearest neighbor classification of a test set using a training set. For each row of the test set, the k nearest training set vectors (according to Minkowski distance) are found, and the classification is done via the maximum of summed kernel densities. In addition, even ordinal and continuous variables can be predicted.

kknn(formula = formula(train), train, test, na.action = na.omit(),
     k = 7, distance = 2, kernel = "optimal", ykernel = NULL, scale = TRUE,
     contrasts = c('unordered' = "contr.dummy", ordered = "contr.ordinal"))

kknn.dist(learn, valid, k = 10, distance = 2)
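For orientation, a minimal call relying on the defaults above (illustrative; the seed and split are my own choices, and a fuller example appears at the end of this section):

library(kknn)
data(iris)
set.seed(1)                      # assumption: fixed seed, only for reproducibility
val <- sample(nrow(iris), 50)    # hold out a third of iris as the test set
fit <- kknn(Species ~ ., train = iris[-val, ], test = iris[val, ])
table(iris$Species[val], fitted(fit))   # confusion table on the held-out rows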

formula: A formula object.

train: Matrix or data frame of training set cases.

test: Matrix or data frame of test set cases.

learn: Matrix or data frame of training set cases (for kknn.dist).

valid: Matrix or data frame of test set cases (for kknn.dist).

na.action: A function which indicates what should happen when the data contain NAs.

k: Number of neighbors considered.

distance: Parameter of Minkowski distance.

kernel: Kernel to use. Possible choices are "rectangular" (which is standard unweighted knn), "triangular", "epanechnikov" (or beta(2,2)), "biweight" (or beta(3,3)), "triweight" (or beta(4,4)), "cos", "inv", "gaussian", "rank" and "optimal".

ykernel: Window width of a y-kernel, especially for prediction of ordinal classes.

scale: Logical; scale variables to have equal sd.

contrasts: A vector containing the 'unordered' and 'ordered' contrasts to use.

kknn returns a list object of class kknn including the components:

fitted.values: Vector of predictions.
CL: Matrix of classes of the k nearest neighbors.
W: Matrix of weights of the k nearest neighbors.
D: Matrix of distances of the k nearest neighbors.
C: Matrix of indices of the k nearest neighbors.
prob: Matrix of predicted class probabilities.
response: Type of response variable, one of continuous, nominal or ordinal.
distance: Parameter of Minkowski distance.
call: The matched call.
terms: The 'terms' object used.

This nearest neighbor method expands knn in several directions. First, it can be used not only for classification, but also for regression and ordinal classification. Second, it uses kernel functions to weight the neighbors according to their distances. In fact, not only kernel functions but every monotonically decreasing function $f(x)$ for all $x > 0$ will work fine.
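To make the weighted-voting idea concrete, here is a small base-R sketch of that step (an illustration only, not the package internals; knn_vote and its simple normalization by the k-th neighbor distance are my own assumptions, and kknn's actual scaling differs in detail):

# classify one point x by a kernel-weighted vote of its k nearest training rows
knn_vote <- function(x, train_X, train_y, k = 7, p = 2,
                     kernel = function(d) pmax(1 - d, 0)) {  # triangular-style weight
  d  <- apply(train_X, 1, function(row) sum(abs(row - x)^p)^(1/p))  # Minkowski distance
  nn <- order(d)[seq_len(k)]                # indices of the k nearest rows
  w  <- kernel(d[nn] / max(d[nn]))          # any decreasing f(x) for x > 0 works here
  names(which.max(tapply(w, droplevels(train_y[nn]), sum)))  # max summed weight per class
}
knn_vote(as.numeric(iris[1, 1:4]), as.matrix(iris[-1, 1:4]), iris$Species[-1])  # "setosa"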

The number of neighbours used for the "optimal" kernel should be $\left[\left(\frac{2(d+4)}{d+2}\right)^{d/(d+4)} k\right]$, where $k$ is the number that would be used for unweighted knn classification, i.e. kernel = "rectangular". This factor $\left(\frac{2(d+4)}{d+2}\right)^{d/(d+4)}$ is between 1.2 and 2 (see Samworth (2012) for more details).
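As a quick worked check of that factor (optimal_k is a hypothetical helper, not part of the package):

# Samworth (2012) scaling factor applied to an unweighted-knn k
optimal_k <- function(k_rect, d) {
  f <- (2 * (d + 4) / (d + 2))^(d / (d + 4))  # lies between roughly 1.2 and 2
  round(f * k_rect)
}
optimal_k(k_rect = 7, d = 4)  # factor is about 1.63 for d = 4, so k = 11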

See also train.kknn, simulation, knn and knn1.

Examples

library(kknn)
data(iris)

m <- dim(iris)[1]
val <- sample(1:m, size = round(m/3), replace = FALSE, prob = rep(1/m, m))
iris.learn <- iris[-val, ]
iris.valid <- iris[val, ]
iris.kknn <- kknn(Species ~ ., iris.learn, iris.valid, distance = 1,
                  kernel = "triangular")
summary(iris.kknn)
fit <- fitted(iris.kknn)
table(iris.valid$Species, fit)
pcol <- as.character(as.numeric(iris.valid$Species))
pairs(iris.valid[1:4], pch = pcol,
      col = c("green3", "red")[(iris.valid$Species != fit) + 1])

data(ionosphere)
ionosphere.learn <- ionosphere[1:200, ]
ionosphere.valid <- ionosphere[-c(1:200), ]
fit.kknn <- kknn(class ~ ., ionosphere.learn, ionosphere.valid)
table(ionosphere.valid$class, fit.kknn$fit)
(fit.train1 <- train.kknn(class ~ ., ionosphere.learn, kmax = 15,
                          kernel = c("triangular", "rectangular", "epanechnikov", "optimal"),
                          distance = 1))
table(predict(fit.train1, ionosphere.valid), ionosphere.valid$class)
(fit.train2 <- train.kknn(class ~ ., ionosphere.learn, kmax = 15,
                          kernel = c("triangular", "rectangular", "epanechnikov", "optimal"),
                          distance = 2))
table(predict(fit.train2, ionosphere.valid), ionosphere.valid$class)
