Plotting decision boundaries for SVMs in R


As I was not able to find directly working code for drawing a nice decision boundary around a trained SVM, I spent two hours hacking one together. So let us prepare some data first, using a probably stupid function that samples some points from a given 2D Gaussian and returns a data frame. Yes, the routine is suboptimal, but for my purposes it works:
library(MASS)   # for mvrnorm

# sample n points from a 2D Gaussian with mean m and covariance s,
# attach label l and return everything as a data frame
clusterAt <- function (m = c(0, 0), s = c(0.1, 0.1), n = 5, l = 1) {
  # a single value is used as the variance in both directions
  if (length(s) == 1) {
    s = c(s, s)
  }
  
  # two values are interpreted as a diagonal covariance matrix
  if (length(s) == 2) {
    s = matrix(c(s[1], 0, 0, s[2]), nrow = 2)
  }
  
  # anything longer is taken as a full 2x2 covariance matrix
  s = matrix(s, nrow = 2)
  
  p = mvrnorm(n, m, Sigma = s)
  data.frame(x = p[, 1], y = p[, 2], l = rep(l, n))
}
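To see what a single call returns (the values here are just for illustration):
# five points around the origin with a small diagonal spread, all labelled 1
clusterAt(m = c(0, 0), s = c(0.2, 0.2), n = 5, l = 1)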
Now we can create our data, which consists of two Gaussians, one in the upper right and one in the lower left corner:
set.seed(42)
clusters = rbind(
  clusterAt( m = c(4, 4), s = c(25.5,24.0), n = 70, l = 1),
  clusterAt( m = c(-4,-4), s = c(24.0, 23.5), n = 70, l = 0)
)
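A quick check that the sampling did what we asked for, 70 points per label:
# 70 points with label 0 and 70 with label 1
table(clusters$l)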
Next we need to train an SVM on this data; its decision boundary is what we want to plot. We will use the well-known e1071 package. C and g are the cost and RBF kernel parameters of the SVM to be trained.
library(e1071)

C = 1
g = 1

# label first, coordinates as features
dat = data.frame(l = factor(clusters$l), x = clusters$x, y = clusters$y)

fit = svm(l ~ ., data = dat, scale = FALSE, probability = TRUE,
          gamma = g, kernel = "radial", cost = C)
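Before we go on, it does not hurt to glance at what we actually trained (entirely optional):
# how many support vectors did we end up with?
summary(fit)

# training accuracy, just as a rough sanity check
mean(predict(fit, dat) == dat$l)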
Now, to plot the decision boundary, we predict on a very fine grid and make the SVM return decision values instead of labels. The decision boundary lies exactly at decision value zero, so if we get ggplot to draw the contour at level 0, we see the boundary we are after. So let us first create the grid and predict on it:
# create a (fine) grid
gx = seq(-7.5, 7.5, 0.1)
gy = seq(-7.5, 7.5, 0.1)
datagrid  = expand.grid(x = gx, y = gy)

# predict on it and add prediction to the grid
pred = predict(fit, datagrid, decision.values = TRUE)
pred = attributes(pred)$decision.values
datagrid = cbind (datagrid, z = as.vector(pred))
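As a quick sanity check (optional): the predicted label should flip exactly where the decision value changes sign, so each label should end up on one side of zero.
# the predicted label flips exactly where the decision value crosses zero
labels = predict(fit, datagrid[, c("x", "y")])
table(labels, positive = datagrid$z > 0)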
Next we plot everything: first the data itself (using my minimalistic style):
library(ggplot2)

# now plot the data; 'cols' and 'new_theme_empty' are my own colour
# palette and minimal theme (replace them with your favourites)
P = ggplot(data = clusters, aes(x = x, y = y, color = 3 - l)) +
  geom_point(size = 4) +
  scale_colour_gradientn(colours = cols) +
  new_theme_empty + xlim(-7.5, 7.5) + ylim(-7.5, 7.5)
and then we just add our decision boundary:
# add our decision boundary to it
P = P + geom_contour(data = datagrid, aes(x = x, y = y, z = z),
                     col = "black", size = 4, breaks = c(0))
print(P)
Yes, that was not *THAT* complicated, but somehow Google was not able to point me to a ggplot way of doing this; I found a lot of links for the standard plot function and, as a non-ggplot expert, was a bit puzzled about how to make it work with ggplot. I've added a small function you can use to plot your own data too; download it from my github repository (together with some corrections for ggplot2).
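Stripped of my colour palette and theme, the helper boils down to something like the following sketch (not the exact code from the repository; it assumes an e1071 fit and a data frame with columns x, y and a numeric label l):
# rough sketch of a reusable helper: data points plus the zero-level contour
plotBoundary <- function(fit, data, lim = c(-7.5, 7.5), step = 0.1) {
  # evaluate the decision function on a grid covering the plotting region
  grid = expand.grid(x = seq(lim[1], lim[2], step),
                     y = seq(lim[1], lim[2], step))
  p = predict(fit, grid, decision.values = TRUE)
  grid$z = as.vector(attributes(p)$decision.values)

  # points plus the zero-level contour
  ggplot(data, aes(x = x, y = y, color = 3 - l)) +
    geom_point(size = 4) +
    geom_contour(data = grid, aes(x = x, y = y, z = z),
                 col = "black", breaks = c(0)) +
    xlim(lim) + ylim(lim)
}

# e.g.
print(plotBoundary(fit, clusters))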



Probably I'll also add this to the SVMBridge.

But be aware that there is no guarantee that your decision boundary is connected, so check the output, probably twice.
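One quick way to check is to look at the whole decision-value surface instead of only the zero level, for example by shading it over the existing plot (just a sketch):
# optional: shade the whole decision surface to see where the boundary really runs
# (inherit.aes = FALSE, because datagrid has no label column)
P2 = P + geom_tile(data = datagrid, aes(x = x, y = y, fill = z),
                   alpha = 0.3, inherit.aes = FALSE)
print(P2)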

(credits for part of the source: http://stackoverflow.com/questions/24260576/plot-decision-boundaries-with-ggplot2)