{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercise Session - Support Vector Machine (SVM) SOLUTIONS\n",
"\n",
"Welcome to the 6th exercise session of CS233 - Introduction to Machine Learning. \n",
"\n",
"We will use Scikit-learn, a Python package of machine learning methods, in this exercise. We are going to start with a toy binary classification example to understand Linear SVM, then to address more difficult problem. \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Useful starting lines\n",
"%matplotlib inline\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from mpl_toolkits.mplot3d import Axes3D\n",
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 1 Support Vector Machine (SVM)\n",
"SVM tries to solve linear classification problem of the **primal form**: \n",
" \\begin{align}\n",
" \\mathbf{w}^* = \\underset{\\tilde{\\mathbf{w}},\\{\\xi_n\\}}{\\operatorname{min}} \\ \\ & \\frac{1}{2}\\|\\tilde{\\mathbf{w}}\\|^2 + C \\sum^N_{n=1}\\xi_n \\\\\n",
" \\operatorname{subject \\ to} \\ \\ & t_n\\cdot(\\tilde{\\mathbf{w}}\\cdot\\mathbf{x_n}) \\geq 1-\\xi_n , \\forall \\ n \\\\\n",
" &\\text{and }\\ \\xi_n \\geq 0 , \\forall \\ n\n",
" \\end{align}\n",
"where, $\\tilde{\\mathbf{w}}$ are the weights, $x_n$ is a data sample and $t_n$ is a label.\n",
"\n",
"**Q**: Why do we minimize $\\tilde{\\mathbf{w}}$ ? \n",
"\n",
"$\\|\\tilde{\\mathbf{w}}\\|$ is inversely related to margin width, so minimizing it means maximizing the margin, hence we minimize $\\|\\tilde{\\mathbf{w}}\\|$.\n",
"\n",
"**Q**: What is C? How should we choose the best value for C?\n",
"\n",
"C is penalty term, $\\zeta_i$ is error in terms of how far data point is beyond correct margin and $y_i \\in\\{-1,1\\}$ for binary classification. We choose the right value for C, given the data, through validation set.\n",
" \n",
"**Q**: What does it mean when $\\zeta_i \\gt 0$ ?\n",
"\n",
"As our data may not be linearly separable, hence maximizing margin will lead to some misclassifications. $\\zeta_i$ is greater than zero when a data point is beyond margin and how many such data points are allowed is controlled by C. \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 2 Find the Margins\n",
"\n",
"We learn a hard-margin (i.e., no misclassification allowed) linear SVM classifier for the data samples shown in the figure below. Answer the following questions\n",
"1. What is the equation for the decision boundary, in terms of $x_1$ and $x_2$?\n",
"\n",
"2. How many support vectors are there? Write down their coordinates.\n",
"\n",
"3. What are the weights (including the bias term) for the SVM formulation?\n",
"\n",
"4. If we learn a soft-margin (with slack variables) linear SVM classifier, then what can be the maximum and the minimum number of support vectors?\n",
"\n",
"**Answer**:\n",
"\n",
"1. $x_1-x_2=0$, because the decision boundary has to be perpendicular to the line joining the closest points and also divide it equally. Hence we have $x_1-x_2=0$, passing through (3,3) and the origin.\n",
"\n",
"2. There are two support vectors: (4,2) and (2,4).\n",
"\n",
"3. Let $w_1$, $w_2$ be the two weights and $b$ the bias. We know that (3,3) and (0,0) lie on decision boundary, so they satisfy $\\mathbf{w}^T\\mathbf{x} + b = 0$. That is, \n",
"\\begin{align} \n",
"3w_1+3w_2+b &= 0 \\\\ \n",
"0w_1+0w_2+b &= 0 \n",
"\\end{align} \n",
"which gives \n",
"$w_1=-w_2$ and $b=0$. \n",
"Now, since (4,2) and (2,4) are support vectors, we have \n",
"\\begin{align} \n",
"4w_1+2w_2+b &= 1 \\\\ \n",
"2w_1+4w_2+b &= -1 \n",
"\\end{align} which gives $w_1=\\frac{1}{2}$ and $w_2=-\\frac{1}{2}$.\n",
" \n",
"4. We have a maximum of 4, as all data points can be support vectors when C is very small, and a minimum of 2.\n",
" \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 3 Scikit-Learn\n",
"\n",
"Training a SVM classifer is not a easy task, so in this session, we are going to use Scikit-Learn, which is a machine learning library written in python. Most of the machine learning algorithms and tools are already implemented. In this exercise, we'll use this package to train and understand SVM. If you are interested in how to optimize a SVM, you can refer to [this](https://xavierbourretsicotte.github.io/SVM_implementation.html).\n",
"\n",
"Please install scikit-learn in your conda environment by following instructions at this link:https://scikit-learn.org/stable/install.html if you don't have it."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Scikit-Learn has modules implemented broadly for \n",
"- Data Transformations: https://scikit-learn.org/stable/data_transforms.html\n",
"- Model Selection and Training: https://scikit-learn.org/stable/model_selection.html\n",
"- Supervised Techniques: https://scikit-learn.org/stable/supervised_learning.html\n",
"- Unsupervised Techniques: https://scikit-learn.org/stable/unsupervised_learning.html"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"All the magic happens under the hood, but gives you freedom to try out more complicated stuff. \n",
"Different methods to be noted here are\n",
"- `fit`: Train a model with the data\n",
"- `predict`: Use the model to predict on test data\n",
"- `score`: Return mean accuracy on the given test data\n",
"\n",
"Have a look at [this](https://scikit-learn.org/stable/tutorial/basic/tutorial.html#learning-and-predicting) for simple example.\n",
"\n",
"We will explore linear SVM in this session: [SVC](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC) with linear kernel. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 4 Binary Classification\n",
"\n",
"Let's begin with a simple **binary** classification using Linear SVM.\n",
"The data is simply **linearly** separable."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Simple data\n",
"from plots import plot_simple_data\n",
"x = np.array([[2,4],[1,4],[2,3],[6,-1],[7,-1],[5,-3]] )\n",
"y = np.array([-1,-1, -1, 1, 1 , 1 ])\n",
"plot_simple_data()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4.1 Linear SVM\n",
"In this part, you are asked to build a SVM classifier using SVC and to understand the outputs from the fitted model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Let use SVC with linear kernel\n",
"from sklearn.svm import SVC\n",
"from plots import plot\n",
"\n",
"## CODE HERE\n",
"# 1. Declare a SVC with C=1.0 and kernel='linear'\n",
"clf = SVC(C = 1.0, kernel = 'linear')\n",
"\n",
"# 2. use x and y to fit the model\n",
"clf.fit(x, y) \n",
"\n",
"# 3. show the fitted model\n",
"plot(x, y, clf)\n",
"\n",
"print('w = ',clf.coef_)\n",
"print('w0 = ',clf.intercept_)\n",
"print('Number of support vectors for each class = ', clf.n_support_)\n",
"print('Support vectors = ', clf.support_vectors_)\n",
"print('Indices of support vectors = ', clf.support_)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this case, we found that we have 2 **support vectors**, one in each class. They are shown highlighed in green in the plot. A support vector is a data sample that is either on the margin or within the margin. \n",
"\n",
"Let's inspect the result of the classification. We do the classification in the following way:\n",
"\n",
"$$ \n",
"t_i = \\begin{cases}\n",
"-1 & \\mathbf{x}_i^T \\mathbf{w} + w_0 < 0\\\\\n",
"1 & \\text{otherwise}\n",
"\\end{cases}\n",
"$$"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Use the weights (w) from the fitted model to predict the labels of input data points\n",
"\n",
"def raw_predict(x, w, w0):\n",
" '''\n",
" given input data, w and w0, output the prediction result\n",
" \n",
" input:\n",
" x: data, np.array of shape (N, D) where N is the number of datapoints and D is the dimension of features.\n",
" w: weights, np.array of shape (D,)\n",
" w0: bias, np.array of shape (1,)\n",
" \n",
" output:\n",
" out: predictions, np.array of shape (N, ). tip: .astype(int) \n",
" '''\n",
" ## CODE HERE\n",
" out = np.sign(np.dot(x, w) + w0).astype(int)\n",
" return out\n",
"\n",
"x_test = np.array([\n",
" [4, 2],\n",
" [ 6, -3]])\n",
"\n",
"raw_pred = raw_predict(x_test, clf.coef_[0], clf.intercept_)\n",
"print(\"Prediction from your implementation: \", raw_pred)\n",
"\n",
"##CODE HERE. Use scikit-learn's predict function to do the prediction on the test data.\n",
"model_predict = clf.predict(x_test)\n",
"\n",
"print(\"Prediction from the model: \", model_predict)\n",
"\n",
"\n",
"assert(raw_pred.all() == clf.predict(x_test).all())\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, let us determine the indices of the support vectors. (Reminder: These are the data samples that fall on the margin or within the margin). "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"## we can also calculate the decision function manually\n",
"## Step 1: CODE HERE:: Code the decision function: Xw+w_0\n",
"decision_function = np.dot(x, clf.coef_[0]) + clf.intercept_\n",
"\n",
"## Step 2: We can also retrieve the decision function from the model:\n",
"decision_function_from_model = clf.decision_function(x)\n",
"\n",
"assert(decision_function_from_model.all() == decision_function.all())\n",
"\n",
"## according to the condition that support vectors should satisfy\n",
"## CODE HERE tips: use np.where to put the condition in.\n",
"support_vector_indices = np.where(y * decision_function <= 1.)[0]\n",
"\n",
"\n",
"print('I find the indices of support vectors = ', support_vector_indices)\n",
"assert(support_vector_indices.all() == clf.support_.all())\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4.2 Different C values\n",
"Let's try different values of C. In the code, vary the C value from 0.001 to 100 and notice the changes on a bigger dataset. \n",
"**Question**: How does the margin vary with C? **Hint**: have a look at the optimization formulation above."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Answer**: Lower C allows more misclassification and hence larger margin, while bigger C reduces misclassfication and hence smaller margin."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.svm import SVC\n",
"from helpers import get_simple_dataset\n",
"from plots import plot\n",
"\n",
"# Get the simple dataset\n",
"X, Y = get_simple_dataset()\n",
"plot(X,Y,None,dataOnly=True)\n",
"\n",
"#Declare a SVM model with linear kernel and C=1.0\n",
"clf = SVC(kernel='linear', C=1.0)\n",
"\n",
"#call the fit method\n",
"clf.fit(X, Y)\n",
"\n",
"#plot the decision boundary\n",
"plot(X, Y, clf)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The above plot shows the decision boundary and margins of the learnt model. Encircled points are the support vectors. \n",
"WARNING: if the margins go beyound the limits of axis, they are not shown or shown close to decision plane. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# CODE HERE #\n",
"# Vary C and plot the boundaries\n",
"# Use np.logspace to generate 6 C values of range (10e-3, 10e2) \n",
"\n",
"C = np.logspace(-3,2,6)\n",
"for c in C:\n",
" #Get SVM model with linear kernel \n",
" clf = SVC(kernel='linear', C=c)\n",
"\n",
" #call the fit method\n",
" clf.fit(X, Y)\n",
"\n",
" #plot the decision boundary\n",
" plot(X, Y, clf)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Additional Reading (if interested)\n",
"- Multiclass SVM (Bishop- Multiclass SVMs 7.1.3)\n",
"- Can we have probabilistic interpretation of SVM? (Bishop- Relevance Support Machine 7.2)"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}