{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# [TC1] Lab exercise 1: SVM training\n",
"\n",
"In this lab exercise you will code 3 different algorithms to train a binary SVM classifier:\n",
"\n",
"1. sub-gradient descent on the primal\n",
"2. projected gradient ascent on the dual\n",
"3. box constrained coordinate ascent on the dual\n",
"\n",
"We will use a toy dataset generated via scikit learn, therefore it is easy to achieve 100% accuracy even if your algorithm is wrong! You should be extremely careful with you implementation, you need to check yourself that you precisely understand what you are coding. :)\n",
"\n",
"Contrary to the course, here I ask you to have a hyperparameter to weight the regularization term.\n",
"You need to redo the math to check what this update changes in the primal and dual problem optimization algorithms.\n",
"If you don't want to mess with the math, you can first implement everything with the formula of the course, and then update to take into account the regularization term!\n",
"\n",
"You need to submit both this completed notebook and a report.\n",
"See the course webpage for more information.\n",
"\n",
"## Tensor operations\n",
"\n",
"You **must** learn to code using tensor operation only.\n",
"In other word, you **should not** use for loops in your code when compute things.\n",
"The only for loops you need are already written!\n",
"\n",
"The reason is that it makes everything simpler to understand and that Numpy (or other libraries) can vectorize the operations on CPU (or GPU in the case of libraries like Pytorch) to accelarate computation.\n",
"\n",
"https://en.wikipedia.org/wiki/Array_programming"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preliminaries\n",
"\n",
"You need to install the following libraries: numpy, scikit-learn and matplotlib.\n",
"Please refer to the internet for instruction if you don't know how to do that."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import sklearn.datasets\n",
"\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Numpy is one of the most popular numerical computation library in Python.\n",
"For this lab exercise we are mainly interested in tensor computation.\n",
"\n",
"It is really important that you take time to understand how Numpy works. A short tutorial is available here: https://cs231n.github.io/python-numpy-tutorial/\n",
"\n",
"Take time to do a few test, understand the different operation, the different between in-place and out-of-place operations, etc.\n",
"The most important resource you **must** use is the numpy documentation.\n",
"As we usually say in computer science: Read The F*cking Manual https://numpy.org/doc/stable/reference/index.html"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# create a 2D tensor of shape (2, 5) full of zeros\n",
"# by default the tensor will contain elements of type float\n",
"t = np.zeros((2, 5))\n",
"print(\"Shape of the tensor: \", t.shape)\n",
"print(\"Content:\")\n",
"print(t)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# You can reshape tensors\n",
"# When you reshape a tensor, it does not change the data order in the underlying memory.\n",
"# By default, this is the \"C array order\", also called the row-major format.\n",
"# If you don't know about this, check the wikipedia page:\n",
"# https://en.wikipedia.org/wiki/Row-_and_column-major_order\n",
"\n",
"# for example, the following operation will reshape t as a vector with ten elements\n",
"t = t.reshape(-1) # -1 means \"put every there\"\n",
"print(t.shape)\n",
"\n",
"# here instead of having a vector we build a tensor with a single row with ten elements\n",
"t = t.reshape(1, -1)\n",
"# of cours we could have done t = t.reshape(1, 10)\n",
"print(t.shape)\n",
"\n",
"# here a tensor with a single column with ten elements\n",
"t = t.reshape(-1, 1)\n",
"# of cours we could have done t = t.reshape(10, 1)\n",
"print(t.shape)\n",
"\n",
"# reshape into the original shape\n",
"t = t.reshape(2, -1)\n",
"print(t.shape)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# this creates a vector with values from 0 to 3 (not included)\n",
"t = np.arange(4)\n",
"print(t)\n",
"\n",
"# reshape\n",
"t = t.reshape((2, 2))\n",
"print(t)\n",
"\n",
"# .T returns the transposed tensor\n",
"print(t.T)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# set the first element of the second row to one and display the new data\n",
"# this is an in-place operation: it directly modifes the tensor memory\n",
"t[1, 0] = 1.\n",
"print(\"New content:\")\n",
"print(t)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# multiply the content of the tensor by two\n",
"# this is an out-of-place operation: it does not modify the tensory memory but creates a new one\n",
"t2 = 2 * t\n",
"\n",
"print(\"Original tensor:\")\n",
"print(t)\n",
"print(\"New tensor:\")\n",
"print(t2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# do the same thing but in-place\n",
"t *= 2\n",
"print(\"New content:\")\n",
"print(t)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# There are two multiplication operators:\n",
"# * is the element wise multiplication operator (also called the Hadamard product)\n",
"# @ is the matrix multiplication operator\n",
"\n",
"a = np.arange(4).reshape(2, 2)\n",
"# one_like create a tensor with the same properties (i.e. type and shape) than the argument\n",
"# but filled with ones\n",
"b = 2 * np.ones_like(a) \n",
"\n",
"print(\"a:\")\n",
"print(a)\n",
"print(\"b:\")\n",
"print(b)\n",
"print()\n",
"\n",
"# element wise multiplication\n",
"c = a * b\n",
"print(\"a * b:\")\n",
"print(c)\n",
"print()\n",
"\n",
"# matrix multiplication\n",
"c = a @ b\n",
"print(\"a @ b:\")\n",
"print(c)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# you can easily retrieve one row or one column of a tensor\n",
"print(\"a:\")\n",
"print(a)\n",
"print()\n",
"\n",
"print(\"first row of a:\")\n",
"print(a[0])\n",
"print()\n",
"\n",
"print(\"first column of a:\")\n",
"print(a[:, 0])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# the same approach can be used to update the data in-place\n",
"print(\"a:\")\n",
"print(a)\n",
"print()\n",
"\n",
"# set the second colums elements to 10\n",
"a[:, 1] = 10.\n",
"print(\"after update:\")\n",
"print(a)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One of the most important feature you have to understand is **broadcasting**.\n",
"You can read the following article to understand operation broadcasting: https://numpy.org/devdocs/user/theory.broadcasting.html\n",
"\n",
"It is a very important concept that is really helpful in numpy and other numerical computation library.\n",
"The documentation often explain of broadcasting is implemented for a given operation, check for example the matrix multiplication page: https://numpy.org/doc/stable/reference/generated/numpy.matmul.html"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.arange(6).reshape(2, -1)\n",
"print(\"a: \")\n",
"print(a)\n",
"print()\n",
"\n",
"# we will multpliy the first row by 2 and the second row by 4 by using operation broadcasting\n",
"# np.array can be used to create a tensor from python data\n",
"b = np.array([2, 4]).reshape((2, 1))\n",
"c = a * b\n",
"\n",
"print(\"new tensor:\")\n",
"print(c)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plotting\n",
"\n",
"We will use the following function to plot the data.\n",
"The first argument is mandatory and the two last are optional.\n",
"\n",
"- x: data to plot. It must be of shape (number of datapoints, number of dimensions). Only the two first dimension will be used for horizontal and vertical dimensions, respectively. This means you can use the third dimension for the bias term!\n",
"- y: labels. It must be of shape (number of datapoints,), that is a single dimension vector. Value must be either -1 (displayed in blue) or +1 (displayed in red)\n",
"- a: parameters of a linear binary classifiers. Should be of shape (3,), that is a vector with three elements. The last element is the bias term! This argument is used to display the decision boundary (the hyperplane that separates the two classes)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# plot only the two first features as the two first dim\n",
"# other features will be ignored, so we can use them for the bias term for example\n",
"def plot(x, y=None, a=None):\n",
" data_colors = np.array([\"blue\", \"red\"])\n",
" if y is None:\n",
" c = None\n",
" else:\n",
" c = data_colors[((y + 1) // 2).astype(int)]\n",
" \n",
" plt.scatter(x[:, 0], x[:, 1], c=c)\n",
" \n",
" if a is not None:\n",
" left = np.min(x[:, 0] - 1)\n",
" right = np.max(x[:, 0] + 1)\n",
"\n",
" left_y = -(a[2] + a[0]*left) / a[1]\n",
" right_y = -(a[2] + a[0]*right) / a[1]\n",
"\n",
" plt.plot([left, right], [left_y, right_y], c='black')\n",
" \n",
" axes = plt.gca()\n",
" axes.set_xlim([np.min(x[:, 0]) - 1,np.max(x[:, 0]) + 1])\n",
" axes.set_ylim([np.min(x[:, 1]) - 1,np.max(x[:, 1]) + 1])\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data generation\n",
"\n",
"We use scikit-learn to generate a toy dataset.\n",
"After you completed the lab exercise, you should play with dataset generation to generate non-separable data or even other kind of dataset, check the scikit-learn documentation! https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_no_bias, y_gold = sklearn.datasets.make_classification(\n",
" n_samples=200,\n",
" n_features=2,\n",
" n_classes=2,\n",
" n_informative=2,\n",
" n_redundant=0,\n",
" n_repeated=0,\n",
" class_sep=3. # default=1\n",
")\n",
"\n",
"# replace 0/1 labels with -1/1 labels\n",
"y_gold = y_gold * 2 - 1\n",
"\n",
"# add a third feature which will be used for the bias term\n",
"X = np.empty((X_no_bias.shape[0], 3))\n",
"X[:,:2] = X_no_bias\n",
"X[:,2] = 1.\n",
"\n",
"# create a diagonal vector containing labels\n",
"# (see the course to understand why this is useful)\n",
"Y_gold = np.diag(y_gold)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# so we have two different tensors containing the gold labels,\n",
"# use the most useful ones!\n",
"print(X.shape)\n",
"print(y_gold.shape)\n",
"print(Y_gold.shape)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# plot the data\n",
"plot(X, y_gold)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# First step!\n",
"\n",
"The first two functions to code are:\n",
"- predict(a, X): returns a vector of shape X.shape[0] containing the prediction given by the linear model parameterized by vector a, i.e. values in {-1, 1} for each datapoint (i.e. this function does both the scoring and prediction step)\n",
"- accuracy(gold, predicted): return the % of equal elements in the two vectors"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# you can use np.sign here\n",
"def predict(a, X):\n",
" # TODO\n",
" return ...\n",
"\n",
"# you can use == to compare values between two tensors.\n",
"# Even if == returns a vector of boolean, you can sum them via t.sum() or np.sum(t)\n",
"# to count the number of True values\n",
"def accuracy(gold, predicted):\n",
" # TODO\n",
" return ..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# create a randomly initialized model\n",
"a = np.random.normal(loc=0.0, scale=0.1, size=3)\n",
"\n",
"# compute the accuracy of the random model,\n",
"# it should return a value around 50,\n",
"# but the randomly generated model can also be way better/way worse\n",
"# (if you execute this cell many times you may even obtain an accuracy of 100%!)\n",
"accuracy(y_gold, predict(a, X))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Training algorithm 1: sub-gradient descent on the primal problem\n",
"\n",
"The objective function is defined as follows:\n",
"$$ \\sum_i \\max\\left(0, 1 - X_{i} a y_i\\right) + \\frac{c}{2} \\| a\\|^2_2$$\n",
"where c is the regularization weight.\n",
"\n",
"To implement this first you must first understand the following:\n",
"\n",
"1. what is the gradient of the objective (there are two different cases depending of the result of the max!)\n",
"2. how to use tensor operations only for computing the objective and its gradient?\n",
"\n",
"**Hint:**\n",
"for the second point, you will need to construct a mask, i.e. a vector containing values 0 and 1.\n",
"It needs to be computed as \"mask = (right >= 0).astype(float)\" where \"right\" is the right argument of the max.\n",
"Can you understand why? Can you explain why?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# hyperparameters: you can play with them!\n",
"step_size = 1e-6\n",
"regularization_weight = 1\n",
"n_epochs = 1000\n",
"\n",
"# initialize the model randomly\n",
"a = np.random.normal(loc=0.0, scale=0.1, size=3)\n",
"\n",
"# append to this list the objective value (or loss) at each epoch\n",
"objective_per_epoch = list()\n",
"# append to this list the accuracy at each epoch\n",
"# (evaluate on train directly, we are just playing with the optimization algorithm here)\n",
"# (but this is a bad practice, you should use development data in a real setting)\n",
"accuracy_per_epoch = list()\n",
"\n",
"for _ in range(n_epochs):\n",
" # compute the objective value (and the mask!)\n",
" # TODO\n",
" \n",
" \n",
" obj = ...\n",
" objective_per_epoch.append(obj)\n",
" \n",
" # compute the sub-gradient\n",
" # TODO\n",
" \n",
" # update the parameters (you should use an in place operation)\n",
" # TODO\n",
" \n",
" # compute the accuracy after the update\n",
" accuracy_per_epoch.append(accuracy(y_gold, predict(a, X)))\n",
"\n",
"print(\"Accuracy at the last epoch: \", accuracy_per_epoch[-1])\n",
"print()\n",
"\n",
"# display two plots:\n",
"# in red the objective value at each epoch,\n",
"# in blue accuracy at each epoch\n",
"\n",
"print(\"Objective function:\")\n",
"plt.plot(objective_per_epoch, color=\"red\")\n",
"plt.show()\n",
"\n",
"\n",
"print(\"Classification accuracy:\")\n",
"plt.plot(accuracy_per_epoch, color=\"blue\")\n",
"plt.show()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Training algorithm 2: projected gradient ascent on the dual problem\n",
"\n",
"In this second exercise, you need to optimize the dual via projected gradent ascent:\n",
"\n",
"1. what is the dual objective when we add a hyperparameter to control the weight of the regularization term?\n",
"2. how do we retrieve primal variables from dual variables in this case?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def dual_vars_to_primal_vars(dual_vars, X, Y_gold, regularization_weight):\n",
" return # TODO"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# hyperparameters: you can play with them!\n",
"step_size = 1e-4\n",
"regularization_weight = 1\n",
"n_epochs = 1000\n",
"\n",
"# dual varariables initialization\n",
"# you can initialize them differently,\n",
"# only remember that you need each value to be between -1 and 0\n",
"dual_vars = np.zeros(X.shape[0])\n",
"\n",
"objective_per_epoch = list()\n",
"accuracy_per_epoch = list()\n",
"support_vectors = list()\n",
"\n",
"\n",
"Y = np.diag(y_gold)\n",
"for _ in range(n_epochs):\n",
" # compute the object\n",
" # TODO\n",
" obj = ...\n",
" objective_per_epoch.append(obj)\n",
" \n",
" # compute gradient\n",
" # TODO\n",
"\n",
" # projected sub-gradient step\n",
" # you can use np.clip to project the new dual variables in the feasible space\n",
" # TODO\n",
" \n",
" # evaluate\n",
" a = dual_vars_to_primal_vars(dual_vars, X, Y_gold, regularization_weight)\n",
" accuracy_per_epoch.append(accuracy(y_gold, predict(a, X)))\n",
" \n",
" # compute the number of support vectors,\n",
" # i.e. the number of dual vars != 0\n",
" support_vectors.append(((dual_vars ** 2) > 1e-7).sum())\n",
"\n",
" \n",
"print(\"Stats at the last epoch:\")\n",
"print(\" - Accuracy: \", accuracy_per_epoch[-1])\n",
"print(\" - Number of support vectors: \", support_vectors[-1])\n",
"print()\n",
"\n",
"\n",
"print(\"Objective function:\")\n",
"plt.plot(objective_per_epoch, color=\"red\")\n",
"plt.show()\n",
"\n",
"\n",
"print(\"Classification accuracy:\")\n",
"plt.plot(accuracy_per_epoch, color=\"blue\")\n",
"plt.show()\n",
"\n",
"print(\"Number of support vectors:\")\n",
"plt.plot(support_vectors, color=\"green\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"# This plot will show the support vectors in red together with the separating hyperplane,\n",
"# i.e. the blue points have a dual variable = 0\n",
"\n",
"plot(X, ((dual_vars ** 2) > 1e-7)*2 - 1, dual_vars_to_primal_vars(dual_vars, X, Y_gold, regularization_weight))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"# this plot use the red/blue colors to indicate the gold class,\n",
"# you visualize if your model separating hyperplace correctly separates the data :)\n",
"plot(X, y_gold, dual_vars_to_primal_vars(dual_vars, X, Y_gold, regularization_weight))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Training algorithm 3: box constrained coordinate ascent on the dual problem\n",
"\n",
"This last algorithm does not require a stepsize.\n",
"You will need a mask to compute the numerator in the dual variable update rule! (the sum does not include the update dual var)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"# hyperparameters: you can play with them!\n",
"regularization_weight = 1\n",
"n_epochs = 1000\n",
"\n",
"# initialize dual variables\n",
"dual_vars = np.zeros(X.shape[0])\n",
"\n",
"objective_per_epoch = list()\n",
"accuracy_per_epoch = list()\n",
"support_vectors = list()\n",
"\n",
"for _ in range(n_epochs):\n",
" # compute objective\n",
" obj = ...\n",
" objective_per_epoch.append(obj)\n",
" \n",
" # for each coordinate\n",
" for k in range(X.shape[0]):\n",
" # update the coordinate k (remember there are constraints on the dual vars!)\n",
" # TODO\n",
"\n",
" # compute accuracy\n",
" a = dual_vars_to_primal_vars(dual_vars, X, Y_gold, regularization_weight)\n",
" accuracy_per_epoch.append(accuracy(y_gold, predict(a, X)))\n",
" \n",
" # compute the number of support vectors,\n",
" # i.e. the number of dual vars != 0\n",
" support_vectors.append(((dual_vars ** 2) > 1e-7).sum())\n",
"\n",
" \n",
"print(\"Stats at the last epoch:\")\n",
"print(\" - Accuracy: \", accuracy_per_epoch[-1])\n",
"print(\" - Number of support vectors: \", support_vectors[-1])\n",
"print()\n",
"\n",
"print(\"Objective function:\")\n",
"plt.plot(objective_per_epoch, color=\"red\")\n",
"plt.show()\n",
"\n",
"\n",
"print(\"Classification accuracy:\")\n",
"plt.plot(accuracy_per_epoch, color=\"blue\")\n",
"plt.show()\n",
"\n",
"print(\"Number of support vectors:\")\n",
"plt.plot(support_vectors, color=\"green\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"# This plot will show the support vectors in red together with the separating hyperplane,\n",
"# i.e. the blue points have a dual variable = 0\n",
"\n",
"plot(X, ((dual_vars ** 2) > 1e-7)*2 - 1, dual_vars_to_primal_vars(dual_vars, X, Y_gold, regularization_weight))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"# this plot use the red/blue colors to indicate the gold class,\n",
"# you visualize if your model separating hyperplace correctly separates the data :)\n",
"\n",
"plot(X, y_gold, dual_vars_to_primal_vars(dual_vars, X, Y_gold, regularization_weight))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}