{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exercise 09: Convolutional Neural Networks - Solution\n", "$\\renewcommand{\\real}{\\mathbb{R}}$\n", "$\\renewcommand{\\xb}{\\mathbf{x}}$\n", "$\\renewcommand{\\yb}{\\mathbf{y}}$\n", "$\\renewcommand{\\zb}{\\mathbf{z}}$\n", "$\\renewcommand{\\wb}{\\mathbf{w}}$\n", "$\\renewcommand{\\Xb}{\\mathbf{X}}$\n", "$\\renewcommand{\\Lb}{\\mathbf{L}}$\n", "$\\DeclareMathOperator*{\\argmin}{argmin}$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Welcome to the 9\n", "Today we will implement our own neural networks with PyTorch, followed by practicing performing convolution on images. \n", "\n", "\n", "## Dependencies\n", "\n", "Before you start, please make sure to install the following packages:\n", "\n", "**torch**: The framework we will use for training deep nets.\n", "\n", "**torchvision**: Helper package consisting of popular datasets, model architectures, and common image transformations for computer vision. We will use it for loading MNIST dataset and simple data transformations.\n", "\n", "**torchsummary**: Helper package for visualizing deep net architectures.\n", "\n", "Please use the following commands to install all the dependencies:\n", "\n", "`pip install torch torchvision torchsummary`\n", "\n", "`conda install nomkl`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 3rd party\n", "import numpy as np\n", "import torch\n", "import torch.nn as nn\n", "import torch.nn.functional as F\n", "from torch.utils.data import DataLoader\n", "from torchvision.datasets import MNIST\n", "from torchvision.transforms import ToTensor, Lambda, Compose\n", "from torchsummary import summary\n", "import matplotlib.pyplot as plt\n", "from scipy.signal import convolve2d\n", "\n", "# Project files.\n", "from helpers import accuracy, visualize_convolution, load_blackwhite_image, DrawingPad" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib notebook\n", "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "device = torch.device(('cpu', 'cuda')[torch.cuda.is_available()])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 1: PyTorch\n", "\n", "### 1.1: Motivation\n", "\n", "In the first part of the exercise we will revisit the MNIST dataset of hand-written digits and we will train deep net models to classify the digits. Instead of doing all the hard coding work manually, we will simplify our life by using a deep learning framework PyTorch.\n", "\n", "Last week we have implemented our own Multi-Layer Perceptron (MLP) where we defined both the feed-forward pass and back-propagation together with a simple optimizer (SGD update rule) and successfully trained it to perform the classification task. Given the amount of code written, one can imagine that prototyping with various NN architectures and training strategies might get tedious. That is where PyTorch (and other deep learning frameworks) come into play.\n", "\n", "### 1.2: About PyTorch\n", "\n", "[PyTorch](https://pytorch.org/) is an optimized tensor library for deep learning using GPUs and CPUs. It allows\n", "for fast prototyping by providing high level access to all necessary building blocks including NN layers, activation functions, loss functions or optimizers to name a few. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 1.3: Basic pipeline\n", "\n", "In order to define and train deep net models, one usually implements the following steps:\n", "\n", " 1. Load the dataset.\n", " 2. Define and instantiate a deep net architecture.\n", " 3. Choose or implement a loss function (such as the mean squared error).\n", " 4. Choose and instantiate an optimizer (such as SGD).\n", " 5. For each batch in the dataset:\n", "     5.1. Load a batch.\n", "     5.2. Run the forward pass through your model.\n", "     5.3. Compute the loss.\n", "     5.4. Run the backward pass, i.e. compute the gradients of the loss w.r.t. the trainable parameters (weights).\n", "     5.5. Update the weights using the optimizer.\n", "     5.6. Zero-out the accumulated gradients before the next iteration.\n", " \n", "We will see this exact pipeline in our code as well.\n", "\n", "### 1.4: Essential building blocks\n", "\n", "This section gives a high-level summary of the most important components, the bare minimum you will need to start playing with PyTorch and deep net models. You might want to skim through the official tutorials as well, namely [What is PyTorch](https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#sphx-glr-beginner-blitz-tensor-tutorial-py) and [Neural Networks](https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#sphx-glr-beginner-blitz-neural-networks-tutorial-py). Here is the list of the components, which will be explained in more detail along with the code blocks.\n", "\n", " - **nn.Module**: Base class for NN architectures.\n", " - **criterion**: A loss function.\n", " - **backward-pass**: Derivatives computed by the auto-diff system.\n", " - **optimizer**: Updates the trainable parameters (weights) during training.\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 1.5: Loading the data\n", "\n", "We are at step (1) of the training pipeline. PyTorch provides us with the `Dataset` and `DataLoader` classes which manage the loading, shuffling and transformations of the data. Within our training loop we will treat our dataset as an *iterator* which returns batches of data and the associated labels.\n", "\n", "As was the case last week, we will work with the [MNIST](https://en.wikipedia.org/wiki/MNIST_database) dataset, where each sample is stored as a $28 \\times 28$ pixel grayscale image. The data are loaded as the `torch.Tensor` data type.\n" ] },
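{ "cell_type": "markdown", "metadata": {}, "source": [ "Before loading MNIST, here is a minimal sketch (our own toy example, not part of the exercise) of how a `DataLoader` batches a `Dataset`:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Toy example: a TensorDataset of 10 (feature, label) pairs, iterated\n", "# in batches of 4 by a DataLoader.\n", "from torch.utils.data import TensorDataset\n", "\n", "toy_ds = TensorDataset(torch.arange(10.)[:, None], torch.arange(10))\n", "for xb, yb in DataLoader(toy_ds, batch_size=4):\n", "    print(xb.shape, yb.tolist())  # batches of 4, 4 and 2 samples" ] },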
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "batch_size = 128\n", "\n", "# Flattening transformation.\n", "\n", "##############################################\n", "# MAC and LINUX\n", "##############################################\n", "# flatten = Lambda(lambda x: x.flatten())\n", "# nw_tr = 4\n", "# nw_te = 4\n", "##############################################\n", "\n", "##############################################\n", "# WINDOWS\n", "##############################################\n", "class Flatten:\n", "    def __call__(self, x):\n", "        return x.flatten()\n", "flatten = Flatten()\n", "nw_tr = 0\n", "nw_te = 0\n", "##############################################\n", "\n", "# Dataset and DataLoader for MLP.\n", "ds_fc_tr = MNIST('data', train=True, download=True, transform=Compose([ToTensor(), flatten]))\n", "ds_fc_te = MNIST('data', train=False, download=True, transform=Compose([ToTensor(), flatten]))\n", "dl_fc_tr = DataLoader(ds_fc_tr, batch_size=batch_size, shuffle=False, num_workers=nw_tr)\n", "dl_fc_te = DataLoader(ds_fc_te, batch_size=batch_size, shuffle=False, num_workers=nw_te)\n", "\n", "# Dataset for CNN.\n", "ds_cnn_tr = MNIST('data', train=True, download=True, transform=ToTensor())\n", "ds_cnn_te = MNIST('data', train=False, download=True, transform=ToTensor())\n", "dl_cnn_tr = DataLoader(ds_cnn_tr, batch_size=batch_size, shuffle=False, num_workers=nw_tr)\n", "dl_cnn_te = DataLoader(ds_cnn_te, batch_size=batch_size, shuffle=False, num_workers=nw_te)\n", "\n", "ntr = len(ds_fc_tr)\n", "nva = len(ds_fc_te)\n", "print('Loaded {} tr and {} va samples.'.format(ntr, nva))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 1.6: Multi-Layer Perceptron (MLP)\n", "\n", "#### Architecture\n", "\n", "We are at step (2) of the training pipeline. We will start by implementing an MLP consisting of a 1D input layer (we flatten the input image) of shape ($784$, ), $3$ hidden fully connected layers, and an output layer of shape ($10$, ), as we have $10$ classes. \n", "\n", "#### Optimization criterion\n", "\n", "We would like to interpret the output vector $\\zb \\in \\real^{10}$ as the probabilities of data sample $\\xb \\in \\real^{784}$ belonging to each class $j \\in \\{1, 2, ... 10\\}$. Therefore, we will make use of the activation function **softmax** defined as:\n", "\n", "$$ P(\\text{class}=j \\mid \\zb) = \\text{softmax}_{j}(\\zb) = \\frac{\\exp{(z_{j})}}{\\sum_{k=1}^{10}{\\exp{(z_{k})}}}. $$\n", "\n", "Let $\\zb'$ be the predicted probability distribution with $\\zb'_{j} = \\text{softmax}_{j}(\\zb)$. Softmax guarantees that $\\sum_{k=1}^{10}{\\zb'_{k}} = 1$, meaning that our predicted vector $\\zb'$ is indeed a valid probability distribution over the classes. \n", "\n", "Finally, we would like to match the predicted distribution $\\zb'$ to the ground truth (GT) one, $\\yb$, where $\\yb$ is given by one-hot encoding ($\\yb$ is all zeros except for a $1$ at index $j$, if $j$ is the correct class to be predicted). The optimization criterion of choice is then to minimize the [**cross-entropy**](https://en.wikipedia.org/wiki/Cross_entropy) (CE) of $\\zb'$ and $\\yb$, therefore our final loss function $L$ is defined as:\n", "\n", "$$ L = \\text{CE}(\\yb, \\zb').$$\n", "\n", "Thankfully, PyTorch has got us covered by providing an implementation of $L$, so you will only really need to provide the output $\\zb$ (i.e. the 10-dimensional output of your last layer). We will get back to $L$ later." ] },
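{ "cell_type": "markdown", "metadata": {}, "source": [ "As a quick numeric sketch (our own example, not part of the exercise): softmax turns raw scores into a valid distribution, and PyTorch's cross-entropy expects the raw scores directly." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Softmax over a made-up score vector: the outputs sum to 1.\n", "z = torch.tensor([[2.0, 1.0, 0.1]])\n", "z_prime = F.softmax(z, dim=1)\n", "print(z_prime, z_prime.sum())\n", "\n", "# Cross-entropy against ground-truth class 0. F.cross_entropy applies\n", "# log-softmax internally, so we pass the raw scores z, not z_prime.\n", "y = torch.tensor([0])\n", "print(F.cross_entropy(z, y), -torch.log(z_prime[0, 0]))  # identical values" ] },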
{ "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "#### nn.Module\n", "Each custom NN architecture you choose to implement has to subclass [`nn.Module`](https://pytorch.org/docs/stable/nn.html#module), which conveniently keeps track of all the trainable parameters. From the programmer's perspective, you have to implement the constructor (`__init__`) and override the `forward()` function:\n", "\n", "- **\\_\\_init__()**\n", "\n", "You will define your layers (e.g. fully connected layer, 2D convolutional layer, etc.) in the constructor, and `nn.Module` will automatically keep track of all the weights these layers contain.\n", "\n", "- **forward()**\n", "\n", "This function really defines the architecture, as you will sequentially call your layers in the desired order. Each time you call `forward()` (every training iteration), the so-called **computational graph** is built. It is a directed acyclic graph (DAG) of nodes corresponding to the operations you have called. Each node defines the derivative of its outputs w.r.t. its inputs. The computational graph is traversed in reverse once you call `backward()`, and the derivatives are computed.\n", "\n", "All the trainable parameters your model consists of can be accessed via a call to `model.parameters()`, implemented in `nn.Module`. This comes in handy when instantiating your optimizer, as you have to pass it all the parameters you want it to manage.\n", "\n", "---\n", "\n", "Your task is to define the MLP as depicted in the figure below. Please refer to the documentation and focus on the classes `nn.Linear` to define the layers and `F.relu` to call the activation function.\n", "\n", "\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class FC(nn.Module):\n", "    \"\"\" Standard Multi-Layer Perceptron for classification into 10\n", "    classes. Consists of 4 FC layers; ReLU activations are used\n", "    for the first 3.\n", "    \"\"\"\n", "    def __init__(self):\n", "        \"\"\" Constructor, layer definitions go here. Only specify\n", "        those layers which have any trainable parameters (but for\n", "        instance not the activation functions, as the ones we use\n", "        do not have any trainable parameters). \"\"\"\n", "        super(FC, self).__init__()\n", "\n", "        self._fc1 = nn.Linear(784, 512)\n", "        self._fc2 = nn.Linear(512, 256)\n", "        self._fc3 = nn.Linear(256, 128)\n", "        self._fc4 = nn.Linear(128, 10)\n", "\n", "    def forward(self, x):\n", "        \"\"\" Feed-forward pass, this is where the actual computation happens\n", "        and the computational graph is built (from scratch each time this\n", "        function is called). \"\"\"\n", "\n", "        x = F.relu(self._fc1(x))\n", "        x = F.relu(self._fc2(x))\n", "        x = F.relu(self._fc3(x))\n", "        return self._fc4(x)\n", "\n", "# Instantiate the model.\n", "model_fc = FC().to(device)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** How many learnable parameters (weights) does this model have?\n", "\n", "**A:** (784 + 1) * 512 + (512 + 1) * 256 + (256 + 1) * 128 + (128 + 1) * 10 = 567434" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 1.7: Inspecting the model architecture\n", "\n", "Let us check the model architecture and see how many trainable parameters we really use. For this purpose we will use the `torchsummary` package.\n", "\n", "Notice the number of trainable parameters." ] },
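{ "cell_type": "markdown", "metadata": {}, "source": [ "We can also cross-check the hand count directly in PyTorch (a small sketch of our own):" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sum the number of elements over all trainable parameter tensors;\n", "# this should match the hand computation above (567434).\n", "print(sum(p.numel() for p in model_fc.parameters() if p.requires_grad))" ] },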
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "summary(model_fc, input_size=(28 * 28, ))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 1.8: Loss function\n", "\n", "We are at step (3) of our pipeline. As explained above, our loss function $L$ will be $\\text{CE}(\\yb, \\zb')$, which is provided for us by PyTorch; please refer to the documentation of [`nn.CrossEntropyLoss`](https://pytorch.org/docs/stable/nn.html?highlight=cross_entropy#torch.nn.CrossEntropyLoss).\n", "\n", "There are [many commonly used loss functions](https://pytorch.org/docs/stable/nn.html#loss-functions) defined in the `torch.nn` module, and you can implement your own using PyTorch operations as well. \n", "\n", "Your task is to instantiate the CE loss function." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Define the loss function.\n", "criterion = nn.CrossEntropyLoss()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 1.9: Optimizer\n", "We are at step (4) of the pipeline. The [optimizer](https://pytorch.org/docs/stable/optim.html) updates the weights given the currently computed gradients. It can be a simple stateless function (such as SGD) or a more advanced one which keeps track of additional information about the weights and the gradients (such as a running mean), which can be used for smarter update rules.\n", "\n", "We will opt for the simplest case, the stateless SGD. Your task is to instantiate this optimizer; please refer to [`optim.SGD`](https://pytorch.org/docs/stable/optim.html#torch.optim.SGD)." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learning_rate = 1e-1\n", "opt = torch.optim.SGD(model_fc.parameters(), lr=learning_rate)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 1.10: Training loop\n", "\n", "We are at step (5) of our pipeline. We would like to define a training loop where we iterate over the training samples and train our model. Let us define a function `train_model()` which can be used for training any architecture we come up with.\n", "\n", "Fill in the code which follows steps 5.2 - 5.6 of our training pipeline. For running the backward pass, use the function [`backward()`](https://pytorch.org/docs/stable/autograd.html?highlight=backward#torch.autograd.backward). For zeroing out the accumulated gradients, use the function [`zero_grad()`](https://pytorch.org/docs/stable/nn.html?highlight=zero_grad#torch.nn.Module.zero_grad)." ] },
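{ "cell_type": "markdown", "metadata": {}, "source": [ "Why step 5.6? PyTorch accumulates gradients across `backward()` calls by default. A minimal sketch (our own example, not part of the exercise):" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Gradients accumulate across backward() calls unless they are zeroed.\n", "w = torch.tensor(1.0, requires_grad=True)\n", "(2 * w).backward()\n", "print(w.grad)   # tensor(2.)\n", "(2 * w).backward()\n", "print(w.grad)   # tensor(4.) -- the two gradients were summed\n", "w.grad.zero_()  # this is what zero_grad() does for every parameter\n", "print(w.grad)   # tensor(0.)" ] },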
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def train_model(model, crit, opt, dl_tr, dl_te, epochs):\n", "    for ep in range(epochs):\n", "        # Training.\n", "        model.train()\n", "        for it, batch in enumerate(dl_tr):\n", "            # 5.1 Load a batch.\n", "            x, y = [d.to(device) for d in batch]\n", "\n", "            # 5.2 Run forward pass.\n", "            logits = model(x)\n", "\n", "            # 5.3 Compute loss (using 'criterion').\n", "            loss = crit(logits, y)\n", "\n", "            # 5.4 Run backward pass.\n", "            loss.backward()\n", "\n", "            # 5.5 Update the weights using the optimizer.\n", "            opt.step()\n", "\n", "            # 5.6 Zero-out the accumulated gradients.\n", "            model.zero_grad()\n", "\n", "            print('\\rEp {}/{}, it {}/{}: loss train: {:.2f}, accuracy train: {:.2f}'.\n", "                  format(ep + 1, epochs, it + 1, len(dl_tr), loss,\n", "                         accuracy(logits, y)), end='')\n", "\n", "        # Validation.\n", "        model.eval()\n", "        with torch.no_grad():\n", "            acc_run = 0\n", "            for it, batch in enumerate(dl_te):\n", "                # Get batch of data.\n", "                x, y = [d.to(device) for d in batch]\n", "                curr_bs = x.shape[0]\n", "                acc_run += accuracy(model(x), y) * curr_bs\n", "            acc = acc_run / nva\n", "\n", "            print(', accuracy test: {:.2f}'.format(acc))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Training the model" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "epochs = 5\n", "train_model(model_fc, criterion, opt, dl_fc_tr, dl_fc_te, epochs)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Part 2: Convolutional Neural Networks (CNNs)\n", "\n", "Our 4-layer MLP works well, reaching a test accuracy of ~0.96. However, this network uses ~0.5M weights. We can use even deeper architectures with fewer weights and take advantage of the 2D structure of the input data (images) using CNNs.\n", "\n", "### 2.1: LeNet-5\n", "\n", "Let us define a simple CNN network of 2 convolutional layers with max-pooling and 3 FC layers. In particular, we will implement a variant of the architecture called [LeNet-5, introduced by Yann LeCun in 1999](http://yann.lecun.com/exdb/publis/pdf/lecun-99.pdf). \n", "\n", "\n", "Your task is to define the simple LeNet-5 architecture depicted in the figure below. Check the architecture using `torchsummary` and comment on the number of parameters. Finally, train the model. To specify the layers, please refer to the functions [`Conv2d`](https://pytorch.org/docs/stable/nn.html#conv2d) and [`max_pool2d`](https://pytorch.org/docs/stable/nn.html?highlight=max_pool2d#torch.nn.functional.max_pool2d).\n", "\n", "\n", "\n" ] },
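{ "cell_type": "markdown", "metadata": {}, "source": [ "To see why the first FC layer expects $7 \\cdot 7 \\cdot 16$ inputs, we can trace the spatial size through the network. This is our own sketch, using the standard output-size formula $\\lfloor (W + 2P - K)/S \\rfloor + 1$ for a convolution or pooling layer:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def conv2d_out_size(size, kernel, stride=1, padding=0):\n", "    # Standard output-size formula for a conv / pooling layer.\n", "    return (size + 2 * padding - kernel) // stride + 1\n", "\n", "s = 28\n", "s = conv2d_out_size(s, kernel=3, padding=1)  # conv1 (3x3, pad 1): 28 -> 28\n", "s = conv2d_out_size(s, kernel=2, stride=2)   # max-pool 2x2:       28 -> 14\n", "s = conv2d_out_size(s, kernel=3, padding=1)  # conv2 (3x3, pad 1): 14 -> 14\n", "s = conv2d_out_size(s, kernel=2, stride=2)   # max-pool 2x2:       14 -> 7\n", "print(s, '->', s * s * 16, 'inputs to the first FC layer')" ] },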
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class CNN_LeNet(nn.Module):\n", "    \"\"\" CNN, expects input shape (28, 28).\n", "    \"\"\"\n", "    def __init__(self):\n", "        super(CNN_LeNet, self).__init__()\n", "\n", "        self._conv2d1 = nn.Conv2d(1, 6, 3, 1, padding=1)\n", "        self._conv2d2 = nn.Conv2d(6, 16, 3, 1, padding=1)\n", "        self._fc1 = nn.Linear(7 * 7 * 16, 120)\n", "        self._fc2 = nn.Linear(120, 84)\n", "        self._fc3 = nn.Linear(84, 10)\n", "\n", "    def forward(self, x):\n", "        x = F.max_pool2d(F.relu(self._conv2d1(x)), 2)\n", "        x = F.max_pool2d(F.relu(self._conv2d2(x)), 2)\n", "        x = x.reshape((x.shape[0], -1))\n", "        x = F.relu(self._fc1(x))\n", "        x = F.relu(self._fc2(x))\n", "        return self._fc3(x)\n", "\n", "# Instantiate the model.\n", "model_lenet = CNN_LeNet().to(device)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** What is the number of trainable parameters in our LeNet model?\n", "\n", "**A:** (6 * 3 * 3 * 1 + 6) + (16 * 3 * 3 * 6 + 16) + (7 * 7 * 16 + 1) * 120 + (120 + 1) * 84 + (84 + 1) * 10 = 106154" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Let us check the architecture again and the number of trainable parameters. We can directly see that this architecture needs just about 20% of the parameters the MLP used." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Print out the architecture and check the number of parameters.\n", "summary(model_lenet, input_size=(1, 28, 28))" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Train the model.\n", "epochs = 5\n", "opt_lenet = torch.optim.SGD(model_lenet.parameters(), lr=learning_rate)\n", "train_model(model_lenet, F.cross_entropy, opt_lenet, dl_cnn_tr, dl_cnn_te, epochs)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2: 3-layer CNN\n", "\n", "Let us now define an even deeper CNN with 3 convolutional layers and only 2 FC layers. This network should reach a higher accuracy (or converge faster) and still use fewer parameters than the previous architectures.\n", "\n", "Your task is to implement the 3-layer CNN depicted in the figure below. Check the number of parameters using `torchsummary`. Train the model and play around with the number of filters (kernels) used by every layer. Comment on your findings.\n", "\n", "" ] },
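{ "cell_type": "markdown", "metadata": {}, "source": [ "Before implementing it, here is a back-of-the-envelope count (our own sketch, assuming the filter counts (16, 32, 64), $3 \\times 3$ kernels with padding 1, and a 128-unit FC layer, as in the figure) suggesting that this model is indeed smaller than LeNet:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Hand count of the trainable parameters of the 3-conv architecture.\n", "conv1 = (3 * 3 * 1) * 16 + 16    # 160\n", "conv2 = (3 * 3 * 16) * 32 + 32   # 4640\n", "conv3 = (3 * 3 * 32) * 64 + 64   # 18496\n", "fc1 = (3 * 3 * 64 + 1) * 128     # 73856 (28 -> 14 -> 7 -> 3 after pooling)\n", "fc2 = (128 + 1) * 10             # 1290\n", "print(conv1 + conv2 + conv3 + fc1 + fc2)  # 98442 < 106154 (LeNet)" ] },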
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class CNN(nn.Module):\n", "    \"\"\" CNN, expects input shape (28, 28).\n", "    \"\"\"\n", "    def __init__(self, filters=(16, 32, 64)):\n", "        super(CNN, self).__init__()\n", "\n", "        self._conv2d1 = nn.Conv2d(1, filters[0], 3, 1, padding=1)\n", "        self._conv2d2 = nn.Conv2d(filters[0], filters[1], 3, 1, padding=1)\n", "        self._conv2d3 = nn.Conv2d(filters[1], filters[2], 3, 1, padding=1)\n", "        self._fc1 = nn.Linear(3 * 3 * filters[2], 128)\n", "        self._fc2 = nn.Linear(128, 10)\n", "\n", "    def forward(self, x):\n", "        x = F.max_pool2d(F.relu(self._conv2d1(x)), 2)\n", "        x = F.max_pool2d(F.relu(self._conv2d2(x)), 2)\n", "        x = F.max_pool2d(F.relu(self._conv2d3(x)), 2)\n", "        x = x.reshape((x.shape[0], -1))\n", "        x = F.relu(self._fc1(x))\n", "        return self._fc2(x)\n", "\n", "# Instantiate the model.\n", "filters = [16, 32, 64]\n", "model_cnn = CNN(filters).to(device)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Print out the architecture and number of parameters.\n", "summary(model_cnn, input_size=(1, 28, 28))" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Train the model.\n", "opt_cnn = torch.optim.SGD(model_cnn.parameters(), lr=learning_rate)\n", "train_model(model_cnn, F.cross_entropy, opt_cnn, dl_cnn_tr, dl_cnn_te, epochs)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 2.3: Trying out your own input\n", "\n", "We have provided a tool for you to draw your own digits and test your network. Play around with the inputs to get a sense of how accurate your model is. Use the button `reset` to reset the canvas and `predict` to run the prediction on the current canvas image. You can use the button `blur` to blur your drawn image so that it looks closer to the samples from the training set." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dp = DrawingPad((28, 28), model_lenet, device)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pred = model_lenet(torch.from_numpy(dp.grid).to(device)[None, None])\n", "clp = torch.argmax(pred)\n", "print(\"Your prediction:\", clp.item())" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Part 3: Convolution\n", "\n", "In this part, we will go into more detail about the convolution operation, which is used extensively in CNNs.\n" ] },
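{ "cell_type": "markdown", "metadata": {}, "source": [ "One caveat worth knowing (our own aside, not part of the exercise): PyTorch's convolutional layers actually compute *cross-correlation*, i.e. they do not flip the kernel, whereas the textbook convolution defined below does. A quick sketch of the difference on an asymmetric kernel:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Cross-correlation (PyTorch) vs. true convolution (numpy) with the same\n", "# asymmetric kernel: here the results differ by a sign.\n", "k = torch.tensor([[[[1., 0., -1.]]]])     # shape (out_ch, in_ch, 1, 3)\n", "x = torch.arange(5.).reshape(1, 1, 1, 5)  # shape (N, C, H, W)\n", "print(F.conv2d(x, k).flatten())                              # [-2., -2., -2.]\n", "print(np.convolve(np.arange(5.), [1, 0, -1], mode='valid'))  # [ 2.,  2.,  2.]" ] },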
\n", "\n", "### 3.1: Introduction: 1-D Convolution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's recall the definition of the convolution operation.\n", "\n", "$$ y[i] = \\sum_{m=- \\infty}^{\\infty} x[m] \\cdot h[i-m] $$\n", "\n", "We notice that the steps for convolution in this case are the following:\n", "* We flip our filter $h[m]$ so that it becomes $h[-m]$.\n", "* We shift $h[-m]$ to index $i$ so that it becomes $h[i-m]$.\n", "* Multiply the two arrays $x[m]$ and $h[i-m]$ and sum over all values of $m$.\n", "* As we calculate this for each index $i$, we are sliding filter $h[m]$ over $x[m]$ and repeating the steps above.\n", "\n", "Consider the input array $x$ and filter $h$ given below:\n", "$$x = [2,2,2,2,2,2,2,10,10,10,10,10,1,1,1,1,1,1,1,1,5,5,5,5,5] \\\\\n", "h = [1, 0, -1]$$\n", "\n", "Note that the filter is centered at index 0 ($h[-1]=1, h[0]=0, h[1]=1$)\n", "\n", "Let's plot them to see what they look like:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.array([2,2,2,2,2,2,2,10,10,10,10,10,1,1,1,1,1,1,1,1,5,5,5,5,5])\n", "h = np.array([-1,0,1])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(6, 3))\n", "plt.subplot(1,2,1)\n", "plt.stem(x, use_line_collection=True)\n", "plt.title(\"input array\")\n", "\n", "plt.subplot(1,2,2)\n", "plt.stem(range(-1,2), h, use_line_collection=True)\n", "plt.title(\"filter\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below we have a visual explanation of convolution. \n", "Typing `q` into the input bar will quit the visualization." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "visualize_convolution()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Calculate the output of the convolution operation by hand (you can do so for the first few elements of the result until you get the hang of it).\n", "\n", "For $y[0]$, we shift $h[-m]$ to index 0. Then we multiply with $x[m]$. Since h[-m] is 0 everywhere except for indices -1, 0 and 1, our multiplication results in:\n", "$$y[0] = x[-1]h[1] + x[0]h[0] + x[1]h[-1] = -2 \n", "$$\n", "\n", "For $y[1]$ we have $h[1-m]$ which is nonzero only at indices 0, 1, 2. The result is:\n", "$$\n", "y[1] = x[0]h[1] + x[1]h[0] + x[2]h[-1] = 0 \n", "$$\n", "\n", "For $y[2]$:\n", "$$\n", "y[2] = x[1]h[1] + x[2]h[0] + x[3]h[-1] = 0 \n", "$$\n", "etc." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Notice that to calculate $y[0]$ you need the element $x[-1]$. A way to overcome this problem is to pad the input array with 0's. We pad both the beginning and the end of the array by `M//2` where $M$ denotes the filter size. The function to do padding for 1D arrays is given below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#function to add zero padding to input array\n", "def add_padding_1d(x, filter_size):\n", " \"\"\"\n", " Adds zero padding to a 1-d array.\n", " Args:\n", " x (np.array): Input array.\n", " filter_size (int): size of the filter used in convolution\n", " Returns:\n", " np.array: resulting zero padded array.\n", " \"\"\"\n", " \n", " return np.concatenate([np.zeros([filter_size//2,]), x, np.zeros([filter_size//2,])])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fill in the convolution function given below." 
{ "cell_type": "markdown", "metadata": {}, "source": [ "Fill in the convolution function given below." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Function to compute a 1-D convolution.\n", "def convolution_one_dimensional(x, h):\n", "    \"\"\"\n", "    1-d convolution\n", "    Args:\n", "        x (np.array): Input array.\n", "        h (np.array): Filter\n", "    Returns:\n", "        np.array: convolution result\n", "    \"\"\"\n", "    filter_size = h.shape[0]\n", "    input_array_size = x.shape[0]\n", "\n", "    # Add zero padding to the input array.\n", "    x = add_padding_1d(x, filter_size)\n", "\n", "    # Flip the kernel.\n", "    h = h[::-1]\n", "\n", "    # Slide the flipped kernel over the input array.\n", "    y = np.zeros(input_array_size)\n", "    for i in range(input_array_size):\n", "        y[i] = np.dot(x[i:i+filter_size], h)\n", "\n", "    return y" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Run the code below to plot your result. We compare with numpy's `np.convolve()` function as a sanity check." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y = convolution_one_dimensional(x, h)\n", "sanity_check = np.convolve(a=x, v=h, mode='same')\n", "print(\"My result\", y)\n", "print(\"Numpy's result\", sanity_check)\n", "\n", "plt.figure()\n", "plt.subplot(1,1,1)\n", "markerline, stemlines, baseline = plt.stem(x, linefmt=':', markerfmt=\"*\", label=\"input\", use_line_collection=True)\n", "plt.setp(stemlines, color='r', linewidth=2)\n", "plt.stem(y, label=\"result\", use_line_collection=True)\n", "plt.title(\"result\")\n", "plt.legend()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Ignore the edge effects in the result above (these are caused by the padding). What is this filter doing? \n", "\n", "This filter is doing edge detection. It gives a response whenever the values in the array change. The more sudden the change, the greater the magnitude of the response. " ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2: 2-D Convolution and Trying Different Image Filters\n", "\n", "In a similar way, 2D convolution is defined as the following:\n", "\n", "$$ y[i, j] = \\sum_{m} \\sum_{n} x[m, n] \\cdot h[i-m, j-n] $$\n", "\n", "In this case, we have 2D filters that we slide over our image. \n", "\n", "It's important to observe the effects of convolving images with different filters to get an idea of what our Convolutional Neural Networks learn to \"see\" in the images." ] },
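{ "cell_type": "markdown", "metadata": {}, "source": [ "To convince ourselves that this definition matches scipy's `convolve2d`, here is a tiny numeric check (our own example) on a $3 \\times 3$ array with a $2 \\times 2$ filter:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A 2x2 filter convolved ('valid') with a 3x3 array, checked by hand:\n", "# convolution flips the filter, so each output is x[i+1, j+1] - x[i, j].\n", "x_small = np.arange(1, 10).reshape(3, 3)\n", "h_small = np.array([[1, 0], [0, -1]])\n", "print(convolve2d(x_small, h_small, mode='valid'))  # [[4 4], [4 4]]" ] },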
{ "cell_type": "markdown", "metadata": {}, "source": [ "Let's load an image to test some filters on." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#true_logo = load_blackwhite_image(image_name=\"img/old_logo.png\")\n", "true_logo = load_blackwhite_image(image_name=\"img/new_logo.png\")\n", "\n", "plt.figure()\n", "plt.imshow(true_logo, cmap='gray')" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "We will create some Gaussian filters and observe their effect on our images. \n", "\n", "**Q**: Run the code below and comment on what you see. What happens as we increase the $\\sigma$ value of the Gaussian?\n", "\n", "Gaussian filters add a blurring effect to the image. We are multiplying patches of our image by a 2D Gaussian, then summing to find the value of the center pixel. In the result, the original value of the center pixel has the highest weight, and the weights decrease for the neighboring pixels. This creates a blurring effect. \n", "\n", "As we increase the sigma value, the spread of the Gaussian increases, and we give more weight to the neighboring pixels in the summation. Therefore, the blurring effect increases." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def gaussian_filter(sigma, filter_size):\n", "    x = np.linspace(-1, 1, filter_size)\n", "    filter_one_d = np.exp(-0.5 * (x / sigma) ** 2)\n", "    filter_res = np.dot(filter_one_d[:,None], filter_one_d[:,None].T)\n", "    # Normalize so that the filter sums to 1 (preserves image brightness).\n", "    filter_res = filter_res / np.sum(filter_res)\n", "    return filter_res\n", "\n", "# Try the different filters given below:\n", "filter_1 = gaussian_filter(sigma=0.2, filter_size=8)\n", "filter_2 = gaussian_filter(sigma=0.5, filter_size=8)\n", "filter_3 = gaussian_filter(sigma=5, filter_size=8)\n", "\n", "gaussian_filters = [filter_1, filter_2, filter_3]" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for ind, a_filter in enumerate(gaussian_filters):\n", "    result = convolve2d(true_logo, a_filter, mode=\"same\")\n", "\n", "    # Visualize the filter and the resulting image.\n", "    plt.figure(figsize=(10, 5))\n", "    plt.subplot(1,2,1)\n", "    plt.imshow(a_filter, cmap='gray')\n", "    plt.title(\"filter \" + str(ind))\n", "    plt.colorbar()\n", "\n", "    plt.subplot(1,2,2)\n", "    plt.imshow(result, cmap='gray')\n", "    plt.colorbar()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Now let us create a horizontal and a vertical Sobel operator and convolve them with the image. Run the code below.\n", "\n", "**Q**: What effects do you observe now? \n", " \n", "The Sobel filters perform edge detection: the first filter responds to horizontal edges (intensity changes along the vertical axis), and its transpose responds to vertical edges." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "filter_4 = np.array([[-1,-2,-1],[0,0,0],[1,2,1]])\n", "filter_5 = filter_4.T\n", "\n", "sobel_filters = [filter_4, filter_5]\n", "\n", "for ind, a_filter in enumerate(sobel_filters):\n", "    result = convolve2d(true_logo, a_filter, mode=\"same\")\n", "\n", "    plt.figure(figsize=(9, 3))\n", "\n", "    # Threshold the response at its mean value.\n", "    thresholded_result = np.zeros(result.shape)\n", "    thresholded_result[result > np.mean(result)] = 1\n", "    thresholded_result[result <= np.mean(result)] = 0\n", "\n", "    # Visualize the filter and the resulting images.\n", "    plt.subplot(1,3,1)\n", "    plt.imshow(a_filter, cmap='gray')\n", "    plt.title(\"filter \" + str(ind))\n", "    plt.colorbar(fraction=0.046)\n", "\n", "    plt.subplot(1,3,2)\n", "    plt.imshow(result, cmap='gray')\n", "    plt.title('result')\n", "    plt.colorbar(fraction=0.04)\n", "\n", "    plt.subplot(1,3,3)\n", "    plt.imshow(thresholded_result, cmap='gray')\n", "    plt.colorbar(fraction=0.046)\n", "    plt.title('thresholded result')" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "We hand-designed the filters ourselves, but in the case of CNNs, the networks learn what the filters should look like. Oftentimes, the filters learned by CNNs in the early layers look like the filters we have just designed.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 4 }