{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Miniproject 1: Image Classification\n", "\n", "## Introduction\n", "\n", "### Important dates:\n", "\n", "- Project release: Friday, 15th March 2019\n", "- **Submission deadline**: Monday, 29th April 2019, 11:59 pm\n", "\n", "### Description\n", "\n", "One of the deepest traditions in learning about deep learning is to first [tackle the exciting problem of MNIST classification](http://yann.lecun.com/exdb/mnist/). [The MNIST database](https://en.wikipedia.org/wiki/MNIST_database) (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used as a first test for new classification algorithms. \n", "We follow this tradition to investigate the performance of artificial neural networks of different complexity on MNIST. However, since MNIST is too easy for accessing the full power of modern machine learning algorithms (see e.g. [this post](https://twitter.com/goodfellow_ian/status/852591106655043584)) we will extend our analysis to the recently introduced, harder [Fashion-MNIST dataset](https://github.com/zalandoresearch/fashion-mnist).\n", "\n", "\n", "### Prerequisites\n", "\n", "- You should have a running installation of [tensorflow](https://www.tensorflow.org/install/) and [keras](https://keras.io/). Feel free to gain inspiration from the [Keras example directory](https://github.com/keras-team/keras/tree/master/examples) for your implementations.\n", "- You should know the concepts \"multilayer perceptron\", \"stochastic gradient descent with minibatches\", \"convolutional neural network\", \"training and validation data\", \"overfitting\" and \"early stopping\".\n", "\n", "### What you will learn\n", "\n", "- You will learn how to define feedforward neural networks in keras and fit them to data.\n", "- You will be guided through a prototyping procedure for the application of deep learning to a specific domain.\n", "- You will get in contact with concepts discussed later in the lecture, like \"regularization\", \"batch normalization\" and \"convolutional networks\".\n", "- You will gain some experience on the influence of network architecture, optimizer and regularization choices on the goodness of fit.\n", "- You will learn to be more patient :) Some fits may take your computer quite a bit of time; run them over night (or on an external server).\n", "\n", "### Evaluation criteria\n", "\n", "The evaluation is (mostly) based on the figures you submit and your answer sentences. Provide clear and concise answers respecting the indicated maximum length (answers to the questions should be below the line that says \"Answer to question ...\").\n", "\n", "**The submitted notebook must be run by you!** We will only do random tests of your code and not re-run the full notebook. There will be fraud detection sessions at the end of the semester.\n", "\n", "### Your names\n", "\n", "**Before you start**: please enter your full name(s) in the field below." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2018-03-09T09:08:24.514461Z", "start_time": "2018-03-09T09:08:24.506410Z" } }, "outputs": [], "source": [ "student1 = \"Firstname Lastname\"\n", "student2 = \"\"" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2018-02-22T21:52:59.697375Z", "start_time": "2018-02-22T21:52:59.689443Z" } }, "source": [ "## Some helper functions\n", "\n", "For your convenience we provide here some functions to preprocess the data and plot the results later. Simply run the following cells with `Shift-Enter`.\n", "\n", "### Dependencies and constants" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2018-02-23T14:27:09.352019Z", "start_time": "2018-02-23T14:27:08.476310Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using TensorFlow backend.\n" ] } ], "source": [ "%matplotlib inline\n", "\n", "import numpy as np\n", "import time\n", "import matplotlib.pyplot as plt\n", "import scipy.io\n", "\n", "import keras\n", "from keras.models import Sequential\n", "from keras.layers import Dense, Conv2D, MaxPooling2D, Dropout, Flatten\n", "from keras.optimizers import SGD, Adam" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2018-02-23T15:11:52.252208Z", "start_time": "2018-02-23T15:11:52.121360Z" } }, "outputs": [], "source": [ "def plot_some_samples(x, y = [], yhat = [], select_from = [], \n", " ncols = 6, nrows = 4, xdim = 28, ydim = 28,\n", " label_mapping = range(10)):\n", " \"\"\"plot some input vectors as grayscale images (optionally together with their assigned or predicted labels).\n", " \n", " x is an NxD - dimensional array, where D is the length of an input vector and N is the number of samples.\n", " Out of the N samples, ncols x nrows indices are randomly selected from the list select_from (if it is empty, select_from becomes range(N)).\n", " \n", " Keyword arguments:\n", " y -- corresponding labels to plot in green below each image.\n", " yhat -- corresponding predicted labels to plot in red below each image.\n", " select_from -- list of indices from which to select the images.\n", " ncols, nrows -- number of columns and rows to plot.\n", " xdim, ydim -- number of pixels of the images in x- and y-direction.\n", " label_mapping -- map labels to digits.\n", " \n", " \"\"\"\n", " fig, ax = plt.subplots(nrows, ncols)\n", " if len(select_from) == 0:\n", " select_from = range(x.shape[0])\n", " indices = np.random.choice(select_from, size = min(ncols * nrows, len(select_from)), replace = False)\n", " for i, ind in enumerate(indices):\n", " thisax = ax[i//ncols,i%ncols]\n", " thisax.matshow(x[ind].reshape(xdim, ydim), cmap='gray')\n", " thisax.set_axis_off()\n", " if len(y) != 0:\n", " j = y[ind] if type(y[ind]) != np.ndarray else y[ind].argmax()\n", " thisax.text(0, 0, (label_mapping[j]+1)%10, color='green', \n", " verticalalignment='top',\n", " transform=thisax.transAxes)\n", " if len(yhat) != 0:\n", " k = yhat[ind] if type(yhat[ind]) != np.ndarray else yhat[ind].argmax()\n", " thisax.text(1, 0, (label_mapping[k]+1)%10, color='red',\n", " verticalalignment='top',\n", " horizontalalignment='right',\n", " transform=thisax.transAxes)\n", " return fig\n", "\n", "def prepare_standardplot(title, xlabel):\n", " fig, (ax1, ax2) = plt.subplots(1, 2)\n", " fig.suptitle(title)\n", " ax1.set_ylabel('categorical 
"    ax1.set_ylabel('categorical cross entropy')\n", "    ax1.set_xlabel(xlabel)\n", "    ax1.set_yscale('log')\n", "    ax2.set_ylabel('accuracy [% correct]')\n", "    ax2.set_xlabel(xlabel)\n", "    return fig, ax1, ax2\n", "\n", "def finalize_standardplot(fig, ax1, ax2):\n", "    ax1handles, ax1labels = ax1.get_legend_handles_labels()\n", "    if len(ax1labels) > 0:\n", "        ax1.legend(ax1handles, ax1labels)\n", "    ax2handles, ax2labels = ax2.get_legend_handles_labels()\n", "    if len(ax2labels) > 0:\n", "        ax2.legend(ax2handles, ax2labels)\n", "    fig.tight_layout()\n", "    plt.subplots_adjust(top=0.9)\n", "\n", "def plot_history(history, title):\n", "    fig, ax1, ax2 = prepare_standardplot(title, 'epoch')\n", "    ax1.plot(history.history['loss'], label = \"training\")\n", "    ax1.plot(history.history['val_loss'], label = \"validation\")\n", "    ax2.plot(history.history['acc'], label = \"training\")\n", "    ax2.plot(history.history['val_acc'], label = \"validation\")\n", "    finalize_standardplot(fig, ax1, ax2)\n", "    return fig\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 1: Data import and visualization (4 points)\n", "\n", "### Description\n", "\n", "### Loading the data\n", "\n", "The datasets we use in this project (MNIST, Fashion-MNIST) consist of grayscale images with 28x28 pixels. Keras comes with a convenient built-in [data importer](https://keras.io/datasets/) for common datasets.\n", "\n", "1. As a warm-up exercise, use this importer to (down-)load the MNIST and Fashion-MNIST datasets. Assign the training and test images and labels of both datasets to suitably named variables. (2 pts)\n", "2. Use the corresponding plotting function defined above to plot some samples of the two datasets. What do the green digits at the bottom left of each image indicate? (1 sentence max.) (2 pts)\n", "\n", "The low resolution (and grayscale) of the images certainly discards some information that could be helpful for classifying the images. However, since the data has lower dimensionality due to the low resolution, the fitting procedures converge faster. This is an advantage in situations like this one (or generally when prototyping), where we want to try many different things without having to wait too long for computations to finish.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Solution" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2018-02-23T14:27:44.442862Z", "start_time": "2018-02-23T14:27:09.505547Z" } }, "outputs": [], "source": [ "...\n", "\n", "(x_train, y_train), (x_test, y_test) = ...\n", "(x_fashion_train, y_fashion_train), (x_fashion_test, y_fashion_test) = ..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Answer to question 2:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Data pre-processing**: To prepare for fitting, we transform the labels to one-hot coding, i.e. for 5 classes, label 2 becomes the vector [0, 0, 1, 0, 0] (python uses 0-indexing). Furthermore, we reshape (flatten) the input images to input vectors and rescale the data into the range [0,1]."
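] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note (added sketch)*: the pre-processing cells below assume that the variables from Exercise 1 (`x_train`, `y_train`, `x_fashion_train`, ...) already contain the raw data arrays. If you have not filled in the loading cell above yet, one possible way to obtain them with the built-in keras importer (assuming the standard `keras.datasets` API) is sketched in the next cell; treat it as a hedged example rather than the official solution, and adapt the variable names to your own Exercise 1 solution." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Hedged sketch only (not the official solution): assumes the built-in keras.datasets importer.\n", "from keras.datasets import mnist, fashion_mnist\n", "\n", "# Each loader returns (train_images, train_labels), (test_images, test_labels) as uint8 arrays.\n", "(x_train, y_train), (x_test, y_test) = mnist.load_data()\n", "(x_fashion_train, y_fashion_train), (x_fashion_test, y_fashion_test) = fashion_mnist.load_data()"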
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "y_train = keras.utils.to_categorical(y_train)\n", "y_test = keras.utils.to_categorical(y_test)\n", "\n", "y_fashion_train = keras.utils.to_categorical(y_fashion_train)\n", "y_fashion_test = keras.utils.to_categorical(y_fashion_test)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "x_train = x_train.reshape(x_train.shape[0], x_train.shape[1]*x_train.shape[2])/np.max(x_train)\n", "x_test = x_test.reshape(x_test.shape[0], x_test.shape[1]*x_test.shape[2])/np.max(x_test)\n", "\n", "x_fashion_train = x_fashion_train.reshape(x_fashion_train.shape[0], x_fashion_train.shape[1]*x_fashion_train.shape[2])/np.max(x_fashion_train)\n", "x_fashion_test = x_fashion_test.reshape(x_fashion_test.shape[0], x_fashion_test.shape[1]*x_fashion_test.shape[2])/np.max(x_fashion_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 2: No hidden layer (10 points)\n", "\n", "### Description\n", "\n", "Define and fit a model without a hidden layer (since we will use multi-layer models later in this project, you can define a general constructor function for models with an arbitrary number of hidden layers already at this point). (1 pt for each step)\n", "\n", "1. Use the softmax activation for the output layer.\n", "2. Use the categorical_crossentropy loss.\n", "3. Add the accuracy metric to the metrics.\n", "4. Choose stochastic gradient descent for the optimizer.\n", "5. Choose a minibatch size of 128.\n", "6. Fit for as many epochs as needed to see no further decrease in the validation loss.\n", "7. Plot the output of the fitting procedure (a history object) using the function plot_history defined above.\n", "8. Determine the indices of all test images that are misclassified by the fitted model and plot some of them using the function \n", " `plot_some_samples(x_test, y_test, yhat_test, error_indices)`. Explain the green and red digits at the bottom of each image.\n", "9. Repeat the above steps for fitting the network to the Fashion-MNIST dataset.\n", "\n", "\n", "Hints:\n", "* Read the keras docs, in particular [Getting started with the Keras Sequential model](https://keras.io/getting-started/sequential-model-guide/).\n", "* Have a look at the keras [examples](https://github.com/keras-team/keras/tree/master/examples), e.g. [mnist_mlp](https://github.com/keras-team/keras/blob/master/examples/mnist_mlp.py)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Solution" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Answer to question 10:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 3: One hidden layer, different optizimizers & overfitting (10 points)\n", "\n", "### Description\n", "\n", "Train a network with one hidden layer and compare different optimizers.\n", "\n", "1. Use one hidden layer with 128 units and the 'relu' activation. Use the [summary method](https://keras.io/models/about-keras-models/) to display your model in a compact way. (1 pt)\n", "2. Fit the model for 50 epochs with different learning rates of stochastic gradient descent (SGD). (1pt)\n", "3. Replace the stochastic gradient descent optimizer with the [Adam optimizer](https://keras.io/optimizers/#adam). (1pt)\n", "4. 
"4. Plot the learning curves of SGD with a reasonable learning rate (e.g. in the range [0.01, 0.1]) together with the learning curves of Adam in the same figure. Make sure the curves in the plot are labeled clearly. (2 pts)\n", "5. Answer the questions below. (4 pts)\n", "6. Run the network (using the Adam optimizer) on the Fashion-MNIST dataset and plot the learning curves using the plot_history function defined above. (1 pt)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Solution" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 17, "metadata": { "ExecuteTime": { "end_time": "2018-02-23T15:42:45.497806Z", "start_time": "2018-02-23T15:42:44.961166Z" } }, "outputs": [], "source": [ "# This plotting routine might help you ...\n", "def comparison_plot(history_sgd, history_adam, label1, label2, title):\n", "    fig, ax1, ax2 = prepare_standardplot(title, \"epochs\")\n", "    ax1.plot(history_sgd.history['loss'], label=label1 + ' training')\n", "    ax1.plot(history_sgd.history['val_loss'], label=label1 + ' validation')\n", "    ax1.plot(history_adam.history['loss'], label=label2 + ' training')\n", "    ax1.plot(history_adam.history['val_loss'], label=label2 + ' validation')\n", "    ax2.plot(history_sgd.history['acc'], label=label1 + ' training')\n", "    ax2.plot(history_sgd.history['val_acc'], label=label1 + ' validation')\n", "    ax2.plot(history_adam.history['acc'], label=label2 + ' training')\n", "    ax2.plot(history_adam.history['val_acc'], label=label2 + ' validation')\n", "    finalize_standardplot(fig, ax1, ax2)\n", "    return fig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question**: What happens if the learning rate of SGD is A) very large B) very small? Please answer A) and B) with one full sentence each (double click this markdown cell to edit).\n", "\n", "**Answer**:\n", "\n", "A)\n", "\n", "B)\n", "\n", "**Question**: At which epoch (approximately) does the Adam optimizer start to overfit (on MNIST)? Please answer with one full sentence.\n", "\n", "**Answer**:\n", "\n", "**Question**: Explain the qualitative difference between the loss curves and the accuracy curves with respect to signs of overfitting. Please answer with at most 3 full sentences.\n", "\n", "**Answer**:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 4: Model performance as a function of number of hidden neurons (8 points)\n", "\n", "### Description\n", "\n", "Investigate how the best validation loss and accuracy depend on the number of hidden neurons in a single layer.\n", "\n", "1. Fit a reasonable number of models (e.g. 5) with different hidden layer sizes (between 10 and 1000 hidden neurons) to the MNIST dataset. You may use the Adam optimizer and a meaningful number of epochs (overfitting!). (3 pts)\n", "2. Plot the best validation loss and accuracy versus the number of hidden neurons. Is the observed trend in accordance with the [universal approximation theorem](https://en.wikipedia.org/wiki/Universal_approximation_theorem)? If not, what might be practical reasons for the deviation? (2 sentences max.) (3 pts)\n", "3. Repeat steps 1. & 2. for the Fashion-MNIST dataset. (2 pts)\n", "\n",
"In this exercise we fit each model only once, with a single initialization and random seed. In practice one would collect some statistics (e.g. 25-, 50-, 75-percentiles) for each layer size by fitting each model several times with different initializations and random seeds. You may also want to do this here. It is a good exercise, but not mandatory, as it takes quite a bit of computation time.\n", "\n", "### Solution" ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "ExecuteTime": { "end_time": "2018-02-23T14:58:15.181352Z", "start_time": "2018-02-23T14:31:52.623267Z" } }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Answer to question 2:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 5: Going deeper: tricks and regularization (8 points)\n", "\n", "### Description\n", "\n", "Adding hidden layers to a deep network does not necessarily lead to a straightforward improvement of performance. Overfitting can be counteracted with regularization and dropout. Batch normalization is supposed to mainly speed up convergence. Since the MNIST dataset is almost perfectly solved already by a one-hidden-layer network, we use the Fashion-MNIST dataset in this exercise.\n", "\n", "1. Add one or two hidden layers with 50 hidden neurons (each) and train the network for a sufficiently long time (at least 100 epochs). Since deep models are very expressive, you will most probably encounter overfitting. Try to improve the best validation scores of the model (even if it is only a minor improvement) by experimenting with batch normalization layers, dropout layers and l1- and l2-regularization on weights (kernels) and biases. (4 pts)\n", "2. After you have found good settings, plot the learning curves for both models, naive (= no tricks/regularization) and tuned (= tricks + regularization), preferably together in a comparison plot. Discuss your results; refer to the model performance with only 1 hidden layer. (2 sentences max.) (2 pts)\n", "3. Fit your best performing (probably regularized deep) model also to MNIST to have a reference for the next exercise. Plot the resulting learning curves. (2 pts)\n", "\n", "### Solution" ] }, { "cell_type": "code", "execution_count": 106, "metadata": { "ExecuteTime": { "end_time": "2018-02-23T16:26:50.480763Z", "start_time": "2018-02-23T16:06:32.938435Z" } }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Answer to question 2 (comments):" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 6: Convolutional neural networks (CNNs) (10 points)\n", "\n", "### Description\n", "\n", "Convolutional neural networks have an inductive bias that is well adapted to image classification.\n", "\n", "1. Design a convolutional neural network and play with different architectures and parameters. Hint: You may get valuable inspiration from the keras [examples](https://github.com/keras-team/keras/tree/master/examples). (4 pts)\n", "2. Plot the learning curves of the convolutional neural network for MNIST and Fashion-MNIST. (4 pts)\n",
"3. How does the CNN performance compare to the best performing (deep) neural network model so far for the two data sets? (2 sentences max.) (2 pts)\n", "\n", "### Solution" ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "ExecuteTime": { "end_time": "2018-02-23T16:05:21.840299Z", "start_time": "2018-02-23T15:51:11.993053Z" } }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Answer to question 3:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 7: Sigmoidal activation function and batch-normalization (6 points)\n", "\n", "### Description:\n", "\n", "In the original publication of batch normalization [Ioffe and Szegedy, 2015](https://arxiv.org/pdf/1502.03167.pdf), the authors mention a particularly beneficial effect of their method on networks with sigmoidal activation functions. This is because such networks usually suffer from saturating activations/vanishing gradients. Here we want to reproduce this behaviour (choose either MNIST or Fashion-MNIST for this exercise).\n", "\n", "1. Implement the same convolutional network as in the previous exercise, but using the sigmoid activation function instead of the standard choice ReLU. Train the network for a reasonable amount of time. What do you observe? (1 sentence max.) (3 pts)\n", "2. Add batch-normalization layers to all convolutional and fully-connected layers (i.e. before each layer with learnable parameters). How does the performance change? Can the network reach the ReLU-CNN performance of the previous exercise? (1 sentence max.) (3 pts)\n", "3. **BONUS (optional, not graded)**: Investigate our initial guess that saturating activity/vanishing gradients might be the cause of this behaviour. For that, create histograms of the hidden activations for different hidden layers for the sigmoid-CNN and the sigmoid-CNN with batch-normalization (counting over both samples and neurons per layer). You may consider only layers with learnable parameters. What do you observe?\nHint: You can use the [keract](https://github.com/philipperemy/keract) package to access neural activation values for all layers of your network model.\n", "\n", "\n", "\n", "### Solution:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Answer to question 1:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Answer to question 2:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2" } }, "nbformat": 4, "nbformat_minor": 2 }