SVM (Support Vector Machine) and a Walkthrough of the cs231n Assignment



SVM (Support Vector Machine)

To make my cs231n study more efficient and easier to review, I am recording the learning process here for reference.

How It Works

The multiclass SVM loss says that the score of the correct class should be higher than the score of every other class by at least a margin delta (a hyperparameter); only then does that example contribute zero loss. Our goal is to find weights W under which the correct class's score beats every incorrect class's score by at least this margin.
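In formula form (for a single example $i$ with class scores $s_j$ and correct class $y_i$), the multiclass SVM loss is:

$$
L_i = \sum_{j \ne y_i} \max\left(0,\ s_j - s_{y_i} + \Delta\right)
$$

With $\Delta = 1$, this is exactly the `margin = scores[j] - correct_class_score + 1` term accumulated in the code below.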

The SVM Exercise in Assignment 1

The SVM assignment exercise from cs231n.

svm.ipynb (Part 1)

```python
# Run some setup code for this notebook.

import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

# This is a bit of magic to make matplotlib figures appear inline in the
# notebook rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
```
↑ First step: import the necessary libraries and set some default plotting parameters.

CIFAR-10 Data Loading and Preprocessing

```python
# Load the raw CIFAR-10 data.
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'

# Cleaning up variables to prevent loading data multiple times (which may cause memory issue)
try:
    del X_train, y_train
    del X_test, y_test
    print('Clear previously loaded data.')
except:
    pass

X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# As a sanity check, we print out the size of the training and test data.
print('Training data shape: ', X_train.shape)
print('Training labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
```
↑ This loads the raw CIFAR-10 data and shows the sizes of the training and test sets:

Training data shape: (50000, 32, 32, 3) \ Training labels shape: (50000,)\ Test data shape: (10000, 32, 32, 3)\ Test labels shape: (10000,)

```python
# Visualize some examples from the dataset.
# We show a few examples of training images from each class.
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
samples_per_class = 7
for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()
```
↑ Pick 7 images from each class and visualize them. The result looks like this:

[Figure: a 7×10 grid of example training images, one column per class]

```python
# Split the data into train, val, and test sets. In addition we will
# create a small development set as a subset of the training data;
# we can use this for development so our code runs faster.
num_training = 49000
num_validation = 1000
num_test = 1000
num_dev = 500

# Our validation set will be num_validation points from the original
# training set.
mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]

# Our training set will be the first num_train points from the original
# training set.
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]

# We will also make a development set, which is a small subset of
# the training set.
mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]

# We use the first num_test points of the original test set as our
# test set.
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]

print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
```
↑ Split the data into a training set (49,000), a validation set (1,000) and a test set (1,000), and also draw a small development (dev) set of 500 examples from the training set. As the comment says, the dev set exists so that development code runs quickly: the loss and gradient checks below operate on these 500 examples first, and we only train on the full training set once the code is known to work. Let's keep going:

Train data shape: (49000, 32, 32, 3) \ Train labels shape: (49000,) \ Validation data shape: (1000, 32, 32, 3) \ Validation labels shape: (1000,) \ Test data shape: (1000, 32, 32, 3) \ Test labels shape: (1000,)

```python
# Preprocessing: reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

# As a sanity check, print out the shapes of the data
print('Training data shape: ', X_train.shape)
print('Validation data shape: ', X_val.shape)
print('Test data shape: ', X_test.shape)
print('dev data shape: ', X_dev.shape)
```
↑ Flatten each 32×32×3 image into a 3072-dimensional row vector:

Training data shape: (49000, 3072) \ Validation data shape: (1000, 3072) \ Test data shape: (1000, 3072) \ dev data shape: (500, 3072)

```python
# Preprocessing: subtract the mean image
# first: compute the image mean based on the training data
mean_image = np.mean(X_train, axis=0)
print(mean_image[:10]) # print a few of the elements
plt.figure(figsize=(4,4))
plt.imshow(mean_image.reshape((32,32,3)).astype('uint8')) # visualize the mean image
plt.show()

# second: subtract the mean image from train and test data
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image
X_dev -= mean_image

# third: append the bias dimension of ones (i.e. bias trick) so that our SVM
# only has to worry about optimizing a single weight matrix W.
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])

print(X_train.shape, X_val.shape, X_test.shape, X_dev.shape)
```
↑ This zero-centers the data by subtracting the mean training image from every split, then appends a column of ones to each data matrix (the bias trick). The results:

[130.64189796 135.98173469 132.47391837 130.05569388 135.34804082 \ 131.75402041 130.96055102 136.14328571 132.47636735 131.48467347] \ [Figure: the mean training image] \ (49000, 3073) (1000, 3073) (1000, 3073) (500, 3073)
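The extra column of ones is the bias trick: by appending a constant 1 to every input vector, the bias can be folded into the weight matrix, so the score function becomes a single matrix multiply:

$$
f(x_i) = W x_i + b = \begin{bmatrix} W & b \end{bmatrix} \begin{bmatrix} x_i \\ 1 \end{bmatrix}
$$

which is why the feature dimension grows from 3072 to 3073.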

Next we move into linear_svm.py and implement the SVM classifier.

SVM Classifier. Your code for this section will all be written inside cs231n/classifiers/linear_svm.py.

As you can see, we have prefilled the function svm_loss_naive which uses for loops to evaluate the multiclass SVM loss function.

linear_svm.py

```python
from builtins import range
import numpy as np
from random import shuffle
from past.builtins import xrange


def svm_loss_naive(W, X, y, reg):
    """
    Structured SVM loss function, naive implementation (with loops).

    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.

    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    dW = np.zeros(W.shape) # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]  # number of classes C (number of columns of W)
    num_train = X.shape[0]    # number of training examples N (number of rows of X)
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)  # shape (C,): the score of sample i for every class
        correct_class_score = scores[y[i]]  # score of the correct class
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1 # note delta = 1
            if margin > 0:
                loss += margin  # accumulate the loss

                dW[:, j] += X[i, :].T
                dW[:, y[i]] += -X[i, :].T  # accumulate the gradient

    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss /= num_train
    dW /= num_train  # average over the training set

    # Add regularization to the loss.
    loss += reg * np.sum(W * W)
    dW += reg * W  # add the regularization gradient

    #############################################################################
    # TODO:                                                                     #
    # Compute the gradient of the loss function and store it dW.                #
    # Rather than first computing the loss and then computing the derivative,   #
    # it may be simpler to compute the derivative at the same time that the     #
    # loss is being computed. As a result you may need to modify some of the    #
    # code above to compute the gradient.                                       #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    pass

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    return loss, dW


def svm_loss_vectorized(W, X, y, reg):  # vectorized implementation
    """
    Structured SVM loss function, vectorized implementation.

    Inputs and outputs are the same as svm_loss_naive.
    """
    loss = 0.0
    dW = np.zeros(W.shape) # initialize the gradient as zero

    #############################################################################
    # TODO:                                                                     #
    # Implement a vectorized version of the structured SVM loss, storing the    #
    # result in loss.                                                           #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    num_train = X.shape[0]   # 500 when called on the dev set
    scores = np.dot(X, W)    # matrix multiply to get the (N, C) score matrix
    #print(scores.shape)     # (500, 10)
    correct_class_scores = scores[np.arange(num_train), y]  # each sample's correct-class score, shape (N,)
    correct_class_scores = np.reshape(correct_class_scores, (num_train, -1))
    #print(correct_class_scores.shape)  # (500, 1)

    margin = scores - correct_class_scores + 1.0
    margin[np.arange(num_train), y] = 0.0  # zero out the correct-class positions
    margin[margin <= 0] = 0.0              # implements the max(0, .) hinge

    loss += np.sum(margin) / num_train  # data loss
    loss += 0.5 * reg * np.sum(W * W)   # regularization loss
    pass

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    #############################################################################
    # TODO:                                                                     #
    # Implement a vectorized version of the gradient for the structured SVM     #
    # loss, storing the result in dW.                                           #
    #                                                                           #
    # Hint: Instead of computing the gradient from scratch, it may be easier    #
    # to reuse some of the intermediate values that you used to compute the     #
    # loss.                                                                     #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    margin[margin > 0] = 1.0  # each positive margin contributes X[i] to column j

    row_sum = np.sum(margin, axis=1)            # number of positive margins per sample
    margin[np.arange(num_train), y] = -row_sum  # the correct class collects -row_sum * X[i]

    dW = 1.0 / num_train * np.dot(X.T, margin) + reg * W
    pass

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    return loss, dW
```
↑ The relevant comments are written directly in the code.
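For reference, the gradient those loops accumulate follows directly from the loss. For one example $x_i$ with margins $m_j = s_j - s_{y_i} + \Delta$, every class $j \ne y_i$ with $m_j > 0$ contributes:

$$
\nabla_{w_j} L_i = \mathbb{1}(m_j > 0)\, x_i, \qquad
\nabla_{w_{y_i}} L_i = -\left(\sum_{j \ne y_i} \mathbb{1}(m_j > 0)\right) x_i
$$

This is exactly what `dW[:, j] += X[i, :]` and `dW[:, y[i]] -= X[i, :]` implement in the naive version, and what the vectorized version reproduces with the 0/1 margin mask and `row_sum`.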

svm.ipynb (Part 2)

Next we go back to svm.ipynb and continue with the rest of the work.

```python
# Evaluate the naive implementation of the loss we provided for you:
from cs231n.classifiers.linear_svm import svm_loss_naive
import time

# generate a random SVM weight matrix of small numbers
W = np.random.randn(3073, 10) * 0.0001

loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.000005)
print('loss: %f' % (loss, ))
```
↑ Randomly initialize a small weight matrix W and compute the loss. With a small random W the scores are all close to zero, so each of the 9 incorrect classes contributes a margin of about 1 and the loss should come out near 9:

loss: 9.503077

At this point the gradient returned above is all zeros. Next we implement it and run a gradient check using the function provided by the assignment.

The grad returned from the function above is right now all zero. Derive and implement the gradient for the SVM cost function and implement it inline inside the function svm_loss_naive. You will find it helpful to interleave your new code inside the existing function. \ To check that you have correctly implemented the gradient, you can numerically estimate the gradient of the loss function and compare the numeric estimate to the gradient that you computed. We have provided code that does this for you:
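For context, `grad_check_sparse` (as far as I can tell from the assignment's gradient_check code) estimates the gradient along a few randomly chosen dimensions with a centered finite difference and compares it to the analytic value:

$$
\frac{\partial L}{\partial W_k} \approx \frac{L(W + h\, e_k) - L(W - h\, e_k)}{2h}
$$

where $e_k$ is the unit vector for dimension $k$ and $h$ is a small step.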

```python
# Once you've implemented the gradient, recompute it with the code below
# and gradient check it with the function we provided for you

# Compute the loss and its gradient at W.
loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.0)

# Numerically compute the gradient along several randomly chosen dimensions, and
# compare them with your analytically computed gradient. The numbers should match
# almost exactly along all dimensions.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad)

# do the gradient check once again with regularization turned on
# you didn't forget the regularization gradient did you?
loss, grad = svm_loss_naive(W, X_dev, y_dev, 5e1)
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 5e1)[0]
grad_numerical = grad_check_sparse(f, W, grad)
```
↑ The gradient check gives the following results:

numerical: 53.633753 analytic: 53.633753, relative error: 8.962307e-13 \ numerical: -11.118762 analytic: -11.118762, relative error: 4.871283e-11 \ numerical: -31.652148 analytic: -31.652148, relative error: 5.118703e-12 \ numerical: -37.148900 analytic: -37.148900, relative error: 2.075841e-12 \ numerical: 4.209371 analytic: 4.209371, relative error: 2.167553e-11 \ numerical: 10.116674 analytic: 10.116674, relative error: 3.401823e-12 \ numerical: -21.324854 analytic: -21.324854, relative error: 7.738055e-12 \ numerical: 1.329942 analytic: 1.329942, relative error: 6.063350e-11 \ numerical: -4.763931 analytic: -4.763931, relative error: 4.636437e-11 \ numerical: -11.882003 analytic: -11.882003, relative error: 2.569109e-11 \ numerical: -8.790902 analytic: -8.783125, relative error: 4.425420e-04 \ numerical: 9.180725 analytic: 9.176687, relative error: 2.199900e-04 \ numerical: 5.605164 analytic: 5.599004, relative error: 5.497384e-04 \ numerical: 42.522387 analytic: 42.526500, relative error: 4.835795e-05 \ numerical: 18.055843 analytic: 18.051024, relative error: 1.334777e-04 \ numerical: -22.156250 analytic: -22.153348, relative error: 6.549059e-05 \ numerical: 4.417888 analytic: 4.412968, relative error: 5.571097e-04 \ numerical: 12.868772 analytic: 12.876475, relative error: 2.992280e-04 \ numerical: 35.825513 analytic: 35.818245, relative error: 1.014356e-04 \ numerical: 16.295317 analytic: 16.293690, relative error: 4.993667e-05

Inline Question 1 \ It is possible that once in a while a dimension in the gradcheck will not match exactly. What could such a discrepancy be caused by? Is it a reason for concern? What is a simple example in one dimension where a gradient check could fail? How would changing the margin affect the frequency of this happening? Hint: the SVM loss function is not strictly speaking differentiable.

This is the first inline question: occasionally a dimension in the gradient check will not match exactly; what causes the discrepancy, is it a reason for concern, and how does the margin affect how often it happens? Hint: strictly speaking, the SVM loss is not differentiable. The short answer (a commonly given one, matching the solution I found on GitHub): the hinge max(0, ·) has a kink, so wherever a margin is exactly or numerically very close to zero the analytic subgradient and the finite-difference estimate can disagree, and this is not a cause for concern. A one-dimensional example is f(x) = max(0, x) near x = 0; changing the margin Δ shifts where those kinks sit, which changes how often a sampled dimension lands close to one.
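As a minimal one-dimensional illustration (my own sketch, not part of the assignment): at the kink of $f(x)=\max(0,x)$ a centered finite difference gives 0.5, while any valid subgradient is 0 or 1, so the check appears to "fail" there.

```python
import numpy as np

def f(x):
    return np.maximum(0.0, x)

def numerical_grad(f, x, h=1e-5):
    # centered finite difference, the same idea the gradient check uses
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.0                             # exactly at the kink of max(0, x)
analytic = 1.0 if x > 0 else 0.0    # one valid choice of subgradient
print(analytic, numerical_grad(f, x))  # 0.0 vs 0.5 -- mismatch at the kink
```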

```python
# Next implement the function svm_loss_vectorized; for now only compute the loss;
# we will implement the gradient in a moment.
tic = time.time()
loss_naive, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Naive loss: %e computed in %fs' % (loss_naive, toc - tic))

from cs231n.classifiers.linear_svm import svm_loss_vectorized
tic = time.time()
loss_vectorized, _ = svm_loss_vectorized(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic))

# The losses should match but your vectorized implementation should be much faster.
print('difference: %f' % (loss_naive - loss_vectorized))
```
↑ Compute the loss with the vectorized implementation and time both versions:

Naive loss: 9.503077e+00 computed in 0.166552s \ Vectorized loss: 9.503077e+00 computed in 0.004987s \ difference: 0.000000

```python
# Complete the implementation of svm_loss_vectorized, and compute the gradient
# of the loss function in a vectorized way.

# The naive implementation and the vectorized implementation should match, but
# the vectorized version should still be much faster.
tic = time.time()
_, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Naive loss and gradient: computed in %fs' % (toc - tic))

tic = time.time()
_, grad_vectorized = svm_loss_vectorized(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Vectorized loss and gradient: computed in %fs' % (toc - tic))

# The loss is a single number, so it is easy to compare the values computed
# by the two implementations. The gradient on the other hand is a matrix, so
# we use the Frobenius norm to compare them.
difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print('difference: %f' % difference)
```
↑ Vectorized gradient computation:

Naive loss and gradient: computed in 0.159082s \ Vectorized loss and gradient: computed in 0.004987s \ difference: 0.000000

Stochastic Gradient Descent

Stochastic Gradient Descent \ We now have vectorized and efficient expressions for the loss, the gradient and our gradient matches the numerical gradient. We are therefore ready to do SGD to minimize the loss. Your code for this part will be written inside cs231n/classifiers/linear_classifier.py.

Now we use stochastic gradient descent (SGD) to drive the loss down. The code for this part goes in linear_classifier.py.

First, a quick refresher on SGD. Ordinary (batch) gradient descent uses the entire training set for every update: $$ \theta_j:=\theta_j-\alpha\dfrac{\partial}{\partial\theta_j}J(\theta) $$ SGD instead uses only a single training example (or, as in the code below, a small mini-batch) per iteration, which makes each update much cheaper; for least-squares regression the per-example update takes the classic form: $$ \theta_j:=\theta_j+\alpha\left(y^{(i)}-h_\theta(x^{(i)})\right)x_j^{(i)} $$ For our SVM the update is simply W := W - α·dW on a sampled mini-batch. Let's look at the code to see how it runs:

```python
from __future__ import print_function

from builtins import range
from builtins import object
import numpy as np
from cs231n.classifiers.linear_svm import *
from cs231n.classifiers.softmax import *
from past.builtins import xrange


class LinearClassifier(object):

    def __init__(self):
        self.W = None

    def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
              batch_size=200, verbose=False):
        """
        Train this linear classifier using stochastic gradient descent.

        Inputs:
        - X: A numpy array of shape (N, D) containing training data; there are N
          training samples each of dimension D.
        - y: A numpy array of shape (N,) containing training labels; y[i] = c
          means that X[i] has label 0 <= c < C for C classes.
        - learning_rate: (float) learning rate for optimization.
        - reg: (float) regularization strength.
        - num_iters: (integer) number of steps to take when optimizing
        - batch_size: (integer) number of training examples to use at each step.
        - verbose: (boolean) If true, print progress during optimization.

        Outputs:
        A list containing the value of the loss function at each training iteration.
        """
        num_train, dim = X.shape
        num_classes = np.max(y) + 1 # assume y takes values 0...K-1 where K is number of classes
        if self.W is None:
            # lazily initialize W
            self.W = 0.001 * np.random.randn(dim, num_classes)

        # Run stochastic gradient descent to optimize W
        loss_history = []
        for it in range(num_iters):
            X_batch = None
            y_batch = None

            #########################################################################
            # TODO:                                                                 #
            # Sample batch_size elements from the training data and their           #
            # corresponding labels to use in this round of gradient descent.        #
            # Store the data in X_batch and their corresponding labels in           #
            # y_batch; after sampling X_batch should have shape (batch_size, dim)   #
            # and y_batch should have shape (batch_size,)                           #
            #                                                                       #
            # Hint: Use np.random.choice to generate indices. Sampling with         #
            # replacement is faster than sampling without replacement.              #
            #########################################################################
            # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

            # sample a mini-batch of indices (the hint suggests replace=True for
            # speed; sampling without replacement also works)
            sample_index = np.random.choice(num_train, batch_size, replace=False)
            X_batch = X[sample_index, :]
            y_batch = y[sample_index]
            pass

            # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

            # evaluate loss and gradient
            loss, grad = self.loss(X_batch, y_batch, reg)
            loss_history.append(loss)

            # perform parameter update
            #########################################################################
            # TODO:                                                                 #
            # Update the weights using the gradient and the learning rate.          #
            #########################################################################
            # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

            # vanilla SGD step
            self.W = self.W - learning_rate * grad
            pass

            # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

            if verbose and it % 100 == 0:
                print('iteration %d / %d: loss %f' % (it, num_iters, loss))

        return loss_history

    def predict(self, X):
        """
        Use the trained weights of this linear classifier to predict labels for
        data points.

        Inputs:
        - X: A numpy array of shape (N, D) containing training data; there are N
          training samples each of dimension D.

        Returns:
        - y_pred: Predicted labels for the data in X. y_pred is a 1-dimensional
          array of length N, and each element is an integer giving the predicted
          class.
        """
        y_pred = np.zeros(X.shape[0])
        ###########################################################################
        # TODO:                                                                   #
        # Implement this method. Store the predicted labels in y_pred.            #
        ###########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        # the predicted class is the one with the highest score
        score = X.dot(self.W)
        y_pred = np.argmax(score, axis=1)
        pass

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        return y_pred

    def loss(self, X_batch, y_batch, reg):
        """
        Compute the loss function and its derivative.
        Subclasses will override this.

        Inputs:
        - X_batch: A numpy array of shape (N, D) containing a minibatch of N
          data points; each point has dimension D.
        - y_batch: A numpy array of shape (N,) containing labels for the minibatch.
        - reg: (float) regularization strength.

        Returns: A tuple containing:
        - loss as a single float
        - gradient with respect to self.W; an array of the same shape as W
        """
        pass


class LinearSVM(LinearClassifier):
    """ A subclass that uses the Multiclass SVM loss function """

    def loss(self, X_batch, y_batch, reg):
        return svm_loss_vectorized(self.W, X_batch, y_batch, reg)


class Softmax(LinearClassifier):
    """ A subclass that uses the Softmax + Cross-entropy loss function """

    def loss(self, X_batch, y_batch, reg):
        return softmax_loss_vectorized(self.W, X_batch, y_batch, reg)

```

svm.ipynb (Part 3)

We now return to the SVM notebook to finish up.

```python
# In the file linear_classifier.py, implement SGD in the function
# LinearClassifier.train() and then run it with the code below.
from cs231n.classifiers import LinearSVM
svm = LinearSVM()
tic = time.time()
loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=2.5e4,
                      num_iters=1500, verbose=True)
toc = time.time()
print('That took %fs' % (toc - tic))
```
↑ Run stochastic gradient descent on the full training set:

iteration 0 / 1500: loss 404.803504 \ iteration 100 / 1500: loss 240.999746 \ iteration 200 / 1500: loss 147.223258 \ iteration 300 / 1500: loss 90.523987 \ iteration 400 / 1500: loss 57.061188 \ iteration 500 / 1500: loss 34.765473 \ iteration 600 / 1500: loss 23.564806 \ iteration 700 / 1500: loss 16.603101 \ iteration 800 / 1500: loss 11.401976 \ iteration 900 / 1500: loss 9.060058 \ iteration 1000 / 1500: loss 7.689659 \ iteration 1100 / 1500: loss 6.790426 \ iteration 1200 / 1500: loss 5.630356 \ iteration 1300 / 1500: loss 5.295763 \ iteration 1400 / 1500: loss 5.502182 \ That took 12.969263s

```python
# A useful debugging strategy is to plot the loss as a function of
# iteration number:
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
plt.show()
```
Plot the loss curve:

[Figure: the training loss decreasing over the 1,500 SGD iterations]

```python
# Write the LinearSVM.predict function and evaluate the performance on both the
# training and validation set
y_train_pred = svm.predict(X_train)
print('training accuracy: %f' % (np.mean(y_train == y_train_pred), ))
y_val_pred = svm.predict(X_val)
print('validation accuracy: %f' % (np.mean(y_val == y_val_pred), ))
```
↑ Implement the prediction function and evaluate accuracy on the training and validation sets (roughly 0.37 to 0.39, well above the 10% chance level, is what a linear classifier on raw pixels typically reaches):

training accuracy: 0.379204 \ validation accuracy: 0.388000

```python
# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of about 0.39 on the validation set.

# Note: you may see runtime/overflow warnings during hyper-parameter search.
# This may be caused by extreme values, and is not a bug.

# results is dictionary mapping tuples of the form
# (learning_rate, regularization_strength) to tuples of the form
# (training_accuracy, validation_accuracy). The accuracy is simply the fraction
# of data points that are correctly classified.
results = {}
best_val = -1   # The highest validation accuracy that we have seen so far.
best_svm = None # The LinearSVM object that achieved the highest validation rate.

################################################################################
# TODO:                                                                        #
# Write code that chooses the best hyperparameters by tuning on the validation #
# set. For each combination of hyperparameters, train a linear SVM on the      #
# training set, compute its accuracy on the training and validation sets, and  #
# store these numbers in the results dictionary. In addition, store the best   #
# validation accuracy in best_val and the LinearSVM object that achieves this  #
# accuracy in best_svm.                                                        #
#                                                                              #
# Hint: You should use a small value for num_iters as you develop your         #
# validation code so that the SVMs don't take much time to train; once you are #
# confident that your validation code works, you should rerun the validation   #
# code with a larger value for num_iters.                                      #
################################################################################

# Provided as a reference. You may or may not want to change these hyperparameters
learning_rates = [1e-7, 5e-5]
regularization_strengths = [2.5e4, 5e4]

# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

for rs in regularization_strengths:
    for lr in learning_rates:
        svm = LinearSVM()
        loss_hist = svm.train(X_train, y_train, lr, rs, num_iters=3000)
        y_train_pred = svm.predict(X_train)
        train_accuracy = np.mean(y_train == y_train_pred)
        y_val_pred = svm.predict(X_val)
        val_accuracy = np.mean(y_val == y_val_pred)
        if val_accuracy > best_val:
            best_val = val_accuracy
            best_svm = svm
        results[(lr, rs)] = train_accuracy, val_accuracy
pass

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy))

print('best validation accuracy achieved during cross-validation: %f' % best_val)
```
↑ Choose the best learning rate and regularization strength on the validation set:

lr 1.000000e-07 reg 2.500000e+04 train accuracy: 0.380592 val accuracy: 0.384000 \ lr 1.000000e-07 reg 5.000000e+04 train accuracy: 0.367898 val accuracy: 0.375000 \ lr 5.000000e-05 reg 2.500000e+04 train accuracy: 0.146327 val accuracy: 0.142000 \ lr 5.000000e-05 reg 5.000000e+04 train accuracy: 0.100265 val accuracy: 0.087000 \ best validation accuracy achieved during cross-validation: 0.384000
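If you want to explore beyond the two reference values per hyperparameter, a finer grid is easy to build; the ranges below are only an illustration of the idea, not values from the assignment:

```python
# hypothetical finer search grid around the best setting found above
learning_rates = np.logspace(-8, -6, num=5)           # 1e-8 ... 1e-6
regularization_strengths = np.logspace(3, 5, num=5)   # 1e3 ... 1e5

for lr in learning_rates:
    for rs in regularization_strengths:
        # train a LinearSVM with (lr, rs) exactly as in the loop above,
        # store (train_accuracy, val_accuracy) in results, and keep the
        # model with the best validation accuracy in best_svm.
        ...
```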

```python
# Visualize the cross-validation results
import math
# import pdb
# pdb.set_trace()  # leftover debugging breakpoint, commented out so the cell runs through

x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]

# plot training accuracy
marker_size = 100
colors = [results[x][0] for x in results]
plt.subplot(2, 1, 1)
plt.tight_layout(pad=3)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors, cmap=plt.cm.coolwarm)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 training accuracy')

# plot validation accuracy
colors = [results[x][1] for x in results] # default size of markers is 20
plt.subplot(2, 1, 2)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors, cmap=plt.cm.coolwarm)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 validation accuracy')
plt.show()
```
↑ Visualization: [Figure: scatter plots of training and validation accuracy over log learning rate and log regularization strength]

```python
# Evaluate the best svm on test set
y_test_pred = best_svm.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print('linear SVM on raw pixels final test set accuracy: %f' % test_accuracy)
```
↑ The final accuracy on the test set:

linear SVM on raw pixels final test set accuracy: 0.368000

```python
# Visualize the learned weights for each class.
# Depending on your choice of learning rate and regularization strength, these may
# or may not be nice to look at.
w = best_svm.W[:-1,:] # strip out the bias
w = w.reshape(32, 32, 3, 10)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i + 1)

    # Rescale the weights to be between 0 and 255
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])
```
[Figure: the learned weight visualization for each of the 10 classes] The visualized weights look like blurry templates of each class, since a linear classifier effectively learns one template per class and scores an image by how well it matches each template.