SVM - theory question

by Ghali Chraibi -
Number of replies: 2

Hi,

I have two theoretical questions about SVMs:

  1. I have seen that the coefficients a_n should be equal to C inside the margin and less than C on the margin. However, I don't understand why/how points on the same margin can have different values of a_n.
  2. Also, I don't understand why the margin can be asymmetrical. (In class, we saw that the margin could be rescaled to be 1 on both sides, so I thought it should be symmetrical...)
Thanks in advance for your attention!
In reply to Ghali Chraibi

Re: SVM - theory question

by Vidit Vidit -

Hello,
1. I have seen that the coefficients a_n should be equal to C inside the margin and less than C on the margin. However, I don't understand why/how points on the same margin can have different values of a_n.

  • Regarding how the coefficients a_n are obtained, I would point you to Section 7.1.1 of Bishop's book for the derivation. For that you will need to know about Lagrange multipliers; please have a look at Section 5 of http://cs229.stanford.edu/notes/cs229-notes3.pdf.
  • We have the SVM loss
    \min_{\mathbf{w},b}\frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{n}\xi_{n} subject to t_ny_n \ge 1-\xi_n and \xi_n \ge 0, where
    \xi_{n}=0 when the point is on the margin or on the correct side of it;
    0<\xi_{n}\le 1 when the point is inside the margin but on the correct side of the decision boundary;
    \xi_{n}>1 when the point is on the wrong side of the decision boundary.

    One can reformulate it as
    \min_{\mathbf{w},b}\frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{n}\max(0,1-t_ny_n), \quad \text{where } t_n\in\{-1,1\} \text{ and } y_n=\mathbf{w}^T\mathbf{x}_n+b \qquad (1)
    \min_{\mathbf{w},b}\frac{1}{2}\|\mathbf{w}\|^2 + \sum_{n}\max(0,C(1-t_ny_n)) \qquad (2)
    \min_{\mathbf{w},b}\frac{1}{2}\|\mathbf{w}\|^2 + \sum_{n}\max_{a_n\in[0,C]}a_n(1-t_ny_n) \qquad (3)
    \min_{\mathbf{w},b} \max_{a_n\in[0,C]} \frac{1}{2}\|\mathbf{w}\|^2 + \sum_{n}a_n(1-t_ny_n) \qquad (4)
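    So, when a point is on the correct side of the margin (strictly beyond it), (1-t_ny_n) is negative and, from (3), the maximum is achieved at a_n = 0.
    When a point is on the wrong side of the margin, (1-t_ny_n) is positive and, from (3), the maximum is achieved at a_n = C.
    When a point is exactly on the margin, (1-t_ny_n) is zero, so every a_n \in [0, C] achieves the maximum in (3) and this term alone does not fix a_n. Its value is instead determined jointly with all the other points by the optimality conditions \mathbf{w} = \sum_{n}a_nt_n\mathbf{x}_n and \sum_{n}a_nt_n = 0 (obtained by setting the derivatives of (4) with respect to \mathbf{w} and b to zero). That is why different points on the same margin can have different values of a_n. Margin points usually end up with a_n \in (0, C), and any point with 0 < a_n < C is guaranteed to lie exactly on the margin.
    a_n can be thought of as how much the loss would change if this point moved from the wrong side of the margin to the right side.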
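
A minimal plain-Python sketch of the case analysis above, for illustration only: the helper names slack, inner_max and maximizing_a are hypothetical, and the (t_n, y_n) values are made up. It checks numerically that the hinge term C max(0, 1 - t_n y_n) equals the inner maximum over a_n in (3), and reports which a_n maximize a single term of (3). It does not train an SVM, so for the on-margin point it can only return the whole interval [0, C].

```python
def slack(t: int, y: float) -> float:
    """Slack xi_n = max(0, 1 - t_n * y_n) for a label t_n in {-1, +1}."""
    return max(0.0, 1.0 - t * y)


def inner_max(z: float, C: float, steps: int = 100) -> float:
    """Brute-force max of a * z over a grid of a in [0, C] (the inner max in (3))."""
    return max((C * k / steps) * z for k in range(steps + 1))


def maximizing_a(t: int, y: float, C: float) -> tuple:
    """Range (lo, hi) of a_n values that maximize a_n * (1 - t_n * y_n) on [0, C]."""
    z = 1.0 - t * y
    if z < 0:         # correct side, strictly beyond the margin
        return (0.0, 0.0)
    if z > 0:         # inside the margin or misclassified
        return (C, C)
    return (0.0, C)   # exactly on the margin: this term alone does not fix a_n


C = 1.0
# Hypothetical (t_n, y_n) pairs: beyond the margin, on it, inside it, misclassified.
points = [(1, 2.0), (1, 1.0), (1, 0.5), (-1, 0.3)]
for t, y in points:
    z = 1.0 - t * y
    # The hinge term C * xi_n equals the inner maximum over a_n in [0, C].
    assert abs(inner_max(z, C) - C * slack(t, y)) < 1e-12
    print((t, y), "xi =", slack(t, y), "a_n in", maximizing_a(t, y, C))
```

Because a_n (1 - t_n y_n) is linear in a_n, its maximum over [0, C] always sits at an endpoint, which is exactly the max(0, C(1 - t_n y_n)) of (2); the grid search just makes that visible.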

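To make the on-margin case concrete, here is a small hand-constructed example (the data set is hypothetical, not from the course): two positive points on the same margin and one negative point, with \mathbf{w} = (1, 0) and b = 0. Solving \mathbf{w} = \sum_n a_n t_n \mathbf{x}_n and \sum_n a_n t_n = 0 by hand gives a = (1/3, 1/6, 1/2); the code only verifies these conditions exactly using Python's fractions module. Together with t_n y_n = 1 (so all \xi_n = 0) and 0 \le a_n \le C, these are the KKT conditions of the soft-margin problem, which is convex, so this is an optimum for any C \ge 1/2. The two positive points lie on the same margin yet get different multipliers, 1/3 and 1/6.

```python
from fractions import Fraction as F

# Hypothetical toy data set: two positive points on the same margin and one
# negative point. Candidate solution: w = (1, 0), b = 0.
X = [(F(1), F(0)), (F(1), F(3)), (F(-1), F(1))]
T = [1, 1, -1]
w = (F(1), F(0))
b = F(0)

# Multipliers solved by hand from w = sum_n a_n t_n x_n and sum_n a_n t_n = 0.
a = [F(1, 3), F(1, 6), F(1, 2)]
C = F(1)  # any C >= 1/2 works here, since every a_n must satisfy a_n <= C

# All three points lie exactly on a margin: t_n * y_n = 1 (so xi_n = 0).
margins = [t * (w[0] * x[0] + w[1] * x[1] + b) for x, t in zip(X, T)]
assert margins == [1, 1, 1]

# Stationarity with respect to w and b.
w_from_a = tuple(sum(an * t * x[i] for an, t, x in zip(a, T, X)) for i in range(2))
assert w_from_a == w
assert sum(an * t for an, t in zip(a, T)) == 0

# Box constraint 0 <= a_n <= C.
assert all(0 <= an <= C for an in a)

print("a_n for the two positive margin points:", a[0], a[1])
```
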
2. Margins are defined by
t_ny_n = 1, which ensures that the margin is symmetric: the two margin hyperplanes lie at distance \frac{1}{\|\mathbf{w}\|} on either side of the decision boundary, so the distance between them along the direction \mathbf{w} is \frac{2}{\|\mathbf{w}\|}. The catch is that if you use a feature expansion function \phi(\cdot) that maps an original point x_n \in \mathbb{R}^D to \mathbb{R}^M, the margins will be symmetric in \mathbb{R}^M but may not be in \mathbb{R}^D. Remember that with a feature expansion, y_n=\mathbf{w}^T\phi(x_n)+b.
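
A sketch of the point about feature expansions, under hypothetical choices: 1-D inputs, the quadratic map \phi(x) = (x, x^2), and hand-picked (not trained) parameters \mathbf{w} = (1, 0.1), b = 0. In \mathbb{R}^2 both margin hyperplanes are 1/\|\mathbf{w}\| from the decision boundary, but locating y(x) = -1, 0, +1 on the input line by bisection puts the boundary at x = 0 and the margins at about x = -1.127 and x = 0.916, so the margin is asymmetric in \mathbb{R}^1. The bisection assumes y(x) is increasing on [-4, 4], which holds here since its derivative 1 + 0.2x is positive there.

```python
import math

# Hypothetical example: 1-D inputs x, quadratic feature map phi(x) = (x, x^2),
# and hand-picked (not trained) parameters w, b.
W = (1.0, 0.1)
B = 0.0


def phi(x: float) -> tuple:
    """Feature expansion R^1 -> R^2."""
    return (x, x * x)


def y(x: float) -> float:
    """Classifier output y(x) = w^T phi(x) + b."""
    return sum(wi * zi for wi, zi in zip(W, phi(x))) + B


def solve_level(level: float, lo: float = -4.0, hi: float = 4.0, iters: int = 100) -> float:
    """Bisection for y(x) = level; assumes y is increasing on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if y(mid) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


norm_w = math.hypot(*W)
# In feature space R^2 each margin hyperplane w^T z + b = +-1 lies 1/||w|| from the boundary.
print("feature-space half-width:", 1 / norm_w)

x0 = solve_level(0.0)    # decision boundary, y(x) = 0
xp = solve_level(1.0)    # positive margin, y(x) = +1
xm = solve_level(-1.0)   # negative margin, y(x) = -1
print("input-space distances:", xp - x0, x0 - xm)
```
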

Hope this helps
Cheers.