CS-233: SVM - theory question

Hi,

I have two theoretical questions about SVM :

I have seen that the coefficients a_n should be equal to C inside the margin and less than C on the margin. However, I don't understand why/how points on the same margin might have different values of a_n.
Also, I don't understand why the margin can be asymmetrical? (In class, we have seen that the margin could be rescaled to be 1 on both sides, so I thought that it should be symmetrical...)

Thanks in advance for your attention !

Re: SVM - theory question

by Vidit Vidit - Friday, 17 May 2019, 11:31 AM

Hello,
1. I have seen that coefficients a_n should be equal to C inside the margin and less than C on the margin. However I don’t understand why/how points on the same margin might have different value of a_n.

Regarding how the coefficients a_n are obtained, I would like to point you to section 7.1.1 of Bishop's book for the derivation. For that you will need to know about Lagrange multipliers. Please have a look at Section 5 here: http://cs229.stanford.edu/notes/cs229-notes3.pdf
We have the SVM loss as
$\min_{\mathbf{w}}\frac{1}{2}||\mathbf{w}||^2 + C \sum_{n}\xi_{n}$ with the slack variables $\xi_{n}$ satisfying
$0<\xi_{n}<1$ when points are inside the margin but on the correct side of the decision boundary,
$\xi_{n}>1$ when points are on the wrong side of the decision boundary, and
$\xi_{n}=0$ when points are on the margin or on the right side of both the decision boundary and the margin.

One can reformulate it as
$\min_{\mathbf{w}}\frac{1}{2}||\mathbf{w}||^2 + C\sum_{n}\max(0,1-t_n\mathbf{y_n})$, where $t_n\in\{-1,1\}$ and $\mathbf{y_n}=\mathbf{w}^T\mathbf{x_n} +b$
$\min_{\mathbf{w}}\frac{1}{2}||\mathbf{w}||^2 + \sum_{n}\max(0,C(1-t_n\mathbf{y_n}))$ (2)
$\min_{\mathbf{w}}\frac{1}{2}||\mathbf{w}||^2 + \sum_{n}\max_{a_n\in[0,C]}a_n(1-t_n\mathbf{y_n})$ (3)
$\min_{\mathbf{w}} \max_{a_n\in[0,C]} \frac{1}{2}||\mathbf{w}||^2 + \sum_{n}a_n(1-t_n\mathbf{y_n})$ (4)
So when a point is on the right side of the margin, $(1-t_n\mathbf{y_n})$ is negative and, from (3), the max is achieved at $a_n = 0$.
When the point is on the wrong side of the margin, $(1-t_n\mathbf{y_n})$ is positive and, from (3), the max is achieved at $a_n = C$.
When the point is on the margin, $(1-t_n\mathbf{y_n})$ is zero and, from (3), any $a_n \in [0, C]$ achieves the max. But our formulation also gives $\mathbf{w} = \sum_{n}a_nt_nx_n$, and we want $t_ny_n=1$ for points on the margin, which restricts these to $a_n\in (0,C)$. The exact value of each $a_n$ is then fixed by these conditions jointly rather than by the point's position alone, which is why two points on the same margin can have different $a_n$.
$a_n$ can be thought of as how much you want your loss to change if this point moves from the wrong side of the margin to the right side.
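
To make the step from (2) to (3) concrete, here is a small sketch in plain Python. It is only an illustration: the function names `hinge_term` and `best_a`, the value `C = 2.0` and the example scores are all made up for this post and are not from the lecture. It checks, for a few points, which $a_n \in [0, C]$ maximises $a_n(1-t_ny_n)$ and that the maximum equals the term in (2).

```python
def hinge_term(t, y, C):
    """Per-point term of (2): max(0, C * (1 - t * y))."""
    return max(0.0, C * (1.0 - t * y))


def best_a(t, y, C, tol=1e-12):
    """Maximiser of a * (1 - t * y) over a in [0, C], as in (3).

    Returned as an interval (a_low, a_high): a single value when the
    maximiser is unique, the whole range [0, C] for a point on the margin.
    """
    gap = 1.0 - t * y
    if gap > tol:        # wrong side of the margin: push a up to C
        return (C, C)
    if gap < -tol:       # right side of the margin: push a down to 0
        return (0.0, 0.0)
    return (0.0, C)      # on the margin: every a in [0, C] gives 0


C = 2.0  # illustrative value only
# (label, t_n, y_n) with y_n = w^T x_n + b; hypothetical scores
points = [
    ("right side of margin", +1, 1.5),
    ("on the margin", +1, 1.0),
    ("inside the margin", +1, 0.4),
    ("wrong side of boundary", -1, 0.7),
]

for label, t, y in points:
    a_low, a_high = best_a(t, y, C)
    # The max in (3) must equal the per-point term in (2).
    assert abs(a_high * (1.0 - t * y) - hinge_term(t, y, C)) < 1e-9
    print(f"{label}: a_n in [{a_low}, {a_high}]")
```

Note that for the point on the margin this check alone leaves $a_n$ free in $[0, C]$; it is the remaining conditions on $\mathbf{w}$ that pick the actual value.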

2. Margins are defined as
$t_ny_n = 1$, which ensures symmetry of the margin, as the distance between two points on opposite margins in the direction of $\mathbf{w}$ is $\frac{2}{||\mathbf{w}||}$. The catch is that if you use a feature expansion function $\phi(.)$ to project the original points $x_n \in \mathbb{R}^D$ into $\mathbb{R}^M$, the margins will be symmetrical in $\mathbb{R}^M$ but may not be in $\mathbb{R}^D$. Remember that with feature expansion, $y_n=\mathbf{w}^T\phi(x_n)+b$.
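
As a toy illustration of this, here is a sketch with $D = M = 1$. Everything in it is an assumption chosen for the example, not a trained model: the expansion $\phi(x) = x^2$ (restricted to $x > 0$) and the values $w = 1$, $b = -4$. The two margins sit at equal distance from the decision boundary in feature space, but not in the original space.

```python
import math


def phi(x):
    """Hypothetical feature expansion R -> R (used for x > 0 only)."""
    return x * x


w, b = 1.0, -4.0  # made-up parameters, not a fitted SVM


def x_at(score):
    """Positive x with w * phi(x) + b == score."""
    return math.sqrt((score - b) / w)


x_boundary = x_at(0.0)   # decision boundary, y = 0
x_plus = x_at(+1.0)      # margin y = +1
x_minus = x_at(-1.0)     # margin y = -1

# Distances to the boundary in feature space (coordinate phi(x)).
feat_gap_plus = phi(x_plus) - phi(x_boundary)
feat_gap_minus = phi(x_boundary) - phi(x_minus)

# Distances to the boundary in the original space (coordinate x).
input_gap_plus = x_plus - x_boundary
input_gap_minus = x_boundary - x_minus

print("feature space:", feat_gap_plus, feat_gap_minus)
print("input space:  ", input_gap_plus, input_gap_minus)
```

In feature space both gaps equal $1/||\mathbf{w}|| = 1$ (up to rounding), while in the original space the gap on the $y = -1$ side is larger than the one on the $y = +1$ side.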

Hope this helps
Cheers.

Re: SVM - theory question

by Ghali Chraibi - Friday, 17 May 2019, 2:50 PM

Thanks a lot for your answer !