The Mathematics of L2 Regularization

L2 regularization:

To keep parameters from exploding or becoming highly correlated, it is helpful to augment our cost function with a Gaussian prior: this tends to push parameter weights closer to zero, without constraining their direction, and often leads to classifiers with better generalization ability.

If we maximize log-likelihood (as with the cross-entropy loss, above), then the Gaussian prior becomes a quadratic term (L2 regularization): $J_{reg}(\theta)=\dfrac{\lambda}{2}\left[\sum_{i,j}\left({W_1}\right)_{i,j}^2+\sum_{i',j'}\left({W_2}\right)_{i',j'}^2\right]$
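
As a minimal sketch of how this penalty is computed in practice, the snippet below adds the quadratic term to an (illustrative) cross-entropy value for a two-layer network. The names `W1`, `W2`, `lam`, and the placeholder loss value are assumptions for illustration, not part of the original notes.

```python
import numpy as np

def l2_penalty(weights, lam):
    """Quadratic penalty (lambda / 2) * sum of squared entries over all weight matrices."""
    return 0.5 * lam * sum(np.sum(W ** 2) for W in weights)

rng = np.random.default_rng(0)
W1 = rng.standard_normal((50, 100))   # hypothetical hidden-layer weights
W2 = rng.standard_normal((10, 50))    # hypothetical output-layer weights
lam = 1e-4                            # regularization strength (hyperparameter)

cross_entropy = 2.3                   # placeholder for the data-dependent loss
J_reg = cross_entropy + l2_penalty([W1, W2], lam)
print(J_reg)
```

Note that because the penalty is a simple sum of squares, its contribution to each weight's gradient is just $\lambda\, W$, which is why L2 regularization is often described as "weight decay."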