Some tricks in machine learning

The math behind L2 regularization

L2 regularization:

To keep parameters from exploding or becoming highly correlated, it is helpful to augment the cost function with a Gaussian prior: this tends to push parameter weights closer to zero without constraining their direction, and often leads to classifiers with better generalization ability.

If we maximize the log-likelihood (as with the cross-entropy loss above), then the Gaussian prior becomes a quadratic penalty term (L2 regularization): \[J_{reg}(\theta)=\dfrac{\lambda}{2}\left[\sum_{i,j}{W_1}_{ij}^2+\sum_{i',j'}{W_2}_{i'j'}^2\right]\]
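As a minimal sketch of how this penalty is used in practice (assuming NumPy; the function and argument names here are hypothetical, not from the text above), the regularizer is just λ/2 times the sum of squared weights, added on top of the data-fit term:

```python
import numpy as np

def l2_regularized_cost(cross_entropy, W1, W2, lam):
    """Augment a cross-entropy cost with the L2 penalty J_reg above.

    cross_entropy : scalar data-fit term J(theta)
    W1, W2        : weight matrices of the two layers (biases are
                    typically left unregularized)
    lam           : regularization strength lambda
    """
    j_reg = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    return cross_entropy + j_reg
```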

It can be shown that \[W_{ij}\sim N\!\left(0,\tfrac{1}{\lambda}\right)\] Reference: "从两种角度理解正则化" (understanding regularization from two perspectives) on Zhihu.
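A short sketch of the equivalence (MAP estimation with an independent Gaussian prior on each weight, dropping terms that do not depend on \(\theta\); the symbols \(D\) and \(J(\theta)\) are my notation for the data and the cross-entropy loss):

\[p(W_{ij})\propto\exp\!\Big(-\tfrac{\lambda}{2}W_{ij}^{2}\Big)\ \Longleftrightarrow\ W_{ij}\sim N\!\left(0,\tfrac{1}{\lambda}\right)\]
\[\hat{\theta}=\arg\max_{\theta}\Big[\log p(D\mid\theta)+\sum_{i,j}\log p(W_{ij})\Big]=\arg\min_{\theta}\Big[J(\theta)+\frac{\lambda}{2}\sum_{i,j}W_{ij}^{2}\Big]\]

So maximizing the posterior under the Gaussian prior is exactly minimizing the cross-entropy loss plus the quadratic penalty \(J_{reg}(\theta)\).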

Why RNNs are prone to vanishing and exploding gradients
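A hedged sketch of the standard argument (assuming a vanilla RNN \(h_t=f(W_{hh}h_{t-1}+W_{xh}x_t)\); the notation follows the usual backpropagation-through-time analysis rather than anything earlier in this note). The gradient that flows from step \(t\) back to step \(k\) is a product of per-step Jacobians:

\[\frac{\partial h_t}{\partial h_k}=\prod_{i=k+1}^{t}\frac{\partial h_i}{\partial h_{i-1}}=\prod_{i=k+1}^{t}\mathrm{diag}\big(f'(z_i)\big)\,W_{hh},\qquad\left\|\frac{\partial h_t}{\partial h_k}\right\|\le\big(\gamma\,\|W_{hh}\|\big)^{t-k}\]

where \(\gamma\) bounds \(|f'|\) (e.g. \(\gamma=\tfrac{1}{4}\) for the sigmoid). If \(\gamma\,\|W_{hh}\|<1\) the product shrinks exponentially in \(t-k\) (vanishing gradients); if it exceeds 1 the product can grow exponentially (exploding gradients).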

Why ReLU can effectively mitigate the vanishing-gradient problem

It is hard to see why ReLU alone would solve the vanishing-gradient problem; true, ReLU's gradient is 1 for positive inputs, but that explanation seems too simple, so it is worth reading the original paper: A Simple Way to Initialize Recurrent Networks of Rectified Linear Units.
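From a quick read of that paper, the trick is not ReLU by itself but ReLU combined with initializing the recurrent weight matrix to the identity and the biases to zero, so that at initialization the hidden state is simply copied forward and gradients are neither shrunk nor amplified by the recurrent weights. A minimal NumPy sketch of that setup (the function names are my own, not from the paper):

```python
import numpy as np

def init_irnn(hidden_size, input_size, scale=0.01):
    """IRNN-style initialization (a sketch of the paper's idea):
    identity recurrent weights, zero bias, ReLU activation."""
    W_hh = np.eye(hidden_size)                               # identity recurrent matrix
    W_xh = np.random.randn(hidden_size, input_size) * scale  # small random input weights
    b_h = np.zeros(hidden_size)                              # zero bias
    return W_hh, W_xh, b_h

def irnn_step(h_prev, x, W_hh, W_xh, b_h):
    """One ReLU RNN step: h_t = relu(W_hh h_{t-1} + W_xh x_t + b)."""
    return np.maximum(0.0, W_hh @ h_prev + W_xh @ x + b_h)
```

With this initialization the untrained network behaves like an accumulator of inputs, which is what lets gradients survive over long time spans before training reshapes the recurrent weights.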