Implementing a Gaussian Mixture Model in Code

Reading the sklearn source: fitting a Gaussian mixture model (GMM) with the EM algorithm


The blog post Regularized Gaussian Covariance Estimation is well worth reading; it also goes into depth on how the covariance matrix is estimated, something I touched on in the previous post but explained rather poorly.

For the code, read the sklearn source directly (many re-implementations floating around online are unreliable): GitHub source

Class initialization

```python
class GaussianMixture(BaseMixture):
    """Representation of a Gaussian mixture model probability distribution.

    This class allows to estimate the parameters of a Gaussian mixture
    distribution, i.e. it performs parameter estimation for a mixture of Gaussians.
    """
    def __init__(self, n_components=1, covariance_type='full', tol=1e-3,
                 reg_covar=1e-6, max_iter=100, n_init=1, init_params='kmeans',
                 weights_init=None, means_init=None, precisions_init=None,
                 random_state=None, warm_start=False,
                 verbose=0, verbose_interval=10):
        super(GaussianMixture, self).__init__(
            n_components=n_components, tol=tol, reg_covar=reg_covar,
            max_iter=max_iter, n_init=n_init, init_params=init_params,
            random_state=random_state, warm_start=warm_start,
            verbose=verbose, verbose_interval=verbose_interval)

        # Initial values for the three groups of parameters to be learned
        self.covariance_type = covariance_type  # form of the covariance matrices
        self.weights_init = weights_init        # multinomial prior: probability of each component
        self.means_init = means_init            # means, shape (n_components, n_features)
        self.precisions_init = precisions_init  # precisions (inverse covariance matrices)
```

Let's look at the constructor first; there are quite a few parameters (a minimal instantiation sketch follows the list below).

  • n_components=1: The number of mixture components, i.e. how many Gaussians are mixed together.
  • covariance_type='full': the type of covariance matrix, one of {'full', 'tied', 'diag', 'spherical'}: a full covariance matrix per component (no entries forced to zero); a single full covariance matrix shared by all components (as used in HMMs); a diagonal covariance matrix (off-diagonal entries are zero); or a spherical covariance (off-diagonal entries are zero and all diagonal entries are equal). Defaults to 'full'.
  • tol=1e-3: convergence threshold. EM iterations will stop when the lower bound average gain is below this threshold. The lower bound here is the lower-bound function of Eq. (3) in the previous post: EM alternates between an E step, where the inequality is made tight (the expectation), and an M step, where the lower bound is maximized; the "average gain" compared with the threshold is the improvement obtained in that maximization.
  • reg_covar=1e-6: non-negative regularization added to the diagonal of the covariance matrices, ensuring that they all stay positive definite.
  • max_iter=100: maximum number of EM iterations.
  • n_init: int, defaults to 1. Number of initializations to perform.
  • init_params: {'kmeans', 'random'}, defaults to 'kmeans'. The method used to initialize the weights, the means and the precisions. Must be one of:
    • 'kmeans': responsibilities are initialized using kmeans.
    • 'random': responsibilities are initialized randomly.
    • This initialization refers to the responsibilities of the latent variable z, i.e. the proportion of each component (and thus the initial weights): 'kmeans' corresponds to a "hard" guess ({0, 1}, or {1, ..., k}), while 'random' corresponds to a "soft" guess.
  • weights_init: shape (n_components,), optional. The user-provided initial weights, defaults to None. If None, weights are initialized using the init_params method. These are the prior proportions of the n_components latent classes, i.e. the multinomial parameters \(\phi_j\).
  • means_init: array-like, shape (n_components, n_features), optional. The user-provided initial means, defaults to None. If None, means are initialized using the init_params method. Note the shape (n_components, n_features): there are n_components multivariate Gaussians, each with n_features dimensions.
  • precisions_init: the user-provided initial precisions (inverse of the covariance matrices), defaults to None. If None, precisions are initialized using the init_params method. The shape depends on covariance_type:
    • (n_components,) if 'spherical',
    • (n_features, n_features) if 'tied',
    • (n_components, n_features) if 'diag',
    • (n_components, n_features, n_features) if 'full'.
    • This initializes the (inverse) covariance matrices of the Gaussians. A covariance matrix describes how each of the n_features dimensions relates to the others, so for one Gaussian it is n_features × n_features, and stacking n_components of them gives the 'full' shape above. Since it is symmetric, each matrix has (n_features + 1) * n_features / 2 free parameters. See the blog post referenced above for more on covariance matrices.
  • random_state: int, RandomState instance or None, optional (default=None). Random number generator.
  • warm_start: bool, defaults to False. If True, the solution of the last call to fit() is used as the initialization for the next call of fit(). This can speed up convergence when fit is called several times on similar problems.
  • verbose: int, defaults to 0. Enables verbose output: if 1 it prints the current initialization and each iteration step; if greater than 1 it also prints the log probability and the time needed for each step.
  • verbose_interval: works together with verbose; if verbose output is enabled, information is printed every this many iterations. Defaults to 10.
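
As a quick sanity check on these parameters, here is a minimal instantiation sketch; the values shown are just illustrative, not a recommendation:

```python
from sklearn.mixture import GaussianMixture

gmm = GaussianMixture(
    n_components=3,           # number of Gaussian components in the mixture
    covariance_type='full',   # one of {'full', 'tied', 'diag', 'spherical'}
    tol=1e-3,                 # stop EM once the lower-bound gain drops below this
    reg_covar=1e-6,           # added to the covariance diagonals for numerical stability
    max_iter=100,             # maximum number of EM iterations per initialization
    n_init=1,                 # number of initializations; the best run is kept
    init_params='kmeans',     # responsibilities initialized by k-means ('random' also possible)
    random_state=0,
)
```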

E step

This step computes \(w_j^{(i)}\).

```python
def _e_step(self, X):
    """E step.
    Parameters
    ----------
    X : array-like, shape (n_samples, n_features)
    Returns
    -------
    log_prob_norm : float, mean of log p(x) over the samples
    log_responsibility : posterior probability that sample i belongs to
        component j, i.e. w_j^{(i)} (in log form)
    """
    log_prob_norm, log_resp = self._estimate_log_prob_resp(X)
    return np.mean(log_prob_norm), log_resp
```

So how is \(w_j^{(i)}\) computed?

\[w_j^{(i)}:=p(z^{(i)}=j|x^{(i)};\phi,\mu,\Sigma)=\dfrac{p(x^{(i)}|z^{(i)}=j;\mu,\Sigma)p(z^{(i)}=j;\phi)}{\sum_{l=1}^kp(x^{(i)}|z^{(i)}=l;\mu,\Sigma)p(z^{(i)}=l;\phi)}\]

For computational convenience, note the following two points (a small numerical sketch follows this list):

  • Because both the numerator and the denominator involve Gaussian densities, i.e. exponentials, everything is first computed in log form; the result is exponentiated again when it is plugged into the M step.
  • For the covariance matrices, computing the inverse and the determinant directly is expensive when n_features is large, so the precision matrix is computed first and Cholesky-factorized to speed up the calculation.
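
The reason for staying in log space is numerical: the per-component terms can underflow long before the ratio itself becomes meaningless. A tiny illustration with scipy's logsumexp (the values here are made up):

```python
import numpy as np
from scipy.special import logsumexp

# hypothetical weighted log-probabilities of one sample under 3 components
a = np.array([-1000.0, -1001.0, -1002.0])

naive = np.log(np.sum(np.exp(a)))   # exp() underflows to 0, so this is -inf
stable = logsumexp(a)               # ~ -999.59, computed without underflow

print(naive, stable)
```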

First compute the numerator in log form, i.e. the sum of two logarithms:

\[\log p(x^{(i)}|z^{(i)}=j;\mu,\Sigma)+\log p(z^{(i)}=j;\phi)\]

```python
# Compute log p(x|z) + log p(z)
def _estimate_weighted_log_prob(self, X):
    """Estimate the weighted log-probabilities, log P(X | Z) + log weights.
    Parameters
    ----------
    X : array-like, shape (n_samples, n_features)
    Returns
    -------
    weighted_log_prob : array, shape (n_samples, n_components)
    """
    return self._estimate_log_prob(X) + self._estimate_log_weights()
```

  1. The first term is the log of the Gaussian density, computed from the means and the Cholesky factorization of the precision matrices.

```python
def _estimate_log_prob(self, X):
    return _estimate_log_gaussian_prob(
        X, self.means_, self.precisions_cholesky_, self.covariance_type)
```

This function, `_estimate_log_gaussian_prob`, is examined in detail in the last section below ("Computing p(x|z=j) in the E step").


  2. The second term is the weight of each Gaussian component, i.e. \(\phi_j\):

```python
def _estimate_log_weights(self):
    # Starts from the initial value and is updated by each M step
    return np.log(self.weights_)
```

Then compute \(w_j^{(i)}\):

```python
def _estimate_log_prob_resp(self, X):
    """Estimate log probabilities and responsibilities for each sample.
    Compute the log probabilities, weighted log probabilities per
    component and responsibilities for each sample in X with respect to
    the current state of the model.
    Parameters
    ----------
    X : array-like, shape (n_samples, n_features)
    Returns
    -------
    log_prob_norm : array, shape (n_samples,)
        log p(X)
    log_responsibilities : array, shape (n_samples, n_components)
        logarithm of the responsibilities
    """
    # Numerator: log P(X | Z) + log weights
    weighted_log_prob = self._estimate_weighted_log_prob(X)
    # Denominator: log p(x), summing over the components with logsumexp
    log_prob_norm = logsumexp(weighted_log_prob, axis=1)
    with np.errstate(under='ignore'):
        # Ignore underflow; log w_j^(i) is the difference of the two logs
        log_resp = weighted_log_prob - log_prob_norm[:, np.newaxis]
    return log_prob_norm, log_resp
```
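
Exponentiating log_resp gives the responsibilities, which must sum to 1 over the components for every sample. `_estimate_log_prob_resp` is private, but the public `predict_proba` exposes exactly exp(log_resp), so this can be checked with a quick sketch on made-up data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# two well-separated synthetic blobs
X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + 5])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
resp = gmm.predict_proba(X)                  # exp(log_resp), shape (n_samples, n_components)
print(np.allclose(resp.sum(axis=1), 1.0))    # True: each row of w_j^(i) sums to 1
```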

M step

```python
def _m_step(self, X, log_resp):
    """M step.
    Parameters
    ----------
    X : array-like, shape (n_samples, n_features)
    log_resp : array-like, shape (n_samples, n_components)
        Logarithm of the posterior probabilities (or responsibilities) of
        the point of each sample in X.
    """
    n_samples, _ = X.shape
    # Update weights, means and covariances from the log_resp found in the E step
    self.weights_, self.means_, self.covariances_ = (
        _estimate_gaussian_parameters(X, np.exp(log_resp), self.reg_covar,
                                      self.covariance_type))
    # Update the component weights phi_j
    self.weights_ /= n_samples
    # Update the (Cholesky factor of the) precision matrices
    self.precisions_cholesky_ = _compute_precision_cholesky(
        self.covariances_, self.covariance_type)
```

The concrete updates follow the formulas derived earlier. For the covariance of each component:

\[\Sigma_j:=\dfrac{\sum_{i=1}^mw_j^{(i)}(x^{(i)}-\mu_j)(x^{(i)}-\mu_j)^T}{\sum_{i=1}^mw_j^{(i)}}\]

Covariance matrices, taking 'full' as the example:

```python
def _estimate_gaussian_covariances_full(resp, X, nk, means, reg_covar):
    """Estimate the full covariance matrices.
    resp : the responsibilities w_j^{(i)} guessed in the E step
    """
    n_components, n_features = means.shape
    # Covariance matrices, one per component
    covariances = np.empty((n_components, n_features, n_features))
    for k in range(n_components):
        diff = X - means[k]
        covariances[k] = np.dot(resp[:, k] * diff.T, diff) / nk[k]
        # Regularization: .flat views the matrix as 1-D, and taking every
        # (n_features + 1)-th element of an (n_features, n_features) matrix
        # walks along its diagonal, so this adds reg_covar to the diagonal
        covariances[k].flat[::n_features + 1] += reg_covar
    return covariances
```
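
The vectorised `np.dot(resp[:, k] * diff.T, diff) / nk[k]` is exactly the weighted outer-product formula for \(\Sigma_j\). A small sketch, with made-up data and responsibilities for a single component, checking that the two forms agree:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(20, 3)       # hypothetical samples
w = rng.rand(20)           # hypothetical responsibilities w_j^(i) for one component
mu = np.average(X, axis=0, weights=w)
diff = X - mu

# Spelled-out formula: sum_i w_i (x_i - mu)(x_i - mu)^T / sum_i w_i
cov_formula = np.einsum('i,ij,ik->jk', w, diff, diff) / w.sum()
# Vectorised form used in _estimate_gaussian_covariances_full
cov_sklearn = np.dot(w * diff.T, diff) / w.sum()

print(np.allclose(cov_formula, cov_sklearn))   # True
```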

Then the parameter estimates \(\mu_j\) and \(\phi_j\):

\[\phi_j:=\frac{1}{m}\sum_{i=1}^mw_j^{(i)}\]

\[\mu_j:=\dfrac{\sum_{i=1}^mw_j^{(i)}x^{(i)}}{\sum_{i=1}^mw_j^{(i)}}\]

```python
def _estimate_gaussian_parameters(X, resp, reg_covar, covariance_type):
    """Estimate the Gaussian distribution parameters.
    Parameters
    ----------
    X : samples, shape (n_samples, n_features)
    resp : responsibilities w_j^{(i)} from the E step, shape (n_samples, n_components)
    reg_covar : regularization added to the covariance diagonals
    covariance_type : {'full', 'tied', 'diag', 'spherical'}
    Returns
    -------
    nk : effective number of samples per component, shape (n_components,),
        i.e. \sum_{i=1}^{m} w_j^{(i)}
    means : means of the k n-dimensional Gaussians, shape (n_components, n_features)
    covariances : covariance matrices
    """
    # nk is used as a denominator, so keep it away from exactly zero
    nk = resp.sum(axis=0) + 10 * np.finfo(resp.dtype).eps
    means = np.dot(resp.T, X) / nk[:, np.newaxis]
    covariances = {"full": _estimate_gaussian_covariances_full,
                   "tied": _estimate_gaussian_covariances_tied,
                   "diag": _estimate_gaussian_covariances_diag,
                   "spherical": _estimate_gaussian_covariances_spherical
                   }[covariance_type](resp, X, nk, means, reg_covar)
    return nk, means, covariances
```
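
Note that `_estimate_gaussian_parameters` returns the unnormalised nk as the weights; the division by n_samples in `_m_step` turns them into \(\phi_j\). A minimal numpy sketch of these two updates on made-up responsibilities:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(50, 2)                        # hypothetical samples
resp = rng.dirichlet([1.0, 1.0, 1.0], 50)   # hypothetical responsibilities, rows sum to 1

nk = resp.sum(axis=0)                       # sum_i w_j^(i), shape (n_components,)
phi = nk / X.shape[0]                       # phi_j = (1/m) * sum_i w_j^(i)
means = resp.T @ X / nk[:, None]            # mu_j = weighted average of the samples

print(np.isclose(phi.sum(), 1.0))           # True: the phi_j form a distribution
```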

Iterate to convergence, repeating the steps above

```python
def fit(self, X, y=None):
    """Estimate model parameters with the EM algorithm.

    The method fits the model `n_init` times and sets the parameters with
    which the model has the largest likelihood or lower bound. Within each
    trial, the method iterates between E-step and M-step for `max_iter`
    times until the change of likelihood or lower bound is less than
    `tol`, otherwise, a `ConvergenceWarning` is raised.

    Stopping criteria: `max_iter` EM iterations are reached, or the gain of
    the likelihood / lower bound falls below `tol`.

    Parameters
    ----------
    X : array-like, shape (n_samples, n_features)

    Returns
    -------
    self
    """
    X = _check_X(X, self.n_components)
    self._check_initial_parameters(X)

    # if we enable warm_start, we will have a unique initialisation
    do_init = not(self.warm_start and hasattr(self, 'converged_'))
    n_init = self.n_init if do_init else 1

    max_lower_bound = -np.infty
    self.converged_ = False

    random_state = check_random_state(self.random_state)

    n_samples, _ = X.shape
    # Loop over initializations
    for init in range(n_init):
        self._print_verbose_msg_init_beg(init)

        # Initialize the parameters first
        if do_init:
            self._initialize_parameters(X, random_state)
            self.lower_bound_ = -np.infty

        # EM iterations
        for n_iter in range(self.max_iter):
            prev_lower_bound = self.lower_bound_

            # E step: compute the posterior w_j^(i), i.e. the Q distribution
            log_prob_norm, log_resp = self._e_step(X)
            # M step: update the parameters
            self._m_step(X, log_resp)
            # Value of the lower bound after this iteration
            self.lower_bound_ = self._compute_lower_bound(
                log_resp, log_prob_norm)
            # Gain of the lower bound
            change = self.lower_bound_ - prev_lower_bound
            self._print_verbose_msg_iter_end(n_iter, change)

            # Compare the gain of the lower bound with tol
            if abs(change) < self.tol:
                self.converged_ = True
                break

        self._print_verbose_msg_init_end(self.lower_bound_)

        # Keep the parameters of the best initialization so far
        if self.lower_bound_ > max_lower_bound:
            max_lower_bound = self.lower_bound_
            best_params = self._get_parameters()
            best_n_iter = n_iter

    if not self.converged_:
        warnings.warn('Initialization %d did not converge. '
                      'Try different init parameters, '
                      'or increase max_iter, tol '
                      'or check for degenerate data.'
                      % (init + 1), ConvergenceWarning)

    self._set_parameters(best_params)
    self.n_iter_ = best_n_iter

    return self
```
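
After `fit`, attributes such as `converged_`, `n_iter_` and `lower_bound_` record how the best run of EM ended. A quick sketch on made-up two-blob data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(200, 2), rng.randn(200, 2) + 4])

gmm = GaussianMixture(n_components=2, n_init=3, random_state=0).fit(X)
print(gmm.converged_)      # whether the best run reached the tol criterion
print(gmm.n_iter_)         # EM iterations used by the best run
print(gmm.lower_bound_)    # final value of the lower bound for the best run
print(gmm.weights_)        # learned phi_j, roughly [0.5, 0.5] here
```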

Computing p(x|z=j) in the E step

This computes the Gaussian density from the component parameters, using an optimized approach.

First compute the precision matrix of each covariance matrix, i.e. its inverse (see https://www.statlect.com/glossary/precision-matrix), and take its Cholesky factorization.

The Cholesky factor of the precision matrix is then used throughout the density computation, which makes the matrix operations cheaper.

```python
def _compute_precision_cholesky(covariances, covariance_type):
    """Compute the Cholesky decomposition of the precisions.
    Parameters
    ----------
    covariances : array-like
        The covariance matrix of the current components.
        The shape depends of the covariance_type.
    covariance_type : {'full', 'tied', 'diag', 'spherical'}
        The type of precision matrices.
    Returns
    -------
    precisions_cholesky : array-like
        The cholesky decomposition of sample precisions of the current
        components. The shape depends of the covariance_type.
    """
    if covariance_type == 'full':
        n_components, n_features, _ = covariances.shape
        precisions_chol = np.empty((n_components, n_features, n_features))
        for k, covariance in enumerate(covariances):
            try:
                cov_chol = linalg.cholesky(covariance, lower=True)
            except linalg.LinAlgError:
                raise ValueError(estimate_precision_error_message)
            precisions_chol[k] = linalg.solve_triangular(cov_chol,
                                                         np.eye(n_features),
                                                         lower=True).T
    elif covariance_type == 'tied':
        _, n_features = covariances.shape
        try:
            cov_chol = linalg.cholesky(covariances, lower=True)
        except linalg.LinAlgError:
            raise ValueError(estimate_precision_error_message)
        precisions_chol = linalg.solve_triangular(cov_chol, np.eye(n_features),
                                                  lower=True).T
    else:
        if np.any(np.less_equal(covariances, 0.0)):
            raise ValueError(estimate_precision_error_message)
        precisions_chol = 1. / np.sqrt(covariances)
    return precisions_chol
```
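
A quick way to see what `precisions_chol` is: for the 'full' case, if the covariance is \(\Sigma = LL^T\), the stored factor is \((L^{-1})^T\), and multiplying it by its own transpose recovers the precision \(\Sigma^{-1}\). A sketch with a made-up 2x2 covariance:

```python
import numpy as np
from scipy import linalg

cov = np.array([[2.0, 0.3],
                [0.3, 1.0]])                     # hypothetical covariance matrix

cov_chol = linalg.cholesky(cov, lower=True)      # cov = L @ L.T
prec_chol = linalg.solve_triangular(cov_chol, np.eye(2), lower=True).T

# prec_chol @ prec_chol.T reproduces the precision matrix inv(cov)
print(np.allclose(prec_chol @ prec_chol.T, np.linalg.inv(cov)))   # True
```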

Computing the log-determinant of the Cholesky factors

```python
# Gaussian mixture probability estimators
# Compute the log-determinant from the Cholesky factorization
def _compute_log_det_cholesky(matrix_chol, covariance_type, n_features):
    """Compute the log-det of the cholesky decomposition of matrices.
    Parameters
    ----------
    matrix_chol : Cholesky factors (here, of the precision matrices)
    covariance_type : {'full', 'tied', 'diag', 'spherical'}
    n_features : int
        Number of features.
    Returns
    -------
    log_det_precision_chol : array-like, shape (n_components,)
        The determinant of the precision matrix for each component.
    """
    if covariance_type == 'full':
        n_components, _, _ = matrix_chol.shape
        # For each component, sum the logs of the diagonal entries of its factor
        log_det_chol = (np.sum(np.log(
            matrix_chol.reshape(
                n_components, -1)[:, ::n_features + 1]), 1))

    elif covariance_type == 'tied':
        log_det_chol = (np.sum(np.log(np.diag(matrix_chol))))

    elif covariance_type == 'diag':
        log_det_chol = (np.sum(np.log(matrix_chol), axis=1))

    else:
        log_det_chol = n_features * (np.log(matrix_chol))

    return log_det_chol
```

Computing the first term, \(\log p(x^{(i)}|z^{(i)}=j;\mu,\Sigma)\):

```python
def _estimate_log_gaussian_prob(X, means, precisions_chol, covariance_type):
    """Estimate the log Gaussian probability.
    Parameters
    ----------
    X : samples, shape (n_samples, n_features)
    means : means of the k n-dimensional Gaussians, shape (n_components, n_features)
    precisions_chol : Cholesky factors of the precision matrices
    covariance_type : {'full', 'tied', 'diag', 'spherical'}
    Returns
    -------
    log_prob : array, shape (n_samples, n_components)
    """
    n_samples, n_features = X.shape
    n_components, _ = means.shape
    # det(precision_chol) is half of det(precision)
    log_det = _compute_log_det_cholesky(
        precisions_chol, covariance_type, n_features)

    if covariance_type == 'full':
        log_prob = np.empty((n_samples, n_components))
        for k, (mu, prec_chol) in enumerate(zip(means, precisions_chol)):
            # y = (x - mu) @ prec_chol, so ||y||^2 is the squared Mahalanobis distance
            y = np.dot(X, prec_chol) - np.dot(mu, prec_chol)
            log_prob[:, k] = np.sum(np.square(y), axis=1)

    elif covariance_type == 'tied':
        pass  # omitted in this walkthrough

    elif covariance_type == 'diag':
        pass  # omitted in this walkthrough

    elif covariance_type == 'spherical':
        pass  # omitted in this walkthrough

    return -.5 * (n_features * np.log(2 * np.pi) + log_prob) + log_det
```
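
To check that the 'full' branch really is the multivariate normal log-density, the same computation can be compared against `scipy.stats.multivariate_normal.logpdf` for a single component. This sketch uses made-up data and assumes the precision Cholesky factor is built as in `_compute_precision_cholesky`:

```python
import numpy as np
from scipy import linalg
from scipy.stats import multivariate_normal

rng = np.random.RandomState(0)
X = rng.randn(5, 3)                              # hypothetical samples
mu = np.array([0.5, -0.2, 1.0])                  # hypothetical mean
cov = np.cov(rng.randn(100, 3), rowvar=False)    # hypothetical covariance

cov_chol = linalg.cholesky(cov, lower=True)
prec_chol = linalg.solve_triangular(cov_chol, np.eye(3), lower=True).T

y = np.dot(X, prec_chol) - np.dot(mu, prec_chol)
log_det = np.sum(np.log(np.diag(prec_chol)))     # = -0.5 * log det(cov)
log_prob = -.5 * (3 * np.log(2 * np.pi) + np.sum(np.square(y), axis=1)) + log_det

print(np.allclose(log_prob, multivariate_normal(mu, cov).logpdf(X)))   # True
```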

Example from sklearn

Although GMM are often used for clustering, we can compare the obtained clusters with the actual classes from the dataset. We initialize the means of the Gaussians with the means of the classes from the training set to make this comparison valid.

We plot predicted labels on both training and held out test data using a variety of GMM covariance types on the iris dataset. We compare GMMs with spherical, diagonal, full, and tied covariance matrices in increasing order of performance. Although one would expect full covariance to perform best in general, it is prone to overfitting on small datasets and does not generalize well to held out test data.

On the plots, train data is shown as dots, while test data is shown as crosses. The iris dataset is four-dimensional. Only the first two dimensions are shown here, and thus some points are separated in other dimensions.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import StratifiedKFold

print(__doc__)

colors = ['navy', 'turquoise', 'darkorange']

iris = datasets.load_iris()  # data, target

# Break up the dataset into non-overlapping training (75%) and testing
# (25%) sets.
skf = StratifiedKFold(n_splits=4)
# Only take the first fold.
train_index, test_index = next(iter(skf.split(iris.data, iris.target)))

X_train = iris.data[train_index]    # (111, 4)
y_train = iris.target[train_index]  # (111,)
X_test = iris.data[test_index]      # (39, 4)
y_test = iris.target[test_index]    # (39,)

n_classes = len(np.unique(y_train))

# Try GMMs using different types of covariances: one model per covariance type
estimators = dict((cov_type, GaussianMixture(n_components=n_classes,
                   covariance_type=cov_type, max_iter=20, random_state=0))
                  for cov_type in ['spherical', 'diag', 'tied', 'full'])

n_estimators = len(estimators)  # 4

# figsize is the figure size (width, height) in inches
plt.figure(figsize=(3 * 5 // 2, 6))
# Spacing between the subplots
plt.subplots_adjust(bottom=.01, top=0.95, hspace=.15, wspace=.05,
                    left=.01, right=.99)


# Draw an ellipse for each fitted 2-D Gaussian component
def make_ellipses(gmm, ax):
    for n, color in enumerate(colors):
        if gmm.covariance_type == 'full':
            covariances = gmm.covariances_[n][:2, :2]
        elif gmm.covariance_type == 'tied':
            covariances = gmm.covariances_[:2, :2]
        elif gmm.covariance_type == 'diag':
            covariances = np.diag(gmm.covariances_[n][:2])
        elif gmm.covariance_type == 'spherical':
            covariances = np.eye(gmm.means_.shape[1]) * gmm.covariances_[n]

        # Represent the estimated 2-D Gaussian component as an ellipse
        v, w = np.linalg.eigh(covariances)  # eigenvalues and eigenvectors (as columns)
        u = w[0] / np.linalg.norm(w[0])     # default ord=None is the 2-norm for vectors
        angle = np.arctan2(u[1], u[0])
        angle = 180 * angle / np.pi         # convert to degrees
        v = 2. * np.sqrt(2.) * np.sqrt(v)
        ell = mpl.patches.Ellipse(gmm.means_[n, :2], v[0], v[1],
                                  180 + angle, color=color)
        ell.set_clip_box(ax.bbox)
        ell.set_alpha(0.5)
        ax.add_artist(ell)  # add the ellipse to the axes


for index, (name, estimator) in enumerate(estimators.items()):
    # Since we have class labels for the training data, we can
    # initialize the GMM parameters in a supervised manner.
    # Without labels (or with very few of them) the means would instead
    # have to be initialized randomly or by k-means.
    estimator.means_init = np.array([X_train[y_train == i].mean(axis=0)
                                     for i in range(n_classes)])

    # Train the other parameters using the EM algorithm.
    estimator.fit(X_train)

    # Plot the ellipses
    h = plt.subplot(2, n_estimators // 2, index + 1)
    make_ellipses(estimator, h)

    # Each class of the training data is drawn with its own color
    for n, color in enumerate(colors):
        data = iris.data[iris.target == n]
        plt.scatter(data[:, 0], data[:, 1], s=0.8, color=color,
                    label=iris.target_names[n])
    # Test data is drawn as crosses
    for n, color in enumerate(colors):
        data = X_test[y_test == n]
        plt.scatter(data[:, 0], data[:, 1], marker='x', color=color)

    # Training accuracy; predict picks the most probable component
    y_train_pred = estimator.predict(X_train)

    # Without labels there would be no way to compute an accuracy. Here we
    # do have labels: y_train_pred is the index of the most probable
    # component and each element of y_train is one of {0, 1, 2}, so the
    # accuracy can be computed directly.
    print(y_train_pred[:5])
    print(y_train[:5])
    train_accuracy = np.mean(y_train_pred.ravel() == y_train.ravel()) * 100
    plt.text(0.05, 0.9, 'Train accuracy: %.1f' % train_accuracy,
             transform=h.transAxes)

    y_test_pred = estimator.predict(X_test)
    test_accuracy = np.mean(y_test_pred.ravel() == y_test.ravel()) * 100
    plt.text(0.05, 0.8, 'Test accuracy: %.1f' % test_accuracy,
             transform=h.transAxes)

    plt.xticks(())
    plt.yticks(())
    plt.title(name)

plt.legend(scatterpoints=1, loc='lower right', prop=dict(size=12))

plt.show()
```