Rejection series 1: overview

A survey of open set recognition.
paper: Recent Advances in Open Set Recognition: A Survey

Motivation

In real-world recognition/classification tasks, limited by various objective factors, it is usually difficult to collect training samples to exhaust all classes when training a recognizer or classifier. A more realistic scenario is open set recognition (OSR), where incomplete knowledge of the world exists at training time, and unknown classes can be submitted to an algorithm during testing, requiring the classifiers not only to accurately classify the seen classes, but also to effectively deal with the unseen ones.
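In real classification tasks the training set cannot cover every class. The more realistic setting is open set recognition (OSR): training happens with incomplete knowledge of the world, and unknown classes show up at test time. The classifier therefore has to recognize the classes seen in training accurately and also handle the unseen ones sensibly, for example by rejecting them or labeling them as unknown.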

This paper provides a comprehensive survey of existing open set recognition techniques covering various aspects ranging from related definitions, representations of models, datasets, experiment setup and evaluation metrics. Furthermore, we briefly analyze the relationships between OSR and its related tasks including zero-shot, one-shot (few-shot) recognition/learning techniques, classification with reject option, and so forth. Additionally, we also overview the open world recognition which can be seen as a natural extension of OSR. Importantly, we highlight the limitations of existing approaches and point out some promising subsequent research directions in this field.
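The survey covers the related definitions, models, experiments and evaluation metrics. It also analyzes tasks related to OSR, such as zero-shot and one-shot recognition and classification with a reject option, and gives an overview of open world recognition, which can be seen as an extension of OSR. Most importantly, the authors point out the limitations of current methods and suggest some directions for future research.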

Introduction

a more realistic scenario is usually open and non-stationary such as driverless, fault/medical diagnosis, etc., where unseen situations can emerge unexpectedly, which drastically weakens the robustness of these existing methods.
More realistic scenarios are open and non-stationary.

To meet this challenge, several related research directions actually have been explored including lifelong learning [1], [2], transfer learning [3]–[5], domain adaptation [6], zero-shot [7]–[9], one-shot (few-shot) [10]–[16] recognition/learning and open set recognition/classification [17]–[19], and so forth.
Related areas: lifelong learning, transfer learning, domain adaptation, zero-shot, one-shot, open set recognition.

recognition should consider four basic categories of classes as follows:

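  • known known: labeled samples in train/dev, covering both positive and negative classes, with associated semantic information.
  • known unknown: labeled samples in train/dev, negative classes, without associated semantic information.
  • unknown known: test samples whose classes do not appear in train, but for which semantic information is available. For example, if train contains cats and test contains another kind of feline, that sample presumably still carries usable semantic information?
  • unknown unknown: test samples whose classes do not appear in train and for which no semantic information is available at all.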

Unlike the traditional classification, zero-shot learning (ZSL) can identify unseen classes which have no available observations in training. However, the available semantic/attribute information shared among all classes including seen and unseen ones are needed.
Zero-shot learning targets the unknown known classes, that is, classes for which semantic information is available.

The ZSL mainly focuses on the recognition of the unknown known classes defined above. Actually, such a setting is rather restrictive and impractical, since we usually know nothing about the testing samples which may come from known known classes or not.
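The unknown known setting is restrictive and impractical: we can rarely tell whether a test sample comes with semantic information, so we cannot decide whether it is unknown known or unknown unknown.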

Comparison between open set recognition and traditional classification

Via these decision boundaries, samples from an unknown unknown class are labeled as “unknown” or rejected rather than misclassified as known known classes.

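Risk function:

\(L(x, y, f(x)) \ge 0\) is the loss function and \(P(x, y)\) is the joint probability of a sample (x, y). This joint distribution is usually unknown, because we cannot determine how the sample space (label space) is actually distributed in nature.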

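The definition of the risk function in [Li Hang, Machine Learning]:

The loss function measures how good a single prediction of the model is; the risk function measures how good the model's predictions are on average.
I could not make sense of this part before; now I just want to say: Perfect!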

Therefore, traditional recognition/classification approaches minimize the empirical risk instead of the ideal risk RI by using other knowledge, such as assuming that the label space is at least locally smooth and regularizing the empirical risk minimization.
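Traditional classifiers minimize the empirical risk with the help of other prior knowledge, for example by assuming the label space is smooth, and then regularize the empirical risk minimization (that is, the structural risk function mentioned above).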
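For reference, the risks discussed here can be written in their textbook form. The notation (\(R_I\), \(R_{emp}\), \(R_{srm}\), the sample size \(N\), the regularizer \(J(f)\) and its weight \(\lambda\)) is an assumption chosen for these notes, not copied from the survey:

\[R_I(f)=\int L(x, y, f(x))\,dP(x, y)\]

\[R_{emp}(f)=\dfrac{1}{N}\sum_{i=1}^{N} L(x_i, y_i, f(x_i))\]

\[R_{srm}(f)=R_{emp}(f)+\lambda J(f)\]

The ideal risk needs the unknown \(P(x, y)\); the empirical risk replaces it with the average loss over the N training samples, and the structural risk adds a complexity penalty on top.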

Note that traditional recognition problem is usually performed under the closed set assumption. When the assumption switches to open environment/set scenario with the open space, other things should be added since intuitively there is some risk in labeling sample in the open space as any known known classes. This gives such an insight for OSR that we do know something else: we do know where known known classes exist, and we know that in open space we do not have a good basis for assigning labels for the unknown unknown classes.
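Traditional recognition assumes a fixed, closed sample space (known known only). Once we move to the open scenario, we intuitively realize that a risk term has to be added for the open space. We know where the known known classes are, and we also know that we have no good basis for assigning labels to unknown unknown classes.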

open space risk

This part is mainly taken from the paper [17], Toward Open Set Recognition.

That paper treats the class of interest as one class and all the other known/unknown classes as the set it is separated from, hence 1-vs-set.

To improve the overall open set recognition error, our 1-vs-set formulation balances the unknown classes by obtaining a core margin around the decision boundary A from the base SVM, specializing the resulting half-space by adding another plane \(\Omega\) and then generalizing or specializing the two planes (shown in Fig. 2) to optimize empirical and open space risk. This process uses the open set training data and the risk model to define a new “open set margin.” The second plane \(\Omega\) allows the 1-vs-set machine to avoid the overgeneralization that would misclassify the raccoon in Fig. 2. The overall optimization can also adjust the original margin with respect to A to reduce open space risk, which can avoid negatives such as the owl.
Two hyperplanes are used to separate negatives, positives and unknowns.
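The two-plane idea can be sketched as a "slab" decision rule: a sample is positive only if its linear score lies between the two planes. This is a minimal illustration, not the paper's training procedure; the names `one_vs_set_predict`, `lower` and `upper` are hypothetical, and the bias term is assumed to be folded into the two thresholds.

```python
def one_vs_set_predict(w, x, lower, upper):
    """Return True if x lies inside the slab lower <= w.x <= upper.

    A plain linear SVM would accept the whole half-space w.x >= lower.
    The second plane (upper) cuts that half-space off, so points far
    beyond the positive training data are no longer labeled positive.
    """
    score = sum(wi * xi for wi, xi in zip(w, x))
    return lower <= score <= upper


w = [1.0, 0.0]
print(one_vs_set_predict(w, [2.0, 0.0], lower=1.0, upper=3.0))   # inside the slab
print(one_vs_set_predict(w, [10.0, 0.0], lower=1.0, upper=3.0))  # far beyond the second plane
```

Moving `lower` and `upper` apart generalizes the positive region; moving them together specializes it, which is the trade-off between empirical risk and open space risk described in the quote.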

While we do not know the joint distribution \(P(x, y)\) in, one way to look at the open space risk is as a weak assumption: Far from the known data the Principle of Indifference [8] suggests that if there is no known reason to assign a probability, alternatives should be given equal probability. In our case, this means that at all points in open space, all labels (both known and unknown) are equally likely, and risk should be computed accordingly. However, we cannot have constant value probabilities over infinite spaces—the distribution must be integrable and integrate to 1. We must formalize open space differently (e.g., by ensuring the problem is well posed and then assuming the probability is proportional to relative Lebesgue measure [9]). Thus, we can consider the measure of the open space to the full space, and define our risk penalty proportional to such a ratio.
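The joint distribution \(P(x, y)\) is unknown, so the authors assume that all labels are equally likely in open space. A constant probability cannot be defined over an infinite space, however, so they define a ratio of measures instead and use it as the penalty for the risk of unknowns appearing in open space.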

where open space risk is considered to be the fraction (in terms of Lebesgue measure) of positively labeled open space compared to the overall measure of positively labeled space (which includes the space near the positive examples).
Open space risk is the ratio of the measure of positively labeled open space to the measure of all positively labeled space.
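Written out, this ratio takes the following form, as far as I can reconstruct it from [17]. Here \(f\) is a measurable recognition function with \(f(x)=1\) when \(x\) is recognized as the class of interest and \(f(x)=0\) otherwise, \(O\) is the open space, and \(S_O\) is a ball that contains both the positive training examples and \(O\); treat the exact symbols as an assumption:

\[R_O(f)=\dfrac{\int_{O} f(x)\,dx}{\int_{S_O} f(x)\,dx}\]

The more of the positively labeled region lies in open space, the closer \(R_O(f)\) gets to 1.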

I do not fully follow this. The open question is still how to measure it: can the unknown classes even be determined?

openness

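Openness characterizes how open a dataset is:

  • \(C_{TR}\) is the number of classes in the training set; the larger it is, the smaller the openness.
  • \(C_{TE}\) is the number of classes in the test set.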
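A small sketch of the openness computation, assuming the form used in the survey, \(O = 1 - \sqrt{2 C_{TR} / (C_{TR} + C_{TE})}\). The function name and the argument check are my own additions.

```python
import math


def openness(n_train_classes, n_test_classes):
    """Openness O = 1 - sqrt(2 * C_TR / (C_TR + C_TE)).

    Assumes every training class also appears at test time,
    so C_TE >= C_TR.
    """
    if n_train_classes <= 0 or n_test_classes < n_train_classes:
        raise ValueError("expected 0 < C_TR <= C_TE")
    ratio = 2.0 * n_train_classes / (n_train_classes + n_test_classes)
    return 1.0 - math.sqrt(ratio)


# Fix 10 training classes and add more and more unseen test classes.
for c_te in (10, 15, 20, 40):
    print(c_te, round(openness(10, c_te), 3))
```

With \(C_{TE} = C_{TR}\) the problem is closed and the openness is 0; it grows toward 1 as more unseen classes appear at test time.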

The Open Set Recognition Problem

our goal is to balance the risk of the unknown in open space with the empirical (known) risk. In this sense, we formally define the open set recognition problem as follows:
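Our goal is to balance the risk of the unknown in open space against the empirical risk computed on the known classes.
How should this be read? Traditional risk functions only consider the empirical risk, which is based entirely on the training data. In open space we also have to account for the unknowns that will appear at test time, so the risk function has to take their existence into account. That is the open space risk introduced earlier.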
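The formal statement, as I recall it from the survey, is a joint minimization of the two risks. \(V\) denotes the training data, \(R_\varepsilon\) the empirical risk, \(\lambda_r\) a regularization constant and \(\mathcal{H}\) the space of measurable recognition functions; the exact symbols are an assumption on my part:

\[\arg\min_{f\in\mathcal{H}}\ \left\{R_O(f)+\lambda_r R_\varepsilon(f(V))\right\}\]

Setting \(\lambda_r\) large recovers ordinary empirical risk minimization; setting it small favors functions that label as little open space as possible.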

a categorization of OSR techniques

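The key question is how to incorporate the open space risk of Eq. (4) into a model. Researchers have proposed all kinds of models, which fall mainly into discriminative models and generative models.

Going further, they can be divided into five categories (Table II):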
- Traditional ML-based
- Deep Network-based
- Adversarial Learning-based
- EVT-based
- Dirichlet Process-based OSR models

Deep Neural Network-based OSR Models

Work by the big names, and all of it feels fairly new. A new research rabbit hole?

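OpenMax was proposed: it still trains a deep network with the softmax cross entropy loss, then takes the penultimate layer (the layer before softmax?) and computes a mean activation vector (MAV) for each class from its correctly classified samples.
It then uses a Weibull distribution to redistribute the scores and reclassify; for the remaining steps, see the corresponding paper.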
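To make the MAV idea concrete, here is a simplified sketch that computes one mean activation vector per class and rejects a sample whose activation is too far from every MAV. This is not OpenMax itself: the Weibull fitting and score redistribution are skipped, and the function names and the fixed `max_dist` threshold are assumptions made for illustration.

```python
import math


def mean_activation_vectors(activations, labels):
    """Average the penultimate-layer activations of each class."""
    sums, counts = {}, {}
    for vec, label in zip(activations, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify_or_reject(vec, mavs, max_dist):
    """Return the class of the nearest MAV, or "unknown" if it is too far."""
    nearest = min(mavs, key=lambda label: math.dist(vec, mavs[label]))
    if math.dist(vec, mavs[nearest]) > max_dist:
        return "unknown"
    return nearest


train_acts = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
train_labels = ["cat", "cat", "dog", "dog"]
mavs = mean_activation_vectors(train_acts, train_labels)
print(classify_or_reject([2.1, 0.1], mavs, max_dist=1.0))
print(classify_or_reject([10.0, 10.0], mavs, max_dist=1.0))
```

In the real method the hard `max_dist` cut is replaced by a per-class Weibull model of the distances, which turns the distance into a probability of belonging to the unknown class.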

the OpenMax effectively addressed the challenge of the recognition for fooling/rubbish and unrelated open set images. However, as discussed in [71], the OpenMax fails to recognize the adversarial images which are visually indistinguishable from training samples but are designed to make deep networks produce high confidence but incorrect answers [96], [98].
OpenMax handles unrelated open set images effectively, but it cannot reliably detect adversarial examples.

Actually, the authors in [72] have indicated that the OpenMax is susceptible to the adversarial generation techniques directly working on deep representations. Therefore, the adversarial samples are still a serious challenge for open set recognition. Furthermore, using the distance from MAV, the cross entropy loss function in OpenMax does not directly incentivize projecting class samples around the MAV. In addition to that, the distance function used in testing is not used in training, possibly resulting in inaccurate measurement in that space [73]. To address this limitation, Hassen and Chan [73] learned a neural network based representation for open set recognition, which is similar in spirit to the Fisher Discriminant, where samples from the same class are closed to each other while the ones from different classes are further apart, leading to larger space among known known classes for unknown unknown classes’ samples to occupy.
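Cross entropy does not directly push the samples of a class toward its MAV, and the distance function used at test time is not the one used in training, which can make the measurement inaccurate. To address this, [73] learns a representation in the spirit of the Fisher discriminant: samples of the same class are pulled close together and different classes are pushed apart, which leaves more room between known known classes for unknown unknown samples to occupy.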

  • OpenMax to text classification
  • Deep Open classifier
  • tWiSARD
  • hidden unknown unknown classes

Adversarial Learning-based OSR Models

Note that the main challenge for open set recognition is the incomplete class knowledge existing in training, leading to the open space risk when classifiers encounter unknown unknown classes during testing. Fortunately, the adversarial learning technique can account for open space to some extent by adversarially generating the unknown unknown class data according to the known known class knowledge, which undoubtedly provides another way to tackle the challenging multiclass OSR problem.
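The biggest challenge in open set recognition is the incomplete knowledge available in training, which leads to open space risk once unknown unknown classes are encountered in testing.
Adversarial learning can, to some extent, generate unknown unknown samples from the known known classes, which offers another way to tackle the OSR problem.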

EVT-based OSR Models

As a powerful tool to increase the classification performance, the statistical Extreme Value Theory (EVT) has recently achieved great success due to the fact that EVT can effectively model the tails of the distribution of distances between training observations using the asymptotic theory[100].

I do not understand this theory well, so here are a few papers instead.

Remark: As mentioned above, almost all existing OSR methods adopt the threshold-based classification scheme, where recognizers in decision either reject or categorize the input samples to some known known class using empirically set threshold. Thus the threshold plays a key role. However, the selection for it usually depends on the knowledge of known known classes, inevitably incurring risks due to lacking available information from unknown unknown classes [57]. This indicates the threshold-based OSR methods still face serious challenges.
A threshold derived only from known known classes lacks any information about unknown unknown classes, so it inevitably carries risk. This is the difficulty that all threshold-based methods face.
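The dependence on known known data can be seen in a small sketch: the threshold is picked from the confidence scores of known-class validation samples alone, so nothing in the procedure ever sees an unknown unknown sample. The function names and the `reject_fraction` parameter are hypothetical, chosen only for this illustration.

```python
def threshold_from_known(confidences, reject_fraction=0.1):
    """Choose a threshold so that about `reject_fraction` of the
    known-class validation samples would fall below it."""
    if not confidences:
        raise ValueError("need at least one confidence score")
    ranked = sorted(confidences)
    index = min(int(reject_fraction * len(ranked)), len(ranked) - 1)
    return ranked[index]


def accept_or_reject(label, confidence, threshold):
    """Keep the predicted label only if the classifier is confident enough."""
    return label if confidence >= threshold else "unknown"


known_val_conf = [0.9, 0.8, 0.95, 0.6, 0.99, 0.85, 0.7, 0.92, 0.88, 0.97]
thr = threshold_from_known(known_val_conf, reject_fraction=0.1)
print(thr)
print(accept_or_reject("cat", 0.65, thr))
print(accept_or_reject("cat", 0.90, thr))
```

An unknown unknown sample that happens to receive a confidence above `thr` is silently accepted, which is exactly the risk the remark points at.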

Dirichlet Process-based OSR Models (generative models)

Dirichlet process (DP) [104]–[108] considered as a distribution over distributions is a stochastic process, which has been widely applied in clustering and density estimation problems as a nonparametric prior defined over the number of mixture components. Furthermore, this model does not overly depend on training samples and can achieve adaptive change as the data changes, making it naturally adapt to the open set recognition scenario. In fact, researchers have begun the related research
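The Dirichlet process is a nonparametric prior over mixture models that is widely used for clustering and density estimation. Such a model does not depend heavily on the training samples and adapts as the data changes, which makes it a natural fit for the open set scenario.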

I am not very familiar with generative models.

Remark: Instead of addressing the OSR problem from the discriminative model perspective, CD-OSR actually reconsiders this problem from the generative model perspective due to the use of HDP, which provides another research direction for open set recognition. Furthermore, the collective decision strategy for OSR is also worth further exploring since it not only takes the correlations among the testing samples into account but also provides a possibility for new class discovery, whereas single-sample decision strategy adopted by other existing OSR methods can not do such a work since it can not directly tell whether the single rejected sample is an outlier or from new class.

Beyond open set Recognition

Open set recognition is of limited value if it only considers a static set, and merely rejecting unknown unknown samples is not enough either. This is why open world recognition was proposed.

open world recognition (OWR), where a recognition system should perform four tasks:
- detecting unknown unknown classes
- choosing which samples to label for addition to the model
- labelling those samples
- updating the classifier
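The four tasks can be sketched as one loop around a toy nearest-class-mean model. Everything here is an illustrative assumption: the class `OpenWorldNCM`, the distance-based rejection rule, the choice to label every rejected sample, and the `oracle` callable that stands in for a human annotator.

```python
import math


class OpenWorldNCM:
    """Toy nearest-class-mean recognizer that can grow new classes."""

    def __init__(self, reject_dist):
        self.reject_dist = reject_dist
        self.sums = {}    # label -> per-dimension sum of samples
        self.counts = {}  # label -> number of samples

    def update(self, x, label):
        """Task 4: update the classifier with one labeled sample."""
        current = self.sums.setdefault(label, [0.0] * len(x))
        self.sums[label] = [a + b for a, b in zip(current, x)]
        self.counts[label] = self.counts.get(label, 0) + 1

    def predict(self, x):
        """Task 1: return a known label, or None for an unknown sample."""
        best, best_dist = None, float("inf")
        for label, total in self.sums.items():
            mean = [v / self.counts[label] for v in total]
            dist = math.dist(x, mean)
            if dist < best_dist:
                best, best_dist = label, dist
        return best if best_dist <= self.reject_dist else None


def open_world_step(model, batch, oracle):
    """Run one round of detect, select, label and update."""
    rejected = [x for x in batch if model.predict(x) is None]  # task 1
    to_label = rejected  # task 2: here simply every rejected sample
    for x in to_label:
        model.update(x, oracle(x))  # tasks 3 and 4
    return len(to_label)


model = OpenWorldNCM(reject_dist=1.0)
model.update([0.0, 0.0], "a")
model.update([0.0, 1.0], "a")
print(open_world_step(model, [[5.0, 5.0], [0.0, 0.6]], oracle=lambda x: "b"))
print(model.predict([5.0, 5.2]))
```

A real system would choose which rejected samples to label more carefully (task 2 is where active learning comes in) and would update incrementally without retraining from scratch.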

Remark: As a natural extension of OSR, the OWR faces more serious challenges which require it not only to have the ability to handle the OSR task, but also to have minimal downtime, even to continuously learn, which seems to have the flavor of lifelong learning to some extent. Besides, although some progress regarding the OWR has been made, there is still a long way to go.
Lifelong learning... impressive.

Dataset and evaluation metrics

dataset

  • https://dx.doi.org/10.6084/m9.figshare.1097614
  • https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multi-class.html

Experiment Setup: In open set recognition, most existing experiments are carried out on a variety of recast multi-class benchmark datasets. Specifically, taking the USPS dataset as an example, when it is used for OSR problem, one can randomly choose S distinct labels as the known known classes, and vary openness by adding a subset of the remaining labels.
Openness can be varied by adding or removing classes.
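A minimal sketch of this recasting step, assuming the dataset is described only by its label set. The function name `make_open_set_split` and its parameters are hypothetical; the training set would use the known labels only, while the test set would contain the known labels plus the chosen unknown ones.

```python
import random


def make_open_set_split(all_labels, n_known, n_unknown, seed=0):
    """Randomly pick known known labels and extra unknown test labels."""
    labels = sorted(all_labels)
    if n_known + n_unknown > len(labels):
        raise ValueError("not enough labels for the requested split")
    rng = random.Random(seed)
    shuffled = rng.sample(labels, len(labels))
    known = set(shuffled[:n_known])
    unknown = set(shuffled[n_known:n_known + n_unknown])
    return known, unknown


# USPS has 10 digit classes: keep 4 as known, add 3 unseen ones at test time.
known, unknown = make_open_set_split(range(10), n_known=4, n_unknown=3, seed=1)
print(sorted(known), sorted(unknown))
```

Increasing `n_unknown` while keeping `n_known` fixed raises the openness of the resulting benchmark; repeating the split with several seeds averages out the effect of which classes happen to be chosen.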

Evaluation Metrics for Open Set Recognition

  • TP: true positive
  • FP: false positive
  • TN: true negative
  • FN: false negative
  • TU: true unknown
  • FU: false unknown

accuracy

For the closed set:

\[\text{accuracy}=\dfrac{TP+TN}{TP+TN+FP+FN}\]

For the open set:

\[\text{accuracy}_O=\dfrac{(TP+TN)+TU}{(TP+TN+FP+FN)+(TU+FU)}\]

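With imbalanced data, accuracy does not evaluate a model objectively. For example, if unknown unknown samples dominate the test set, a classifier that labels every sample as unknown unknown still gets a high accuracy.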
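The degenerate case is easy to reproduce with the open set accuracy formula. In this sketch `correct_known` stands for the \(TP+TN\) term and `total_known` for \(TP+TN+FP+FN\); the function name and the sample counts are made up for illustration.

```python
def open_set_accuracy(correct_known, total_known, tu, fu):
    """Open set accuracy: (correct known + TU) / (all known + TU + FU)."""
    total = total_known + tu + fu
    if total == 0:
        raise ValueError("no test samples")
    return (correct_known + tu) / total


# 10 known-class samples and 990 unknown samples; the classifier rejects
# everything, so every known sample is wrong and every unknown one is right.
acc = open_set_accuracy(correct_known=0, total_known=10, tu=990, fu=0)
print(acc)
```

The classifier has learned nothing about the known classes, yet its accuracy is close to 1 because the unknown samples dominate the test set.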

For this reason, normalized accuracy (NA) was proposed.

\(0\le \lambda \le 1\) is a regularization constant.
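A sketch of normalized accuracy, assuming the form given in the survey: \(NA = \lambda \cdot AKS + (1-\lambda)\cdot AUS\), where AKS is the accuracy on known samples, \((TP+TN)/(TP+TN+FP+FN)\), and AUS is the accuracy on unknown samples, \(TU/(TU+FU)\). The function and argument names are my own.

```python
def normalized_accuracy(correct_known, total_known, tu, fu, lam=0.5):
    """NA = lam * AKS + (1 - lam) * AUS, with 0 <= lam <= 1."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    aks = correct_known / total_known  # accuracy on known samples
    aus = tu / (tu + fu)               # accuracy on unknown samples
    return lam * aks + (1.0 - lam) * aus


# A classifier that rejects everything: perfect on unknowns, useless on knowns.
print(normalized_accuracy(correct_known=0, total_known=10, tu=990, fu=0, lam=0.5))
```

Because the two accuracies are computed separately before being mixed, the number of unknown samples no longer dominates the score: the reject-everything classifier that looked excellent under plain accuracy is capped at \(1-\lambda\).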

F-measure

F1:
\[F1=\dfrac{2*\text{precision}* \text{recall}}{\text{precision}+\text{recall}}\]

\[\text{precision}=\dfrac{TP}{TP+FP}\] Precision: the fraction of samples predicted positive that are truly positive.

\[\text{recall}=\dfrac{TP}{TP+FN}\] Recall: the fraction of truly positive samples that are predicted positive.

In the open set scenario, the plain F1 score cannot take unknown unknown classes into account.

Instead, the computations of Precision and Recall in it are only for available known known classes. Additionally, the work [67] has indicated that although the computations of Precision and Recall are only for available known known classes, the FN and FP also consider the false unknown unknown classes and false known known classes by taking into account the false negative and the false positive, and we refer the reader to [67] for more details.
In fact, FP and FN may also include false unknown unknown samples, which complicates things. For details, see the paper Nearest neighbors distance ratio open-set classifier.

Note that the Precision contains the macro-Precision and micro-Precision while Recall includes the macro-Recall and micro-Recall, which leads to the corresponding macro-F-measure and micro-F-measure. Nevertheless, whether it is macro-F-measure or micro-F-measure, the higher their values, the better the performance of the corresponding OSR model.
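The difference between the two averages can be sketched from per-class counts over the known known classes. One assumption to flag: macro-F is computed here from macro-averaged precision and recall, whereas some libraries average the per-class F scores instead. The function names and the example counts are invented for illustration.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def _ratio(num, den):
    return num / den if den else 0.0


def macro_f(counts):
    """counts: list of (tp, fp, fn), one tuple per known class."""
    precisions = [_ratio(tp, tp + fp) for tp, fp, fn in counts]
    recalls = [_ratio(tp, tp + fn) for tp, fp, fn in counts]
    return f_measure(sum(precisions) / len(counts), sum(recalls) / len(counts))


def micro_f(counts):
    """Pool the counts of all classes before computing precision and recall."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return f_measure(_ratio(tp, tp + fp), _ratio(tp, tp + fn))


# One frequent class that is handled well, one rare class that is not.
per_class = [(8, 2, 2), (1, 1, 3)]
print(round(macro_f(per_class), 4), round(micro_f(per_class), 4))
```

Macro averaging gives every class the same weight, so the poorly handled rare class pulls the score down more than it does under micro averaging, where frequent classes dominate.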

Youden’s index for OSR

\[J= \text{Recall}+S-1\] where S is the true negative rate: \(S=\dfrac{TN}{TN+FP}\)
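As a quick check of the formula, Youden's index can be computed directly from the four counts. The function name and the example counts are made up for illustration.

```python
def youden_index(tp, fn, tn, fp):
    """J = recall + true negative rate - 1, ranging from -1 to 1."""
    recall = tp / (tp + fn)
    true_negative_rate = tn / (tn + fp)
    return recall + true_negative_rate - 1.0


print(round(youden_index(tp=8, fn=2, tn=90, fp=10), 3))
```

J is 0 for a classifier that is no better than chance and 1 for a perfect one, so it stays informative even when positives and negatives are heavily imbalanced.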

future research directions

About modeling

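  • Most work is based on discriminative models and only a small part on generative models, so generative models may leave more room for exploration.
  • The main challenge of OSR is that traditional classifiers are trained under the closed-set scenario: once an unknown unknown sample falls into the space assigned to a known class, it can never be classified correctly.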

modeling known known classes

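If the learned model of the known known classes is not overfitted, the classifier can separate out unknown unknown samples effectively. Combining clustering with classification therefore looks like a good direction. On unified learning frameworks for clustering and classification:

These two papers still work in the closed-set setting, so it is up to you to try them in the open set one.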

modeling unknown unknown classes

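With only known known classes available, it seems hard to learn the properties of unknown unknown classes. Generating unknown unknown samples through adversarial learning is a promising direction, though.

The authors also mention transductive learning in passing, and the adaptivity of the Dirichlet process: CD-OSR and Dirichlet process-based OSR are worth exploring as well.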

About rejecting

Most work stops at rejecting unknown unknown classes and does nothing afterwards. Only a few papers, [66][67], take further steps such as new class discovery.

About the decision

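Existing OSR models all make decisions on single samples, so a decision ignores the correlations among samples. A collective decision takes these correlations into account at test time and can also discover new classes.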

Open set + 'sth'

As open set scenario is a more practical assumption for the real-world classification/recognition tasks, it can naturally be combined with various fields involving classification/recognition such as semi-supervised learning, domain adaptation, active learning, multi-task learning, multi-label learning, multi-view learning, and so forth. For example, [124]–[126] recently introduced this scenario into domain adaptation, while [127] explored the open set classification in active learning field. Therefore, many interesting works are worth looking forward to.
Looks like a promising direction.

Generalized Open Set Recognition

Use side-information: for example, unknown unknown and known known classes may share semantic/attribute information.

Appending semantic/attribute information

In fact, a lot of semantic/attribute information is shared between the known known and the unknown unknown classes. Therefore, we can fully utilize this kind of information to 'cognize' the unknown unknown classes, or at least to provide a rough semantic/attribute description for the corresponding unknown unknown classes instead of simply rejecting them.
Use semantic information to become aware of unknown unknown classes instead of simply rejecting them.

Note the difference between open set recognition and ZSL (zero-shot learning), though:

The \(\text{side-information}^1\) in ZSL denotes the semantic/attribute information shared among all classes including known known and unknown known classes.

where the \(\text{side-information}^4\) denotes the available semantic/attribute information only for known known classes

The boundary of this side-information seems hard to pin down. The scope of Generalized Open Set Recognition looks hard to realize: how could the semantic information seen in training be completely absent from the unknown unknown classes?

Some similar work:

Using other available side-information

The main reason for open space risk is that the traditional classifiers trained under closed set scenario usually divide over-occupied space for known known classes, thus inevitably resulting in misclassifications once the unknown unknown class samples fall into the space divided for some known known class. From this perspective, the open space risk will be reduced as the space divided for those known known classes decreases by using other side-information like universum [135], [136] to shrink their regions as much as possible.
This feels far-fetched to me, yet people do work on it. The definition of open space risk is worth rereading, though.

Relative Open Set Recognition

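This one is quite interesting. In disease diagnosis, the whole sample space can be split into sick and not sick, so merely deciding whether someone is ill is a closed set problem. If we go on to determine the type of disease, however, a disease unseen in training may appear.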

Knowledge Integration for Open Set Recognition

In fact, the incomplete knowledge of the world is universal, especially for the single individuals: something you know does not mean I also know.

how to integrate the classifiers trained on each sub-knowledge set to further reduce the open space risk will be an interesting yet challenging topic in the future work, especially for such a situation: we can only obtain the classifiers trained on corresponding sub-knowledge sets, yet these sub-knowledge sets are not available due to the privacy protection of data.
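Use knowledge bases to reduce open space risk.

This one looks fairly sound to me: the scope of the unknown is hard to define, so if an external knowledge base is available, being able to recognize the unknowns that relate to it would already be great.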

Some related open-source tools and code: