# Paper notes: baseline for OOD

paper: A baseline for detecting misclassified and out-of-distribution examples

### Motivation


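• What is the baseline method? In other words, how do we evaluate a model's ability to automatically detect OOD examples?

• The standard tasks and datasets proposed by the authors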

### Baseline method

• error and success prediction: can we tell whether a given sample is classified correctly?

• in- and out-of-distribution detection: can OOD samples be detected correctly?
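The paper's baseline scores each example by its maximum softmax probability and uses that single score for both tasks. A minimal sketch, assuming raw logits are available; the function names and the example logits are made up for illustration:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def max_softmax_score(logits):
    """Confidence score: the largest softmax probability."""
    return max(softmax(logits))

# Hypothetical logits: a peaked prediction and a nearly flat one.
confident = max_softmax_score([6.0, 0.5, 0.1])
uncertain = max_softmax_score([1.0, 0.9, 1.1])
print(confident, uncertain)
```

Low scores are treated as likely errors or OOD inputs; the metrics below measure how well the score separates the two groups without fixing a threshold.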

#### metric1: AUROC

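The ROC curve is a threshold-independent performance evaluation metric: it is traced out by sweeping the decision threshold, and the area under it (AUROC) summarizes performance across all thresholds. Because we care about the positive label in this kind of imbalanced problem, the quantities of interest are the true positive rate (TPR) and the false positive rate (FPR).

• TPR, the true positive rate: an instance that is positive and is also predicted positive is a true positive. TPR is the fraction of all positive instances that the classifier identifies, i.e. the recall of the positive class: TPR = TP / (TP + FN)

• FPR, the false positive rate: the fraction of all negative instances that the classifier wrongly labels as positive: FPR = FP / (FP + TN)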
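The two rates can be turned into an AUROC value by sweeping the threshold over every observed score and integrating with the trapezoid rule. A small sketch in plain Python; the scores and labels are made-up illustration data, and `roc_points` / `auroc` are hypothetical helper names:

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping the threshold over every score.

    labels: 1 for the positive class, 0 for the negative class.
    An example is predicted positive when score >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auroc(scores, labels):
    """Area under the ROC curve via the trapezoid rule."""
    pts = roc_points(scores, labels)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(auroc(scores, labels))
```

AUROC can also be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. In this toy data 8 of the 9 positive/negative pairs are ordered correctly.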

#### metric2: AUPR

Area Under the Precision-Recall curve (AUPR)

The PR curve plots the precision (TP / (TP + FP)) and recall (TP / (TP + FN)) against each other. For the PR curve, the choice of which class is treated as positive matters a great deal.
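To see why the choice of positive class matters, the sketch below computes a step-wise area under the PR curve (average precision) twice on the same toy data: once with in-distribution examples as positive, once with OOD examples as positive and the scores negated. The data and the helper name `average_precision` are illustrative assumptions, and tied scores are not handled:

```python
def average_precision(scores, labels):
    """Step-wise area under the PR curve.

    Averages the precision measured at each rank where a positive example
    is retrieved. Assumes no tied scores.
    """
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    tp = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / rank  # precision at this recall step
    return total / n_pos

confidence = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
is_in_dist = [1, 1, 1, 0, 1, 0]  # 4 in-distribution, 2 OOD examples

# In-distribution as the positive class: high confidence should rank first.
aupr_in = average_precision(confidence, is_in_dist)

# OOD as the positive class: flip the labels and negate the scores.
aupr_out = average_precision([-c for c in confidence], [1 - y for y in is_in_dist])
print(aupr_in, aupr_out)
```

The same detector gets 0.95 with in-distribution as positive and about 0.83 with OOD as positive, which is why both variants are worth reporting.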

# Rejection series 3: OpenMax

paper: Towards Open Set Deep Networks. CVPR

## Motivation

By its nature, closed set recognition must pick one of the known classes as its prediction. In real-world settings, however, a recognition system must learn to reject unknown/unseen classes at test time.

A key element of estimating the unknown probability is adapting Meta-Recognition concepts to the activation patterns in the penultimate layer of the network.

## Introduction

probability/confidence scores. Adversarial images obtained through adversarial learning are one example. As the authors also point out later, thresholding actually rejects uncertain predictions, not unknown inputs.

OpenMax incorporates likelihood of the recognition system failure. This likelihood is used to estimate the probability for a given input belonging to an unknown class. For this estimation, we adapt the concept of Meta-Recognition[22, 32, 9] to deep networks. We use the scores from the penultimate layer of deep networks (the fully connected layer before SoftMax, e.g., FC8) to estimate if the input is “far” from known training data. We call scores in that layer the activation vector(AV).

A key insight in our opening deep networks is noting that “open space risk” should be measured in feature space rather than in pixel space.

We show that an extreme-value meta-recognition inspired distance normalization process on the overall activation patterns of the penultimate network layer provides a rejection probability for OpenMax normalization for unknown images, fooling images and even for many adversarial images.

## Open set deep networks

Building on the concepts of open space risk, we seek to choose a layer (feature space) in which we can build a compact abating probability model that can be thresholded to limit open space risk.

### multi-classes meta-recognition

Prior work on meta-recognition used the final system scores, analyzed their distribution based on Extreme Value Theory (EVT) and found these distributions follow a Weibull distribution.

From Wikipedia, on extreme value theory:

It seeks to assess, from a given ordered sample of a given random variable, the probability of events that are more extreme than any previously observed.

We take the approach that the network values from penultimate layer (hereafter the Activation Vector (AV)), are not an independent per-class score estimate, but rather they provide a distribution of what classes are “related.”
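A simplified sketch of the distance computation this builds on: average the activation vectors of a class's training examples into a mean activation vector (MAV), then measure how far a new AV lies from it. The Weibull fitting on the distance tails and the OpenMax re-normalization are deliberately left out, and the activation values and function names are invented for illustration:

```python
import math

def mean_activation_vector(activation_vectors):
    """Element-wise mean of the activation vectors of one class."""
    n = len(activation_vectors)
    dim = len(activation_vectors[0])
    return [sum(av[d] for av in activation_vectors) / n for d in range(dim)]

def distance_to_mav(av, mav):
    """Euclidean distance between an activation vector and a class MAV."""
    return math.sqrt(sum((a - m) ** 2 for a, m in zip(av, mav)))

# Hypothetical AVs of training images from one known class.
cat_avs = [[8.0, 1.0, 0.5], [7.0, 2.0, 0.5], [9.0, 0.0, 0.5]]
cat_mav = mean_activation_vector(cat_avs)

near = distance_to_mav([8.0, 1.5, 0.5], cat_mav)  # looks like the class
far = distance_to_mav([2.0, 2.0, 6.0], cat_mav)   # unusual activation pattern
print(cat_mav, near, far)
```

A large distance suggests that the input is far from the known training data in feature space, which is the signal the meta-recognition step then calibrates into a rejection probability.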

# Rejection series 1: overview

## Motivation

In real-world recognition/classification tasks, limited by various objective factors, it is usually difficult to collect training samples to exhaust all classes when training a recognizer or classifier. A more realistic scenario is open set recognition (OSR), where incomplete knowledge of the world exists at training time, and unknown classes can be submitted to an algorithm during testing, requiring the classifiers not only to accurately classify the seen classes, but also to effectively deal with the unseen ones.

This paper provides a comprehensive survey of existing open set recognition techniques covering various aspects ranging from related definitions, representations of models, datasets, experiment setup and evaluation metrics. Furthermore, we briefly analyze the relationships between OSR and its related tasks including zero-shot, one-shot (few-shot) recognition/learning techniques, classification with reject option, and so forth. Additionally, we also overview the open world recognition which can be seen as a natural extension of OSR. Importantly, we highlight the limitations of existing approaches and point out some promising subsequent research directions in this field.

## Introduction

a more realistic scenario is usually open and non-stationary such as driverless, fault/medical diagnosis, etc., where unseen situations can emerge unexpectedly, which drastically weakens the robustness of these existing methods.

To meet this challenge, several related research directions actually have been explored including lifelong learning [1], [2], transfer learning [3]–[5], domain adaptation [6], zero-shot [7]–[9], one-shot (few-shot) [10]–[16] recognition/learning and open set recognition/classification [17]–[19], and so forth.

recognition should consider four basic categories of classes as follows:

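• known known: labeled samples in train/dev, covering both positive and negative classes, with associated semantic information.

• known unknown: labeled samples in train/dev that belong to the negative class, without associated semantic information.

• unknown known: test samples that never appear in train but do have associated semantic information. For example, if train contains cats and test contains another feline species, that animal sample should still be meaningful (I am not sure about this).

• unknown unknown: test samples that never appear in train and have no associated semantic information at all.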

Unlike the traditional classification, zero-shot learning (ZSL) can identify unseen classes which have no available observations in training. However, the available semantic/attribute information shared among all classes including seen and unseen ones are needed.

Zero-shot learning targets the unknown known classes, i.e. classes that come with semantic information.

The ZSL mainly focuses on the recognition of the unknown known classes defined above. Actually, such a setting is rather restrictive and impractical, since we usually know nothing about the testing samples which may come from known known classes or not.

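The unknown known setting is restrictive and impractical: it is hard to know whether a test sample comes with semantic information, so we cannot tell whether it is unknown known or unknown unknown.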

Comparison between open set recognition and traditional classification

Via these decision boundaries, samples from an unknown unknown class are labeled as ”unknown” or rejected rather than misclassified as known known classes.

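$L(x, y, f(x)) \ge 0$ is the loss function, and $P(x, y)$ is the joint probability of a sample $(x, y)$. This joint distribution is usually unknown, because we cannot determine what distribution the sample space (label space) actually follows in the real world.

The definition of the risk function from [Li Hang (李航), Machine Learning]: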
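For reference, the standard textbook form of the two risks is given below. This is a reminder in the usual notation, not a quote from the book; $R_{emp}$ and the sample size $N$ are symbols introduced here, while $R_I$, $L$ and $P$ follow the surrounding text.

$$R_I(f)=\int L(x, y, f(x))\,\mathrm{d}P(x, y)$$

$$R_{emp}(f)=\dfrac{1}{N}\sum_{i=1}^{N} L(x_i, y_i, f(x_i))$$

The ideal risk averages the loss over the unknown joint distribution, so it cannot be computed directly. The empirical risk replaces that average with the mean loss over the $N$ training samples.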

Therefore, traditional recognition/classification approaches minimize the empirical risk instead of the ideal risk RI by using other knowledge, such as assuming that the label space is at least locally smooth and regularizing the empirical risk minimization.

Note that traditional recognition problem is usually performed under the closed set assumption. When the assumption switches to open environment/set scenario with the open space, other things should be added since intuitively there is some risk in labeling sample in the open space as any known known classes. This gives such an insight for OSR that we do know something else: we do know where known known classes exist, and we know that in open space we do not have a good basis for assigning labels for the unknown unknown classes.

### open space risk

To improve the overall open set recognition error, our 1-vs-set formulation balances the unknown classes by obtaining a core margin around the decision boundary A from the base SVM, specializing the resulting half-space by adding another plane $\Omega$ and then generalizing or specializing the two planes (shown in Fig. 2) to optimize empirical and open space risk. This process uses the open set training data and the risk model to define a new “open set margin.” The second plane $\Omega$ allows the 1-vs-set machine to avoid the overgeneralization that would misclassify the raccoon in Fig. 2. The overall optimization can also adjust the original margin with respect to A to reduce open space risk, which can avoid negatives such as the owl.

While we do not know the joint distribution $P(x, y)$, one way to look at the open space risk is as a weak assumption: Far from the known data the Principle of Indifference [8] suggests that if there is no known reason to assign a probability, alternatives should be given equal probability. In our case, this means that at all points in open space, all labels (both known and unknown) are equally likely, and risk should be computed accordingly. However, we cannot have constant value probabilities over infinite spaces; the distribution must be integrable and integrate to 1. We must formalize open space differently (e.g., by ensuring the problem is well posed and then assuming the probability is proportional to relative Lebesgue measure [9]). Thus, we can consider the measure of the open space to the full space, and define our risk penalty proportional to such a ratio.

where open space risk is considered to be the fraction (in terms of Lebesgue measure) of positively labeled open space compared to the overall measure of positively labeled space (which includes the space near the positive examples).

Open space risk is the ratio of the measure of positively labeled open space to the measure of all positively labeled space.
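For reference, the 1-vs-set paper writes this ratio roughly as follows. The symbols are assumptions that are not defined elsewhere in these notes: $f$ is a measurable recognition function with $f(x) > 0$ for a positive prediction, $O$ is the open space, and $S_o$ is a ball that contains both the known positive training examples and $O$.

$$R_O(f)=\dfrac{\int_{O} f(x)\,dx}{\int_{S_o} f(x)\,dx}$$

The more of the positively labeled region lies far from the training data, the closer the ratio gets to 1.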

### openness

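Openness characterizes how open a dataset is:

• $C_{TR}$ is the number of classes in the training set; the larger it is, the smaller the openness.

• $C_{TE}$ is the number of classes in the test set.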
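A sketch of the computation, assuming the form commonly given in the open set literature, $O = 1 - \sqrt{2 C_{TR} / (C_{TR} + C_{TE})}$. The formula is an assumption about which definition is meant here, and the function name is hypothetical:

```python
import math

def openness(c_tr, c_te):
    """Openness of an evaluation setup (assumed form, see the note above).

    c_tr: number of classes seen in training.
    c_te: number of classes that appear at test time (c_te >= c_tr).
    """
    if c_tr <= 0 or c_te < c_tr:
        raise ValueError("expected 0 < c_tr <= c_te")
    return 1.0 - math.sqrt(2.0 * c_tr / (c_tr + c_te))

print(openness(10, 10))  # closed set: every test class was seen in training
print(openness(5, 10))   # half of the test classes are unseen
print(openness(8, 10))   # more training classes, lower openness
```

With $C_{TR} = C_{TE}$ the value is 0, which is the closed set case. Increasing $C_{TR}$ for a fixed $C_{TE}$ lowers the openness, matching the first bullet.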

### The Open Set Recognition Problem

our goal is to balance the risk of the unknown in open space with the empirical (known) risk. In this sense, we formally define the open set recognition problem as follows:

## a categorization of OSR techniques

• Deep Network-based

• EVT-based

• Dirichlet Process-based OSR models

### Deep Neural Network-based OSR Models

the OpenMax effectively addressed the challenge of the recognition for fooling/rubbish and unrelated open set images. However, as discussed in [71], the OpenMax fails to recognize the adversarial images which are visually indistinguishable from training samples but are designed to make deep networks produce high confidence but incorrect answers [96], [98].

OpenMax effectively handles unrelated open set images, but it cannot reliably distinguish adversarially generated samples.

Actually, the authors in [72] have indicated that the OpenMax is susceptible to the adversarial generation techniques directly working on deep representations. Therefore, the adversarial samples are still a serious challenge for open set recognition. Furthermore, using the distance from MAV, the cross entropy loss function in OpenMax does not directly incentivize projecting class samples around the MAV. In addition to that, the distance function used in testing is not used in training, possibly resulting in inaccurate measurement in that space [73]. To address this limitation, Hassen and Chan [73] learned a neural network based representation for open set recognition, which is similar in spirit to the Fisher Discriminant, where samples from the same class are close to each other while the ones from different classes are further apart, leading to larger space among known known classes for unknown unknown classes’ samples to occupy.

• OpenMax to text classification

• Deep Open classifier

• tWiSARD

• hidden unknown unknown classes

Note that the main challenge for open set recognition is the incomplete class knowledge existing in training, leading to the open space risk when classifiers encounter unknown unknown classes during testing. Fortunately, the adversarial learning technique can account for open space to some extent by adversarially generating the unknown unknown class data according to the known known class knowledge, which undoubtedly provides another way to tackle the challenging multiclass OSR problem.

The biggest challenge in open set recognition is the incomplete knowledge available during training, which leads to open space risk when unknown unknown classes are encountered at test time.

### EVT-based OSR Models

As a powerful tool to increase the classification performance, the statistical Extreme Value Theory (EVT) has recently achieved great success due to the fact that EVT can effectively model the tails of the distribution of distances between training observations using the asymptotic theory[100].

Remark: As mentioned above, almost all existing OSR methods adopt the threshold-based classification scheme, where recognizers in decision either reject or categorize the input samples to some known known class using empirically set threshold. Thus the threshold plays a key role. However, the selection for it usually depends on the knowledge of known known classes, inevitably incurring risks due to lacking available information from unknown unknown classes [57]. This indicates the threshold-based OSR methods still face serious challenges.

### Dirichlet Process-based OSR Models (generative models)

Dirichlet process (DP) [104]–[108] considered as a distribution over distributions is a stochastic process, which has been widely applied in clustering and density estimation problems as a nonparametric prior defined over the number of mixture components. Furthermore, this model does not overly depend on training samples and can achieve adaptive change as the data changes, making it naturally adapt to the open set recognition scenario. In fact, researchers have begun the related research

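As a nonparametric method built on mixture models, the Dirichlet process is widely used for clustering and density estimation. The model does not depend heavily on the training samples and adapts as the data changes, which makes it a natural fit for the open set scenario.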

Remark: Instead of addressing the OSR problem from the discriminative model perspective, CD-OSR actually reconsiders this problem from the generative model perspective due to the use of HDP, which provides another research direction for open set recognition. Furthermore, the collective decision strategy for OSR is also worth further exploring since it not only takes the correlations among the testing samples into account but also provides a possibility for new class discovery, whereas the single-sample decision strategy adopted by other existing OSR methods cannot do so, since it cannot directly tell whether a single rejected sample is an outlier or comes from a new class.

## Beyond open set Recognition

open world recognition (OWR), where a recognition system should perform four tasks:

• detecting unknown unknown classes

• choosing which samples to label for addition to the model

• labelling those samples

• updating the classifier

Remark: As a natural extension of OSR, the OWR faces more serious challenges which require it not only to have the ability to handle the OSR task, but also to have minimal downtime, even to continuously learn, which seems to have the flavor of lifelong learning to some extent. Besides, although some progress regarding the OWR has been made, there is still a long way to go.

## Dataset and evaluation metrics

### dataset

Experiment Setup: In open set recognition, most existing experiments are carried out on a variety of recast multi-class benchmark datasets. Specifically, taking the USPS dataset as an example, when it is used for the OSR problem, one can randomly choose S distinct labels as the known known classes, and vary openness by adding a subset of the remaining labels.
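The recasting step can be sketched as a random split over label ids. This is a minimal sketch under assumptions: `make_open_set_split` and its parameters are hypothetical names, and the ten labels simply stand in for a ten-class digit dataset.

```python
import random

def make_open_set_split(all_labels, n_known, n_unknown_test, seed=0):
    """Pick n_known labels as known known classes and n_unknown_test of the
    remaining labels as classes that only appear at test time."""
    labels = list(all_labels)
    if n_known + n_unknown_test > len(labels):
        raise ValueError("not enough labels for the requested split")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(labels)
    known = sorted(labels[:n_known])
    unknown = sorted(labels[n_known:n_known + n_unknown_test])
    return known, unknown

known, unknown = make_open_set_split(range(10), n_known=4, n_unknown_test=3)
print("train on:", known)
print("test on:", sorted(known + unknown))
```

Keeping `n_known` fixed and increasing `n_unknown_test` raises the openness of the resulting test set.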

### Evaluation Metrics for Open Set Recognition

• TP: true positive

• FP: false positive

• TN: true negative

• FN: false negative

• TU: true unknown

• FU: false unknown

#### accuracy

$$\text{accuracy}=\dfrac{TP+TN}{TP+TN+FP+FN}$$

$$\text{accuracy}_O=\dfrac{(TP+TN)+TU}{(TP+TN+FP+FN)+(TU+FU)}$$

$0\le \lambda \le 1$ is a regularization constant.
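The two accuracy formulas translate directly into code. The $\lambda$ above presumably belongs to a normalized accuracy of the form $\lambda \cdot \text{AKS} + (1-\lambda) \cdot \text{AUS}$ (accuracy on known samples and on unknown samples); that form is an assumption, since it is not written out here. All counts in the example are made up:

```python
def accuracy(tp, tn, fp, fn):
    """Closed set accuracy."""
    return (tp + tn) / (tp + tn + fp + fn)

def open_set_accuracy(tp, tn, fp, fn, tu, fu):
    """Accuracy extended with true/false unknown counts."""
    return (tp + tn + tu) / (tp + tn + fp + fn + tu + fu)

def normalized_accuracy(tp, tn, fp, fn, tu, fu, lam=0.5):
    """Assumed form: lam * accuracy on known samples
    + (1 - lam) * accuracy on unknown samples."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    aks = accuracy(tp, tn, fp, fn)
    aus = tu / (tu + fu)
    return lam * aks + (1.0 - lam) * aus

counts = dict(tp=40, tn=40, fp=10, fn=10, tu=15, fu=5)
print(accuracy(40, 40, 10, 10))
print(open_set_accuracy(**counts))
print(normalized_accuracy(**counts, lam=0.5))
```

Because known samples usually outnumber unknown ones, the plain open set accuracy is dominated by the known classes; the weighted form lets the two parts be balanced explicitly.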

#### F-measure

F1:

$$F1=\dfrac{2\cdot\text{precision}\cdot\text{recall}}{\text{precision}+\text{recall}}$$

$$\text{precision}=\dfrac{TP}{TP+FP}$$

$$\text{recall}=\dfrac{TP}{TP+FN}$$

Instead, the computations of Precision and Recall in it are only for available known known classes. Additionally, the work [67] has indicated that although the computations of Precision and Recall are only for available known known classes, the FN and FP also consider the false unknown unknown classes and false known known classes by taking into account the false negative and the false positive, and we refer the reader to [67] for more details.

Note that the Precision contains the macro-Precision and micro-Precision while Recall includes the macro-Recall and micro-Recall, which leads to the corresponding macro-F-measure and micro-F-measure. Nevertheless, whether it is macro-F-measure or micro-F-measure, the higher their values, the better the performance of the corresponding OSR model.
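The difference between the two variants is easiest to see in code. In this sketch the macro-F-measure is taken as the F1 of the class-averaged precision and recall, and the micro-F-measure as the F1 of the pooled counts; the per-class counts are invented, and averaging per-class F1 scores instead is another common macro convention that is not shown here:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_micro_f1(per_class_counts):
    """per_class_counts: list of (tp, fp, fn), one tuple per known known class."""
    precisions = [tp / (tp + fp) if tp + fp else 0.0 for tp, fp, fn in per_class_counts]
    recalls = [tp / (tp + fn) if tp + fn else 0.0 for tp, fp, fn in per_class_counts]
    n = len(per_class_counts)
    macro = f1(sum(precisions) / n, sum(recalls) / n)

    # Micro: pool the counts over classes before computing precision/recall.
    tp_sum = sum(tp for tp, fp, fn in per_class_counts)
    fp_sum = sum(fp for tp, fp, fn in per_class_counts)
    fn_sum = sum(fn for tp, fp, fn in per_class_counts)
    micro = f1(tp_sum / (tp_sum + fp_sum), tp_sum / (tp_sum + fn_sum))
    return macro, micro

# A frequent, well-recognized class and a rare, poorly recognized one.
macro, micro = macro_micro_f1([(90, 10, 10), (5, 5, 15)])
print(macro, micro)
```

The macro score (about 0.63) is pulled down by the rare class, while the micro score (about 0.83) is dominated by the frequent one.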

#### Youden’s index for OSR

$$J= \text{Recall}+S-1$$
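A small sketch of the index, assuming $S$ denotes the specificity (true negative rate) $TN / (TN + FP)$, which is not spelled out above. The counts are made up:

```python
def youden_index(tp, fn, tn, fp):
    """J = recall + specificity - 1, ranging from -1 to 1."""
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)  # assumed meaning of S
    return recall + specificity - 1

print(youden_index(tp=40, fn=10, tn=45, fp=5))  # informative classifier
print(youden_index(tp=5, fn=5, tn=5, fp=5))     # no better than chance
```

A value of 0 means the classifier is no more informative than chance, and larger values indicate better separation.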

## future research directions

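• Most existing work is based on discriminative models and only a small part on generative models; generative models may offer more room for exploration.

• The main challenge for OSR is that traditional classifiers are trained under the closed-set scenario. Once an unknown unknown class sample falls into the space assigned to the known classes, it can never be classified correctly.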

### Open set + ‘sth’

As open set scenario is a more practical assumption for the real-world classification/recognition tasks, it can naturally be combined with various fields involving classification/recognition such as semi-supervised learning, domain adaptation, active learning, multi-task learning, multi-label learning, multi-view learning, and so forth. For example, [124]–[126] recently introduced this scenario into domain adaptation, while [127] explored the open set classification in active learning field. Therefore, many interesting works are worth looking forward to.

### Generalized Open Set Recognition

#### Appending semantic/attribute information

In fact, a lot of semantic/attribute information is shared between the known known and the unknown unknown classes. Therefore, we can fully utilize this kind of information to ’cognize’ the unknown unknown classes, or at least to provide a rough semantic/attribute description for the corresponding unknown unknown classes instead of simply rejecting them.

The $\text{side-information}^1$ in ZSL denotes the semantic/attribute information shared among all classes including known known and unknown known classes.

where the $\text{side-information}^4$ denotes the available semantic/attribute information only for known known classes

#### Using other available side-information

**The main reason for open space risk is that the traditional classifiers trained under closed set scenario usually divide over-occupied space for known known classes, thus inevitably resulting in misclassifications once the unknown unknown class samples fall into the space divided for some known known class.** From this perspective, the open space risk will be reduced as the space divided for those known known classes decreases by using other side-information like universum [135], [136] to shrink their regions as much as possible.

### Knowledge Integration for Open Set Recognition

In fact, the incomplete knowledge of the world is universal, especially for the single individuals: something you know does not mean I also know.

how to integrate the classifiers trained on each sub-knowledge set to further reduce the open space risk will be an interesting yet challenging topic in the future work, especially for such a situation: we can only obtain the classifiers trained on corresponding sub-knowledge sets, yet these sub-knowledge sets are not available due to the privacy protection of data.