Motivation

IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models

Model Architecture

A Minimax Retrieval Framework

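Given a set of queries $\{q_1,...,q_N\}$ and a set of documents $\{d_1,...,d_M\}$, each query has a small set of highly relevant documents. These observed relevant pairs are the true data $true_{(q,d)}$, and their number is far smaller than the total number of documents $M$.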

The underlying true relevance distribution can be expressed as conditional probability $p_{true} (d|q, r)$, which depicts the (user’s) relevance preference distribution over the candidate documents with respect to her submitted query.

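Generative retrieval model $p_{\theta}(d|q,r)$: the generator tries to approximate the true relevance distribution $p_{true}(d|q,r)$ as closely as possible, so that the documents it generates for a query are highly relevant.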

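Discriminative retrieval model $f_{\phi}(q,d)$: a binary classifier that tries to tell true relevant query-document pairs apart from the pairs produced by the generator.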
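To make the two roles concrete, here is a minimal sketch of how the two models could turn raw scores into probabilities. It assumes the generator is a softmax over per-document scores with a temperature, and that the discriminator's score $f_{\phi}(q,d)$ is squashed by a sigmoid. The function names, the temperature parameter, and the example scores are all hypothetical and are not taken from these notes.

```python
import math
import random

def generator_distribution(scores, temperature=1.0):
    """Softmax over candidate scores: a stand-in for p_theta(d|q,r)."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def discriminator_probability(score):
    """Sigmoid of f_phi(q,d): estimated probability that (q,d) is a true pair."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)  # avoids overflow for very negative scores
    return e / (1.0 + e)

def sample_documents(probs, k, rng):
    """Draw k document indices (with replacement) from the generator."""
    return rng.choices(range(len(probs)), weights=probs, k=k)

# Hypothetical generator scores for 4 candidate documents under one query
g_scores = [2.0, 0.5, -1.0, 0.0]
probs = generator_distribution(g_scores)

rng = random.Random(0)
sampled = sample_documents(probs, k=3, rng=rng)

# Hypothetical discriminator scores for the sampled documents
d_probs = [discriminator_probability(s) for s in [1.5, -0.3, 0.0]]
```

The sampled indices play the role of "generated" documents that the discriminator would then have to separate from the observed relevant ones.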

Overall Objective:
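The formula itself is not written out in these notes. As a sketch, the objective can be expressed in the standard GAN minimax form using the symbols defined above, under the assumption that the discriminator's probability is the sigmoid of its score, $D(d|q) = \sigma(f_{\phi}(q,d))$:

$$
J^{G^*,D^*} = \min_{\theta}\max_{\phi} \sum_{n=1}^{N} \Big( \mathbb{E}_{d \sim p_{true}(d|q_n,r)}\big[\log D(d|q_n)\big] + \mathbb{E}_{d \sim p_{\theta}(d|q_n,r)}\big[\log\big(1 - D(d|q_n)\big)\big] \Big)
$$

The discriminator (parameters $\phi$) maximizes this quantity by assigning high probability to true pairs and low probability to generated ones, while the generator (parameters $\theta$) minimizes it by producing documents the discriminator cannot distinguish from true ones.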

Pairwise Case

Furthermore, if we use graded relevance scales (indicating a varying degree of match between each document and the corresponding query) rather than binary relevance, the training data could also be represented naturally as ordered document pairs.
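As an illustration of how graded labels might be turned into ordered pairs, the following sketch emits a (higher, lower) pair for every two documents of one query whose grades differ. The function name, the document ids, and the grades are hypothetical examples, not data from the paper.

```python
from itertools import combinations

def ordered_pairs(graded):
    """Return (higher, lower) document pairs from a {doc_id: grade} mapping."""
    pairs = []
    for (d_a, g_a), (d_b, g_b) in combinations(graded.items(), 2):
        if g_a > g_b:
            pairs.append((d_a, d_b))
        elif g_b > g_a:
            pairs.append((d_b, d_a))
        # equal grades express no preference, so no pair is emitted
    return pairs

# Hypothetical graded relevance labels for one query
grades = {"d1": 2, "d2": 0, "d3": 1, "d4": 1}
pairs = ordered_pairs(grades)
```

Documents with the same grade (here `d3` and `d4`) contribute no pair, since the labels give no ordering between them.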