Are Graph Augmentations Necessary? Simple Graph Contrastive Learning for Recommendation

2022-07-02


Are Graph Augmentations Necessary? Simple Graph Contrastive Learning for Recommendation, SIGIR 2022

1. Introduction

1.1 Abstract

Contrastive learning (CL) recently has spurred a fruitful line of research in the field of recommendation, since its ability to extract self-supervised signals from the raw data is well-aligned with recommender systems’ needs for tackling the data sparsity issue. A typical pipeline of CL-based recommendation models is first augmenting the user-item bipartite graph with structure perturbations, and then maximizing the node representation consistency between different graph augmentations. Although this paradigm turns out to be effective, what underlies the performance gains is still a mystery. In this paper, we first experimentally disclose that, in CL-based recommendation models, CL operates by learning more evenly distributed user/item representations that can implicitly mitigate the popularity bias. Meanwhile, we reveal that the graph augmentations, which were considered necessary, just play a trivial role. Based on this finding, we propose a simple CL method which discards the graph augmentations and instead adds uniform noises to the embedding space for creating contrastive views. A comprehensive experimental study on three benchmark datasets demonstrates that, though it appears strikingly simple, the proposed method can smoothly adjust the uniformity of learned representations and has distinct advantages over its graph augmentation-based counterparts in terms of recommendation accuracy and training efficiency.

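In short: contrastive learning fits recommendation well because it extracts self-supervised signals from raw data, which helps with data sparsity. A typical CL-based recommendation model first augments the original graph with perturbations and then maximizes the consistency of node representations between the two augmented views. This works, but why it works has been unclear. The paper shows experimentally that CL helps by learning more evenly distributed user/item representations, which implicitly mitigates popularity bias, and that the graph augmentations once considered essential play only a trivial role. The authors therefore drop graph augmentation and create contrastive views by adding uniform noise to the embedding space. Experiments on benchmark datasets show that this simple method can smoothly adjust the uniformity of the learned representations and improves both recommendation accuracy and training efficiency over augmentation-based CL methods.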

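1.2 Contributions of this paper

Background: CL has recently been widely used in deep representation learning, and a growing body of work applies it to improve the accuracy of recommendation models.

Motivation: (1) Although CL yields strong results, what underlies the performance gains remains unclear. (2) Recent work shows that even extremely sparse graph augmentations (with edge dropout rate 0.9) in CL can bring the desired performance gains. This naturally raises a question: do we really need graph augmentations when integrating CL with recommendation?

This paper: (1) answers the two questions above through a series of experiments; (2) designs an augmentation-free CL model for recommendation that creates contrastive views by adding uniform noise to the embedding space.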

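2. Method

2.1 Preliminaries

2.1.1 CL in recommender systems

SGL is the current state-of-the-art CL-based recommendation model (and the baseline this paper compares against). It augments the original graph with node and edge dropout and uses InfoNCE as its contrastive loss. SGL is trained jointly, with the following objective: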

\mathcal{L}_{\text {joint }}=\mathcal{L}_{\text {rec }}+\lambda \mathcal{L}_{c l},

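The objective has two parts, the recommendation loss and the contrastive loss, with $\lambda$ weighting the contrastive term. The contrastive loss is defined as: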

\mathcal{L}_{c l}=\sum_{i \in \mathcal{B}}-\log \frac{\exp \left(\mathrm{z}_{i}^{\prime \top} \mathrm{z}_{i}^{\prime \prime} / \tau\right)}{\sum_{j \in \mathcal{B}} \exp \left(\mathrm{z}_{i}^{\prime \top} \mathrm{z}_{j}^{\prime \prime} / \tau\right)}
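The contrastive loss above can be sketched in a few lines of NumPy. This is only an illustration, not SGL's actual code: the function name `info_nce`, the default `tau`, and the explicit L2 normalisation of the two views are assumptions made here.

```python
import numpy as np

def info_nce(z1, z2, tau=0.2):
    """InfoNCE over a batch; row i of z1 and row i of z2 are the positive pair."""
    # Assumption: representations are L2-normalised, so dot products are cosines.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                  # (B, B) matrix of z'_i . z''_j / tau
    # Numerically stable log-sum-exp over j for every i (the denominator).
    m = logits.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    pos = np.diag(logits)                     # z'_i . z''_i / tau (the numerator)
    # -log(exp(pos) / sum_j exp(logits_ij)) = lse - pos, summed over the batch.
    return float(np.sum(lse - pos))

# Hypothetical usage with two random "views" of 4 nodes in 8 dimensions.
rng = np.random.default_rng(0)
view_a, view_b = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(info_nce(view_a, view_b))
```

Each row contributes a non-negative term, because the positive pair is itself one of the terms in the denominator.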

To learn node representations, SGL uses LightGCN as its backbone, whose message propagation is defined as:

\mathrm{E}=\frac{1}{1+L}\left(\mathrm{E}^{(0)}+\tilde{\mathrm{A}} \mathrm{E}^{(0)}+\ldots+\tilde{\mathrm{A}}^{L} \mathrm{E}^{(0)}\right)
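As a rough sketch of this propagation rule, the snippet below averages the layer outputs over a dense adjacency matrix. The function name `lightgcn_embeddings` and the dense-matrix representation are illustrative assumptions; $\tilde{\mathrm{A}}$ is taken to be the symmetrically normalised adjacency $D^{-1/2} A D^{-1/2}$, which these notes do not spell out.

```python
import numpy as np

def lightgcn_embeddings(adj, e0, num_layers=2):
    """Mean of E^(0), A~ E^(0), ..., A~^L E^(0) for a dense adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    e0 = np.asarray(e0, dtype=float)
    deg = adj.sum(axis=1)
    # D^{-1/2}, leaving isolated nodes (degree 0) at zero to avoid division by zero.
    d_inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    d_inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    a_norm = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    layer, total = e0, e0.copy()
    for _ in range(num_layers):
        layer = a_norm @ layer                # A~^k E^(0), built incrementally
        total = total + layer
    return total / (1 + num_layers)

# Hypothetical toy graph: one user (node 0) connected to one item (node 1).
adj = np.array([[0, 1], [1, 0]])
e0 = np.array([[1.0, 0.0], [0.0, 1.0]])
print(lightgcn_embeddings(adj, e0, num_layers=1))
```

There are no weight matrices or nonlinearities in the loop, which matches the formula: the only learnable parameters are the initial embeddings $\mathrm{E}^{(0)}$.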

2.1.2 Necessity of augmentation in CL

To find out whether augmentation is necessary in CL, the authors build several variants of SGL:

SGL-WA: SGL without augmentation, whose contrastive loss is defined as:
$\mathcal{L}_{c l}=\sum_{i \in \mathcal{B}}-\log \frac{\exp (1 / \tau)}{\sum_{j \in \mathcal{B}} \exp \left(\mathrm{z}_{i}^{\top} \mathrm{z}_{j} / \tau\right)}$
SGL-ND: SGL with node dropout
SGL-ED: SGL with edge dropout
SGL-RW: SGL with random walk
CL only: the model is optimized with the CL loss alone
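The SGL-WA loss follows from InfoNCE once both views coincide: for L2-normalised representations the positive similarity $\mathrm{z}_i^{\top}\mathrm{z}_i$ is always 1, so the numerator becomes the constant $\exp(1/\tau)$ and only the denominator carries a gradient. A minimal sketch, where the name `sgl_wa_loss` and the explicit normalisation step are assumptions made for illustration:

```python
import numpy as np

def sgl_wa_loss(z, tau=0.2):
    """Contrastive loss of the augmentation-free variant: a single view z."""
    # Assumption: z is L2-normalised, which is what makes the numerator exp(1/tau).
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    logits = z @ z.T / tau                    # z_i . z_j / tau for all pairs in the batch
    m = logits.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    # -log(exp(1/tau) / sum_j exp(logits_ij)) = lse - 1/tau, summed over the batch.
    return float(np.sum(lse - 1.0 / tau))

# Hypothetical usage on 4 random node representations.
z = np.random.default_rng(0).normal(size=(4, 8))
print(sgl_wa_loss(z))
```

Since the positive term is fixed, minimising this loss can only push different nodes apart, which is the uniformity effect discussed in the next subsection.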

The experimental results are shown in the table below:

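2.1.3 Role of the InfoNCE loss

Prior work shows that the contrastive loss in the image domain does two things: alignment of features from positive pairs, and uniformity of the normalized feature distribution on the unit hypersphere.

Whether CL plays the same roles in recommender systems is not yet clear.

Because recommendation is a one-class problem, the authors only investigate the uniformity effect of CL here.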
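One common way to put a number on uniformity is the Gaussian-potential metric from Wang and Isola's alignment/uniformity analysis: the log of the mean of $\exp(-t\|x-y\|^2)$ over pairs of normalised points. These notes do not say which measurement the authors use at this step, so the snippet below is only an assumed illustration; the function name `uniformity` and the default `t=2.0` are choices made here.

```python
import numpy as np

def uniformity(x, t=2.0):
    """log E[exp(-t * ||x_i - x_j||^2)] over distinct pairs; lower means more uniform."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)     # project onto the unit hypersphere
    # Squared Euclidean distance between every pair of rows.
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    upper = np.triu_indices(len(x), k=1)                  # each unordered pair once
    return float(np.log(np.mean(np.exp(-t * sq_dist[upper]))))

# Hypothetical comparison: collapsed points versus points spread over the sphere.
rng = np.random.default_rng(0)
collapsed = np.ones((50, 16))
spread = rng.normal(size=(50, 16))
print(uniformity(collapsed), uniformity(spread))
```

Collapsed representations give the maximum value 0, while well-spread ones give a clearly negative value.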

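The figure below visualizes the distribution of the representations learned by the models:

As the figure shows, the representations learned with CL are more evenly distributed. The authors argue: Optimizing the CL loss can be seen as an implicit way to debias (discussed in section 4.2) because a more even representation distribution can preserve the intrinsic characteristics of nodes and improve the generalization ability. This can be a persuasive explanation for the unexpected performance of SGL-WA.

"CL only" produces a very even distribution but performs poorly. This shows that a balance is needed: uniformity should not be pursued at the expense of the closeness between related pairs.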

2.2 SimGCL

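Based on these findings, the authors argue that adjusting the uniformity of the learned representation in a certain scope yields the best performance.

Since manipulating the graph structure directly is tricky and time-consuming, the authors turn to the embedding space. Inspired by adversarial training, they directly add random noises to the representation.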

2.2.1 Representation-level augmentation

Specifically, for a node $i$ with representation $\mathrm{e}_i$, the following representation-level augmentation is applied:

\mathrm{e}_{i}^{\prime}=\mathrm{e}_{i}+\Delta_{i}^{\prime}, \quad \mathrm{e}_{i}^{\prime \prime}=\mathrm{e}_{i}+\Delta_{i}^{\prime \prime}

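where $||\Delta||_2=\epsilon$ and $\Delta=\bar{\Delta} \odot \operatorname{sign}\left(\mathrm{e}_{i}\right), \bar{\Delta} \in \mathbb{R}^{d} \sim U(0,1)$. The first constraint fixes the magnitude of the noise at $\epsilon$; the second keeps $\Delta$ in the same orthant as $\mathrm{e}_i$, so the perturbed vector does not deviate far from the original. The figure below shows the node representations before and after augmentation.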
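The two constraints on $\Delta$ translate almost directly into code. The following is a minimal NumPy sketch rather than the authors' implementation; the function name `perturb`, the default `eps`, and the assumption that no row of `e` is all zeros are choices made here.

```python
import numpy as np

def perturb(e, eps=0.1, rng=None):
    """Return e + Delta with ||Delta||_2 = eps and Delta in the same orthant as e."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.uniform(0.0, 1.0, size=e.shape)     # Delta-bar ~ U(0, 1), one value per dimension
    noise = noise * np.sign(e)                      # Delta = Delta-bar * sign(e), element-wise
    # Rescale each row to length eps (assumes the row is not all zeros).
    noise = eps * noise / np.linalg.norm(noise, axis=1, keepdims=True)
    return e + noise

# Two independent draws give the two contrastive views e' and e''.
rng = np.random.default_rng(0)
e = np.array([[1.0, -2.0, 3.0], [-0.5, 0.5, 2.0]])
e_view1, e_view2 = perturb(e, 0.1, rng), perturb(e, 0.1, rng)
print(np.linalg.norm(e_view1 - e, axis=1))          # each row has norm eps
```

Because the noise is drawn independently for each view, `e_view1` and `e_view2` differ from each other while both stay within distance `eps` of the original representation.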