SimGRACE: A Simple Framework for Graph Contrastive Learning without Data Augmentation

2022-06-10


SimGRACE: A Simple Framework for Graph Contrastive Learning without Data Augmentation, 2022, WWW

1. Introduction

1.1 Abstract

Graph contrastive learning (GCL) has emerged as a dominant technique for graph representation learning which maximizes the mutual information between paired graph augmentations that share the same semantics. Unfortunately, it is difficult to preserve semantics well during augmentations in view of the diverse nature of graph data. Currently, data augmentations in GCL that are designed to preserve semantics broadly fall into three unsatisfactory ways. First, the augmentations can be manually picked per dataset by trial-and-errors. Second, the augmentations can be selected via cumbersome search. Third, the augmentations can be obtained by introducing expensive domain-specific knowledge as guidance. All of these limit the efficiency and more general applicability of existing GCL methods. To circumvent these crucial issues, we propose a Simple framework for GRAph Contrastive lEarning, SimGRACE for brevity, which does not require data augmentations. Specifically, we take original graph as input and GNN model with its perturbed version as two encoders to obtain two correlated views for contrast. SimGRACE is inspired by the observation that graph data can preserve their semantics well during encoder perturbations while not requiring manual trial-and-errors, cumbersome search or expensive domain knowledge for augmentations selection. Also, we explain why SimGRACE can succeed. Furthermore, we devise adversarial training scheme, dubbed AT-SimGRACE, to enhance the robustness of graph contrastive learning and theoretically explain the reasons. Albeit simple, we show that SimGRACE can yield competitive or better performance compared with state-of-the-art methods in terms of generalizability, transferability and robustness, while enjoying unprecedented degree of flexibility and efficiency.

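Paraphrasing the abstract: graph contrastive learning (GCL) has recently become one of the mainstream approaches to graph representation learning. It learns representations by maximizing the mutual information between two augmented views that share the same semantics. Unfortunately, graph data is so diverse that semantics are hard to preserve during augmentation. Existing GCL methods choose their augmentations in one of three ways: manual trial and error per dataset, a cumbersome search, or guidance from expensive domain-specific knowledge. All three limit the efficiency and generality of existing GCL methods. To address this, the authors propose SimGRACE, a GCL framework that needs no data augmentation. It takes the original graph as input and uses a GNN together with a perturbed version of it as the two encoders, which yields two correlated views to contrast. The idea comes from the observation that graph data can preserve their semantics well during encoder perturbations, which sidesteps the three drawbacks above. The authors also explain why SimGRACE works, and they design an adversarial training scheme, AT-SimGRACE, that improves the robustness of graph contrastive learning, together with a theoretical explanation. Despite its simplicity, SimGRACE is competitive with or better than state-of-the-art methods in generalizability, transferability, and robustness, while being more flexible and more efficient.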

1.2 Contributions

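Background: Existing GCL methods pick their augmentation strategy in one of three ways (manual selection, grid search, or domain-specific knowledge as guidance), and each of these limits the performance and generality of GCL methods.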

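Note: the authors' claim here is not accurate; it leaves out the many adaptive and automatic augmentation methods. (It might get past readers from adjacent fields.)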

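Motivation: The core of graph augmentation is to generate two augmented views that differ to some degree while preserving the semantics of the original graph. The authors' idea comes from the observation that graph data can preserve their semantics well during encoder perturbations. So why not drop graph augmentation altogether and contrast the outputs of a graph encoder and its perturbed copy instead? (A rather neat trick.)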

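Contributions: 1. The authors propose SimGRACE, a GCL framework that needs no graph augmentation, and give a theoretical explanation for it. 2. They also design an adversarially trained variant, AT-SimGRACE, that further improves robustness, again with a theoretical explanation.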

2. Method

Model details

Encoder perturbation

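$f(\cdot;\theta)$ denotes the GNN encoder, $f(\cdot;\theta')$ its perturbed version, and $h$ and $h'$ the graph-level representations produced by the two encoders. The parameters of the perturbed encoder are computed as follows:
$\theta_{l}^{\prime}=\theta_{l}+\eta \cdot \Delta \theta_{l} ; \quad \Delta \theta_{l} \sim \mathcal{N}\left(0, \sigma_{l}^{2}\right)$
In other words, Gaussian noise is added to the parameters of every layer of the original GNN: $\theta_{l}$ is the weights of layer $l$, and $\eta$ scales the magnitude of the perturbation.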
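The perturbation rule above can be sketched in a few lines of plain Python. This is only an illustration, not the authors' implementation: `perturb_encoder` is a hypothetical helper, each layer is represented as a flat list of floats instead of a weight tensor, and taking $\sigma_{l}$ to be the standard deviation of layer $l$'s own weights is an assumption.

```python
import random
import statistics


def perturb_encoder(layers, eta, rng):
    """Return a perturbed copy of per-layer weights: theta' = theta + eta * delta.

    `layers` is a list of flat weight lists, one per GNN layer (a stand-in for
    real weight tensors). Each delta is drawn from N(0, sigma_l^2).
    """
    perturbed = []
    for weights in layers:
        # Assumption: sigma_l is the standard deviation of this layer's weights.
        sigma = statistics.pstdev(weights)
        perturbed.append([w + eta * rng.gauss(0.0, sigma) for w in weights])
    return perturbed


rng = random.Random(0)  # fixed seed so the sketch is reproducible
theta = [[0.5, -0.2, 0.1, 0.9], [1.0, -1.0, 0.3]]  # two toy "layers"
theta_prime = perturb_encoder(theta, eta=0.1, rng=rng)
```

The original weights `theta` are left untouched, so the same input can be passed through both the original and the perturbed encoder. Setting `eta=0.0` returns an exact copy, which makes the two views identical.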
Projection head

This is standard practice: a nonlinear function $g$ maps the representations into the space where the contrastive loss is computed.
$z=g(h), \quad z^{\prime}=g\left(h^{\prime}\right)$
Contrastive loss

This is also standard and no different from other GCL methods:
$\ell_{n}=-\log \frac{\exp \left(\operatorname{sim}\left(z_{n}, z_{n}^{\prime}\right) / \tau\right)}{\sum_{n^{\prime}=1, n^{\prime} \neq n}^{N} \exp \left(\operatorname{sim}\left(z_{n}, z_{n^{\prime}}\right) / \tau\right)}$
Here $\operatorname{sim}(\cdot,\cdot)$ is cosine similarity and $\tau$ is the temperature.
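As a rough sketch, the loss above can be evaluated on a small batch with plain Python. The function names, the default `tau=0.5`, and averaging $\ell_{n}$ over the batch are assumptions made for illustration. The negatives follow the formula literally, i.e. pairs $(z_{n}, z_{n^{\prime}})$ with $n^{\prime} \neq n$; a real implementation may pair views differently.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(z, z_prime, tau=0.5):
    """Mean of l_n over the batch, with l_n taken from the formula above.

    Positive pair: (z_n, z'_n). Negatives: (z_n, z_{n'}) for every n' != n.
    """
    n_samples = len(z)
    losses = []
    for n in range(n_samples):
        pos = math.exp(cosine(z[n], z_prime[n]) / tau)
        neg = sum(
            math.exp(cosine(z[n], z[m]) / tau)
            for m in range(n_samples)
            if m != n
        )
        losses.append(-math.log(pos / neg))
    return sum(losses) / n_samples


z = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = contrastive_loss(z, z)                      # views agree
shuffled = contrastive_loss(z, [z[1], z[2], z[0]])    # views mismatched
```

In this toy batch the denominators are the same in both calls, so the loss is lower when each $z_{n}^{\prime}$ matches its $z_{n}$ than when the second view is shuffled.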

Theoretical analysis

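The authors build on the paper "Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere" to explain why SimGRACE works. That paper proposes two metrics for the quality of representations learned by contrastive learning: alignment and uniformity.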

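alignment: the distance between positive pairs
$\ell_{\text {align }}(f ; \alpha) \triangleq \underset{(x, y) \sim p_{\text {pos }}}{\mathbb{E}}\left[\|f(x)-f(y)\|_{2}^{\alpha}\right], \quad \alpha>0$
where $p_{pos}$ is the distribution of positive pairs. This metric corresponds to one goal of contrastive learning: positive samples should lie close together in the embedding space.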

In SimGRACE, alignment can be rewritten as:
$\ell_{\text {align }}(f ; \alpha) \triangleq \mathbb{E}_{x \sim p_{\text {data }}}\left[\left\|f(x ; \theta)-f\left(x ; \theta^{\prime}\right)\right\|_{2}^{\alpha}\right], \quad \alpha>0$
where $p_{data}$ is the data distribution, i.e., the input graphs.
uniformity: the logarithm of the average pairwise Gaussian potential
$\ell_{\text {uniform }}(f ; t) \triangleq \log \underset{x, y \overset{\text{i.i.d.}}{\sim} p_{\text {data }}}{\mathbb{E}}\left[e^{-t\|f(x ; \theta)-f(y ; \theta)\|_{2}^{2}}\right], \quad t>0$
Uniformity corresponds to the other goal of contrastive learning: the embeddings of random samples should be spread out over the embedding space.
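Both metrics are easy to estimate from a batch of embeddings. The sketch below is a minimal plain-Python version under a few assumptions: the function names are made up for illustration, the defaults `alpha=2` and `t=2` are arbitrary choices, and the expectation in the uniformity term is estimated over all distinct pairs in the batch.

```python
import math
from itertools import combinations


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def align_loss(h, h_prime, alpha=2):
    """Mean of ||f(x; theta) - f(x; theta')||_2^alpha over the inputs x."""
    total = sum(sq_dist(u, v) ** (alpha / 2) for u, v in zip(h, h_prime))
    return total / len(h)


def uniform_loss(h, t=2):
    """Log of the mean Gaussian potential exp(-t * ||f(x) - f(y)||_2^2)."""
    pairs = list(combinations(h, 2))  # all distinct pairs in the batch
    potential = sum(math.exp(-t * sq_dist(u, v)) for u, v in pairs)
    return math.log(potential / len(pairs))


spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
collapsed = [[1.0, 0.0]] * 4  # every embedding identical

spread_uniformity = uniform_loss(spread)
collapsed_uniformity = uniform_loss(collapsed)
```

Lower is better for both metrics. Identical views give an alignment of zero, and a fully collapsed batch gives a uniformity of zero, the worst possible value, while embeddings spread around the unit circle give a negative value.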

The authors record $\ell_{align}$ and $\ell_{uniform}$ every 2 epochs during training and compare three methods, SimGRACE, GraphCL, and MoCL, as shown in the figure below:

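The figure above shows that GraphCL's alignment and uniformity values are higher than those of SimGRACE and MoCL (lower is better for both). The higher alignment value in particular means that GraphCL cannot keep positive pairs close together, because graph augmentation destroys the semantics of the original graph. MoCL, by contrast, uses domain knowledge to guide its choice of augmentations, so it preserves semantics during augmentation and does well on alignment. On uniformity, however, it falls behind SimGRACE, and so its final performance is also worse than SimGRACE's.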