Simple unsupervised graph representation learning

2022-10-04

字数统计: 2.3k | 阅读时长≈ 10 分钟

https://ojs.aaai.org/index.php/AAAI/article/view/20748/20507

Simple unsupervised graph representation learning，2022，AAAI

总结：现有的GCL方法通常时间复杂度比较高，导致难以应用到大规模数据集，主要有一下3点原因：（1）数据增强；（2）高维嵌入；（3）损失函数。本文作者提出了一种简单高效的图对比学习方法SUGRL，作者删掉了GCL中的数据增强和损失函数，并且重新设计了一个boundary loss。和现有GCL方法相比，SUGRL训练速度确实非常快，并且在节点分类任务上也具有不错的性能。总的来说，这篇文章还是不错的，值得借鉴，如果有更多关于SUGRL的理论分析和证明就更好了。

1. 简介

1.1 摘要

In this paper, we propose a simple unsupervised graph representation learning method to conduct effective and efficient contrastive learning. Specifically, the proposed multiplet loss explores the complementary information between the structural information and neighbor information to enlarge the inter-class variation, as well as adds an upper bound loss to achieve the finite distance between positive embeddings and anchor embeddings for reducing the intra-class variation. As a result, both enlarging inter-class variation and reducing intra-class variation result in small generalization error, thereby obtaining an effective model. Furthermore, our method removes widely used data augmentation and discriminator from previous graph contrastive learning methods, meanwhile available to output low-dimensional embeddings, leading to an efficient model. Experimental results on various real-world datasets show the effectiveness and efficiency of our method, compared to state-of-the-art methods. Our code is released at https://github.com/YujieMo/SUGRL .

本文我们提出了一种简单的无监督图表示学习方法，来进行有效、高效的对比学习。具体来说，我们提出的multiplet损失通过挖掘结构信息和邻域信息之间的互补信息来扩大类间差异，同时通过增加一个上限损失来实现正样本和锚点样本之间的有限距离，用于减小类内差异。因此，通过扩大类间差异，缩小类内差异，可以减小generalization error，从而得到一个有效模型。另外，我们的方法移除了之前GCL方法中的数据增强和discriminator，同时可以用来输出低维嵌入，从而使得模型更高效。现实世界的各种数据集上的实验表明，和SOTA方法相比，我们的方法是有效且高效的。

（吐槽一下，文章摘要写的太烂了！语言不通，信息也不凝练，看完不知道这篇工作干了啥。）

1.2 本文工作

背景： 随着GNN的广泛应用，不需要大量标签数据的无监督图表示学习方法得到了越来越多的关注。对比学习作为其中一种，通过最大化输入和相关输入之间的互信息来学习节点表示。现有的对比学习方法虽然在很多引用中都取得了不错的表现，但是这些方法通常依赖于数据增强，这增加了模型训练的复杂度。因此这些方法通常很难应用于大规模数据集。

动机： 作者认为现有GCL方法scalability比较弱的原因有三个：数据增强、高维嵌入和对比损失。比如在GRACE和GML的训练过程中，通常可能要花费20%-40%的时间进行数据增强。作者进行了如下实验：

因此，寻找一种具有low computation cost和high-quality representation的无监督图表示学习方法是很有意思的。

本文工作： 作者提出了一种新的图对比学习方法SUGRL，高效且可扩展。和现有GCL框架主要有两点不同：

无需数据增强
- 首先使用MLP对节点原始特征进行语义编码，得到anchor embedding
- 使用GCN对输入图进行编码，得到positive embedding
- 对anchor embedding进行row-wise random permutation，得到negative embedding
对比损失

作者设计了一种新的multiplet loss，可以拉近anchor embeddings和positive embedding，同时拉远anchor embedding和negative embedding。

2. 方法

SUGRL框架如上图所示，正如前文所述，和常见GCL框架相比，主要有两点不同：

无需数据增强
对比损失不同

下面我们具体看看作者是如何获取正负样本对，以及如何设计损失函数的。

2.1 正负样本对的获取

这里作者的实现也非常简单，从前文框架图中我们也可以轻松看出作者的思路。

Anchor embedding

以节点特征 $X$ 未输入，使用MLP来获取anchor嵌入：
$\begin{gathered} \mathbf{X}^{(l+1)}=\text { Dropout }\left(\sigma\left(\mathbf{X}^{(l)} \mathbf{W}^{(l)}\right)\right) \\ \mathbf{H}=\mathbf{X}^{(l+1)} \mathbf{W}^{(l+1)} \end{gathered}$
其中 $\mathbf X^{(0)}=\mathbf X$ ， $\sigma$ 是激活函数， $\mathbf W^{(l)}$ 是第l层的权重参数。
Negative embedding

作者直接对Anchor embedding进行row-shuffle操作，获取负样本：
$\mathbf{H}^{-}=\operatorname{Shuffle}\left(\left[\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_N\right]\right)$
Positive embedding

作者设计了两种类型正样本：structural embeddings和neighbor embeddings。
- Structural information
  
  其实就是使用GCN计算节点嵌入作为该类型正样本嵌入：
  $\mathbf{H}^{+^{(l+1)}}=\sigma\left(\widehat{\mathbf{A}} \mathbf{H}^{+^{(l)}} \mathbf{W}^{(l)}\right)$
  需要注意的是，作者为了进一步节约时间，这里的 $\mathbf W^{(l)}$ 和前面MLP中的权重是共享的。
- Neighbor information
  
  这与这种正样本，计算方法也很简单，直接从节点邻居节点采样，取平均值：
  $\widetilde{\mathbf{h}}_i^{+}=\frac{1}{m} \sum_{j=1}^m\left\{\mathbf{h}_j \mid v_j \in \mathcal{N}_i\right\}$

2.2 损失函数

根据前面的介绍，我们知道SUGRL中一共有四种embedding：锚点嵌入、结构信息正样本嵌入、邻域信息正样本嵌入和负样本嵌入。

那么作者是如何利用这四种嵌入设计损失函数的呢？

对比学习的目标是：拉大锚点和负样本之间的距离，减小锚点和正样本之间的距离。

现有的GCL方法，比如MVGRL、DGI等都是基于InfoNCE损失设计对比损失，从而实现这个目标。但是这些方法中的discriminator比较耗费时间。

另外减小泛化误差对于无监督图表示学习方法也很重要，它可以提高对比学习的泛化能力。有研究表明减小类内差异，增加类间差异可以有效降低泛化误差。

在SUGRL中，作者考虑使用三元损失作为整个模型损失的基础，即：

\alpha+d\left(\mathbf{h}, \mathbf{h}^{+}\right)<d\left(\mathbf{h}, \mathbf{h}^{-}\right)

其中 $d(\cdot,\cdot)$ 是相似度度量函数，比如L2范式， $\alpha$ 是一个非负值，保证正负样本之间有一个“safe” distance。这样我们就可以得到这样的损失函数：

\mathcal{L}_{\text {triplet }}=\frac{1}{k} \sum_{i=1}^k\left\{d\left(\mathbf{h}, \mathbf{h}^{+}\right)^2-d\left(\mathbf{h}, \mathbf{h}_i^{-}\right)^2+\alpha\right\}_{+}

其中 $\{\cdot\}_+=max\{\cdot, 0\}$ ，k是负样本数量。因为SUGRL有两种类型负样本，因此可以得到：

\begin{aligned} \mathcal{L}_S &=\frac{1}{k} \sum_{i=1}^k\left\{d\left(\mathbf{h}, \mathbf{h}^{+}\right)^2-d\left(\mathbf{h}, \mathbf{h}_i^{-}\right)^2+\alpha\right\}_{+}, \\ \mathcal{L}_N &=\frac{1}{k} \sum_{j=1}^k\left\{d\left(\mathbf{h}, \widetilde{\mathbf{h}}^{+}\right)^2-d\left(\mathbf{h}, \mathbf{h}_j^{-}\right)^2+\alpha\right\}_{+} . \end{aligned}

上面的这种损失函数，考虑了 $d\left(\mathbf{h}, \mathbf{h}^{+}\right)^2$ 和 $d\left(\mathbf{h}, \mathbf{h}_i^{-}\right)^2$ 之间的距离要大于（其实就是enlarge类间距离），但是没有考虑锚点和正样本之间的距离（其实就是reduce类内距离）。作者设计如下不等式：