Bringing Your Own View: Graph Contrastive Learning without Prefabricated Data Augmentations

https://arxiv.org/pdf/2201.01702

https://github.com/Shen-Lab/GraphCL_Automated

Bringing Your Own View: Graph Contrastive Learning without Prefabricated Data Augmentations, WSDM 2022

1. Introduction

1.1 Abstract

Self-supervision is recently surging at its new frontier of graph learning. It facilitates graph representations beneficial to downstream tasks; but its success could hinge on domain knowledge for handcraft or the often expensive trials and errors. Even its state-of-the-art representative, graph contrastive learning (GraphCL), is not completely free of those needs as GraphCL uses a prefabricated prior reflected by the ad-hoc manual selection of graph data augmentations. Our work aims at advancing GraphCL by answering the following questions: How to represent the space of graph augmented views? What principle can be relied upon to learn a prior in that space? And what framework can be constructed to learn the prior in tandem with contrastive learning? Accordingly, we have extended the prefabricated discrete prior in the augmentation set, to a learnable continuous prior in the parameter space of graph generators, assuming that graph priors per se, similar to the concept of image manifolds, can be learned by data generation. Furthermore, to form contrastive views without collapsing to trivial solutions due to the prior learnability, we have leveraged both principles of information minimization (InfoMin) and information bottleneck (InfoBN) to regularize the learned priors. Eventually, contrastive learning, InfoMin, and InfoBN are incorporated organically into one framework of bi-level optimization. Our principled and automated approach has proven to be competitive against the state-of-the-art graph self-supervision methods, including GraphCL, on benchmarks of small graphs; and shown even better generalizability on large-scale graphs, without resorting to human expertise or downstream validation. Our code is publicly released at https://github.com/Shen-Lab/GraphCL_Automated.


1.2 This Work

Background: Existing self-supervised methods have not fully resolved the challenge posed by the heterogeneity of graph-structured data; their success relies on carefully designing predictive pretext tasks based on domain knowledge. In other words, earlier methods design bespoke models for particular tasks, such as context prediction, meta-path extraction, or graph completion, which limits their generality. The recently popular graph contrastive learning (GCL) alleviates this to some extent, but GCL still depends on a hand-crafted prior, namely the choice of graph augmentations.

Motivation: "help close the gap is to turn the prefabricated self-supervised prior into a learnable one" — that is, make the graph augmentations, which currently rely on hand-crafted priors, learnable.

This work: "What is the space, principle and framework that one can rely on, to define and pursue the learnable self-supervised prior" — i.e., answer the question of which space, principle, and framework can be used to define and pursue a learnable self-supervised prior. Concretely:

  1. Define a parameterized, learnable prior function.
  2. Use InfoMin and InfoBN to regularize the optimization of the generator.

2. Method

First, recall the overall framework of GraphCL:
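For context, GraphCL takes two augmented views of each graph, encodes them with a shared GNN encoder plus projection head, and trains with an NT-Xent / InfoNCE objective. Below is a minimal PyTorch sketch of that objective; the function name and the simplified negative sampling (all other graphs in the batch, using the second view only) are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """Simplified NT-Xent / InfoNCE loss for a batch of two-view embeddings.

    z1, z2: [N, d] graph-level embeddings of the two augmented views.
    For graph i, (z1[i], z2[i]) is the positive pair; every z2[j] with j != i
    acts as a negative (a simplification of the full GraphCL formulation).
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    sim = z1 @ z2.t() / temperature        # [N, N] cosine-similarity matrix
    positives = sim.diag()                 # sim(view1_i, view2_i)
    loss = -positives + torch.logsumexp(sim, dim=1)
    return loss.mean()
```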

2.1 Modeling the Prior as Learnable

The only place GCL uses a predefined prior is graph augmentation, so modeling the prior amounts to making the augmentation learnable. Graph generative models have been advancing rapidly and offer a smooth way to parameterize graph priors.

The authors use a VGAE (variational graph auto-encoder) to model the graph augmentations in GCL.

This is the core difference from other automated GCL methods: previous adaptive-augmentation approaches predefine a pool of augmentation strategies and select the best combination from that pool, whereas this paper directly uses a graph generative model to produce the augmented views.
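To make this concrete, here is a minimal, dense-adjacency sketch of a VGAE-style view generator; the class name, layer sizes, and one-layer GCN encoder are illustrative assumptions rather than the authors' exact architecture. Its reconstruction-plus-KL loss plays the role of the generation loss in the bi-level objective below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VGAEViewGenerator(nn.Module):
    """VGAE-style view generator (illustrative sketch, dense-adjacency version).

    Encodes (X, A) into latent node embeddings and decodes an edge-probability
    matrix, from which an augmented view of the graph can be sampled. The
    reconstruction + KL term plays the role of the generation loss L_Gen.
    """

    def __init__(self, in_dim, hid_dim=64, z_dim=32):
        super().__init__()
        self.lin = nn.Linear(in_dim, hid_dim)
        self.lin_mu = nn.Linear(hid_dim, z_dim)
        self.lin_logvar = nn.Linear(hid_dim, z_dim)

    def forward(self, x, adj):
        # One-layer GCN-style propagation: H = ReLU(A X W)
        h = F.relu(adj @ self.lin(x))
        mu, logvar = self.lin_mu(h), self.lin_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        edge_prob = torch.sigmoid(z @ z.t())                      # inner-product decoder
        recon = F.binary_cross_entropy(edge_prob, adj)            # reconstruction term
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return edge_prob, recon + kl                              # (augmented view, L_Gen)
```

Sampling a discrete augmented graph from `edge_prob` (e.g., Bernoulli sampling or keeping the top-k edges) is omitted for brevity.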

The next question is how to couple the learnable prior with the contrastive model. A straightforward way is bi-level optimization:

$$
\begin{aligned}
&\min_{\theta}\ \mathbb{E}_{P_{\mathrm{G}}}\,\mathcal{L}_{\mathrm{CL}}\left(\mathrm{G}, \phi_{1}, \phi_{2}, \theta\right) \\
&\text{s.t. } \phi_{1}, \phi_{2} \in \arg\min_{\phi_{1}^{\prime}, \phi_{2}^{\prime}} \mathbb{E}_{P_{\mathrm{G}}}\left\{\mathcal{L}_{\mathrm{Gen}}\left(\mathrm{G}, \phi_{1}^{\prime}\right)+\mathcal{L}_{\mathrm{Gen}}\left(\mathrm{G}, \phi_{2}^{\prime}\right)\right\}
\end{aligned}
$$

In this form the two parts optimize independently, with no interaction between the upper and lower levels, which easily yields trivial solutions. To address this, the authors feed a "reward" signal from the upper-level contrastive model into the graph generators. The resulting bi-level optimization becomes:

$$
\begin{aligned}
\min_{\theta}\ & \mathbb{E}_{P_{\mathrm{G}}}\,\mathcal{L}_{\mathrm{CL}}\left(\mathrm{G}, \phi_{1}, \phi_{2}, \theta\right) \\
\text{s.t. } & \phi_{1}, \phi_{2} \in \arg\min_{\phi_{1}^{\prime}, \phi_{2}^{\prime}} \mathbb{E}_{P_{\mathrm{G}}}\, r\left(\mathrm{G}, \phi_{1}^{\prime}, \phi_{2}^{\prime}, \theta\right)\left\{\mathcal{L}_{\mathrm{Gen}}\left(\mathrm{G}, \phi_{1}^{\prime}\right)+\mathcal{L}_{\mathrm{Gen}}\left(\mathrm{G}, \phi_{2}^{\prime}\right)\right\}
\end{aligned}
$$

where the reward signal is computed as

$$
r\left(\mathrm{G}, \phi_{1}, \phi_{2}, \theta\right)=
\begin{cases}
1, & \text{given some condition} \\
\delta \ll 1, & \text{otherwise}
\end{cases}
$$

The overall model architecture is shown in the framework figure of the paper.
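To make the alternating optimization concrete, here is a hypothetical PyTorch sketch of one training step. The function names, the assumption that each generator returns (augmented views, per-graph generation loss), and the use of an NT-Xent-style contrastive loss (e.g., the `nt_xent_loss` sketch above) are illustrative, not the authors' exact code.

```python
import torch

def bilevel_step(batch, encoder, gen1, gen2, opt_enc, opt_gen,
                 contrastive_loss_fn, reward_fn):
    """One alternating step of the bi-level optimization (illustrative sketch).

    Assumes: gen_k(batch) -> (augmented views, per-graph L_Gen tensor),
             encoder(views) -> [N, d] graph-level embeddings,
             reward_fn(...) -> per-graph reward of 1 or delta << 1.
    """
    # --- lower level: reward-weighted update of the two view generators ---
    views1, gen_loss1 = gen1(batch)
    views2, gen_loss2 = gen2(batch)
    with torch.no_grad():
        reward = reward_fn(views1, views2, encoder)      # feedback from the upper level
    opt_gen.zero_grad()
    (reward * (gen_loss1 + gen_loss2)).mean().backward()
    opt_gen.step()

    # --- upper level: contrastive update of the encoder on regenerated views ---
    with torch.no_grad():                                # views treated as fixed inputs
        views1, _ = gen1(batch)
        views2, _ = gen2(batch)
    cl_loss = contrastive_loss_fn(encoder(views1), encoder(views2))
    opt_enc.zero_grad()
    cl_loss.backward()
    opt_enc.step()
    return cl_loss.item()
```

The lower level weights the generation loss by the reward, so views the contrastive model judges uninformative (reward δ ≪ 1) contribute little to the generator update; the upper level then trains the encoder on the views those generators produce.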

2.2 Concrete Implementation

Section 2.1 introduced the overall framework and the bi-level optimization formulation, but not the concrete form of the reward signal.

To instantiate the model described above, the authors propose two principles: InfoMin and InfoBN.

  • InfoMin Principle

    Following the InfoMin principle, the two views should share as little mutual information as possible; as in other GCL methods, the authors quantify this with the contrastive loss from GraphCL:

    $$
    \begin{aligned}
    &\min_{\theta} \mathbb{E}_{P_{\mathrm{G}}} \mathcal{L}_{\mathrm{CL}}\left(\mathrm{G}, \mathrm{A}_{1}, \mathrm{A}_{2}, \theta\right) \\
    =\;& \min_{\theta} \mathbb{E}_{P_{\mathrm{G}}}\Big\{ -\mathbb{E}_{P_{\left(\mathrm{A}_{1}, \mathrm{A}_{2}\right)}} \overbrace{\operatorname{sim}\big(T_{\theta, 1}(\mathrm{G}),\, T_{\theta, 2}(\mathrm{G})\big)}^{\text{positive pairs}} \\
    &+ \mathbb{E}_{P_{\mathrm{A}_{1}}} \log\Big( \mathbb{E}_{P_{\mathrm{G}^{\prime}} \times P_{\mathrm{A}_{2}}} \exp\big( \underbrace{\operatorname{sim}\big(T_{\theta, 1}(\mathrm{G}),\, T_{\theta, 2}(\mathrm{G}^{\prime})\big)}_{\text{negative pairs}} \big) \Big) \Big\}
    \end{aligned}
    $$

    The InfoMin reward signal is defined as (see the code sketch after this list):

    $$
    r_{\mathrm{InfoMin}}\left(\mathrm{G}, \phi_{1}, \phi_{2}, \theta\right)=
    \begin{cases}
    1, & \text{if } \mathcal{L}_{\mathrm{CL}}\left(\mathrm{G}, \phi_{1}, \phi_{2}, \theta\right) > \text{threshold} \\
    \delta \ll 1, & \text{otherwise}
    \end{cases}
    $$

  • InfoBN Principle

    The authors introduce InfoBN by reducing the information overlap between each contrastive view and its latent representation:

    $$
    \begin{aligned}
    &\mathcal{L}_{\mathrm{InfoBN}}\left(\mathrm{G}, \phi_{1}, \phi_{2}, \theta, \pi\right)= \\
    &-\operatorname{sim}\big(T_{\pi, \phi_{1}}(\mathrm{G}),\, T_{\theta, \phi_{1}}(\mathrm{G})\big)+\log\Big(\mathbb{E}_{P_{\mathrm{G}^{\prime}}} \exp\big(\operatorname{sim}\big(T_{\pi, \phi_{1}}(\mathrm{G}),\, T_{\theta, \phi_{1}}(\mathrm{G}^{\prime})\big)\big)\Big) \\
    &-\operatorname{sim}\big(T_{\pi, \phi_{2}}(\mathrm{G}),\, T_{\theta, \phi_{2}}(\mathrm{G})\big)+\log\Big(\mathbb{E}_{P_{\mathrm{G}^{\prime}}} \exp\big(\operatorname{sim}\big(T_{\pi, \phi_{2}}(\mathrm{G}),\, T_{\theta, \phi_{2}}(\mathrm{G}^{\prime})\big)\big)\Big)
    \end{aligned}
    $$

    The InfoBN reward signal can accordingly be defined as:

    $$
    r_{\mathrm{InfoBN}}\left(\mathrm{G}, \phi_{1}, \phi_{2}, \theta, \pi\right)=
    \begin{cases}
    1, & \text{if } \mathcal{L}_{\mathrm{InfoBN}}\left(\mathrm{G}, \phi_{1}, \phi_{2}, \theta, \pi\right) > \text{threshold} \\
    \delta \ll 1, & \text{otherwise}
    \end{cases}
    $$

    GraphCL with the InfoBN-rewarded learned prior can then be written as:

    $$
    \begin{aligned}
    &\min_{\theta} \mathbb{E}_{P_{\mathrm{G}}} \mathcal{L}_{\mathrm{CL}}\left(\mathrm{G}, \phi_{1}, \phi_{2}, \theta\right), \\
    &\text{s.t. } \phi_{1}, \phi_{2} \in \arg\min_{\phi_{1}^{\prime}, \phi_{2}^{\prime}} \mathbb{E}_{P_{\mathrm{G}}}\, r_{\mathrm{InfoBN}}\left(\mathrm{G}, \phi_{1}^{\prime}, \phi_{2}^{\prime}, \theta, \pi\right)\left\{\mathcal{L}_{\mathrm{Gen}}\left(\mathrm{G}, \phi_{1}^{\prime}\right)+\mathcal{L}_{\mathrm{Gen}}\left(\mathrm{G}, \phi_{2}^{\prime}\right)\right\}, \\
    &\quad\;\; \pi \in \arg\min_{\pi^{\prime}} \mathbb{E}_{P_{\mathrm{G}}} \mathcal{L}_{\mathrm{InfoBN}}\left(\mathrm{G}, \phi_{1}, \phi_{2}, \theta, \pi^{\prime}\right).
    \end{aligned}
    $$

  • Mixed

    Finally, the authors propose a mixed mode that integrates InfoMin and InfoBN, with the reward signal defined as:

    $$
    r_{\mathrm{Info(Min \& BN)}}\left(\mathrm{G}, \phi_{1}, \phi_{2}, \theta, \pi\right)=
    \begin{cases}
    1, & \text{if } \gamma \mathcal{L}_{\mathrm{CL}}\left(\mathrm{G}, \phi_{1}, \phi_{2}, \theta\right)+(1-\gamma) \mathcal{L}_{\mathrm{InfoBN}}\left(\mathrm{G}, \phi_{1}, \phi_{2}, \theta, \pi\right) > \text{threshold} \\
    \delta \ll 1, & \text{otherwise}
    \end{cases}
    $$
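The three reward signals share the same gating structure. A minimal sketch is given below, assuming per-graph loss tensors; `threshold`, `DELTA` (the paper's δ ≪ 1), and `gamma` (γ) are hyper-parameters, and all names are illustrative rather than the authors' exact code.

```python
import torch

DELTA = 0.1   # placeholder value for "delta << 1"

def infomin_reward(cl_loss, threshold):
    """r_InfoMin: 1 where the contrastive loss exceeds the threshold
    (the generated views share little information), else delta << 1."""
    return torch.where(cl_loss > threshold,
                       torch.ones_like(cl_loss),
                       torch.full_like(cl_loss, DELTA))

def infobn_reward(infobn_loss, threshold):
    """r_InfoBN: the same gating rule applied to the InfoBN loss, which
    penalizes information overlap between each view and its representation."""
    return torch.where(infobn_loss > threshold,
                       torch.ones_like(infobn_loss),
                       torch.full_like(infobn_loss, DELTA))

def mixed_reward(cl_loss, infobn_loss, gamma, threshold):
    """r_Info(Min&BN): gate on a convex combination of the two losses."""
    mixed = gamma * cl_loss + (1 - gamma) * infobn_loss
    return torch.where(mixed > threshold,
                       torch.ones_like(mixed),
                       torch.full_like(mixed, DELTA))
```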

3. Experiments

3.1 Semi-supervised Learning

Compared with GraphCL, the proposed method performs on par overall, better on some datasets and worse on others. In addition, the different principles behave differently across datasets.

3.2 Transfer Learning

3.3 Analysis Experiments

  • Graph generation quality usually aligns with downstream performance.
  • A molecule-specific generator alone does not significantly benefit molecular datasets.
  • Tuning the principled-reward hyper-parameters could strengthen the competitive performance even more.
  • Connections in the generated graphs are sparse so as to capture patterns.
