DSE Graph Foundation Model Subgroup

Introduction

Greetings! We are a small research group from the Data Science and Engineering (DSE) Laboratory at Michigan State University. We focus on Graph Foundation Models (GFMs). Our perspectives are as follows: (1) LLMs can be one choice for building GFMs, but they are not there yet. (2) GFMs require guidance from theoretical principles. This is exciting, as it connects advanced theoretical progress to remarkable empirical success (check details here). (3) There is an initial spark of a neural scaling law on graphs; to scale further, we need more high-quality data, a better model backbone, and better pre-training task design. (4) The most important thing for GFMs is the right application scenario. Beyond traditional graph topics in the data mining domain, we are also interested in the potential of GFMs in other domains. Check below for more details on our current progress, including papers, talks, open-source repositories, and a reading list.

Papers

Perspective

  • Graph Foundation Models
    Haitao Mao*, Zhikai Chen*, Wenzhuo Tang, Jianan Zhao, Yao Ma, Tong Zhao, Neil Shah, Michael Galkin, Jiliang Tang;
    ICML 2024 Spotlight
    Details
    • We propose a “graph vocabulary” perspective, aiming to find the basic transferable units underlying graphs that encode the invariances of graphs.
    • We provide theoretical guidance for graph vocabulary design.
    • We emphasize practical techniques for building GFMs following neural scaling laws.

GFMs

Principles

  • Do Neural Scaling Laws Exist on Graph Self-Supervised Learning?
    Qian Ma, Haitao Mao, Zhehua Zhang, Chunlin Feng, Jingzhe Liu, Yu Song, Yao Ma;
    preprint, 2024
  • Neural Scaling Laws on Graphs
    Jingzhe Liu, Haitao Mao, Zhikai Chen, Tong Zhao, Neil Shah, Jiliang Tang;
    preprint, 2024
    Details
    • We examine the model and data scaling laws on graphs.
    • For model scaling, we observe several graph-specific phenomena and identify their potential causes.
    • For data scaling, we propose that the total edge number is a better data metric, and extend the data scaling law to node classification and link prediction tasks (a minimal power-law fit is sketched after this list).
  • A Data Generation Perspective to the Mechanism of In-Context Learning
    Haitao Mao, Guangliang Liu, Yao Ma, Rongrong Wang, Jiliang Tang;
    preprint, 2024
    Details
    • We study the underlying mechanism of ICL from a data generation perspective.
    • We rigorously adopt the terms skill learning and skill recognition; the difference between them is that skill learning can acquire new data generation functions from in-context data, whereas skill recognition cannot.
    • We illustrate two analysis frameworks, i.e., the Bayesian inference statistical framework and the function learning statistical framework.
  • Revisiting Link Prediction: A Data Perspective
    Haitao Mao, Juanhui Li, Harry Shomer, Bingheng Li, Wenqi Fan, Yao Ma, Tong Zhao, Neil Shah, Jiliang Tang;
    ICLR, 2024
    Details
    • We recognize three fundamental factors critical to link prediction: local structural proximity, global structural proximity, and feature proximity (a toy computation is sketched after this list).
    • We unearth an incompatibility between feature proximity and structural proximity.
    • We collect diverse link prediction datasets and provide new guidance for model architecture design.
  • Demystifying Structural Disparity in Graph Neural Networks: Can One Size Fit All?
    Haitao Mao, Zhikai Chen, Wei Jin, Haoyu Han, Yao Ma, Tong Zhao, Neil Shah, Jiliang Tang;
    NeurIPS, 2023
    Details
    • We recognize two fundamental factors critical to node classification: homophily and heterophily (an edge homophily ratio is sketched after this list).
    • GNNs can work on either the homophily pattern or the heterophily one, but not both.
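
As referenced in the scaling-law entry above, below is a minimal, purely illustrative sketch of a data scaling-law fit in which data size is measured by the total edge count. The assumed power-law form and all numbers are made up for illustration; they are not results or code from the paper.

```python
# A minimal sketch (not the paper's code) of a data scaling-law fit where the
# data size is measured by the total number of edges seen during training.
import numpy as np

# Hypothetical (total_edges, test_error) pairs from runs at increasing data scales.
edges = np.array([1e4, 3e4, 1e5, 3e5, 1e6])
error = np.array([0.42, 0.35, 0.29, 0.24, 0.20])

# Assume a power law, error ≈ a * edges^(-b), and fit it by linear regression
# in log-log space.
slope, intercept = np.polyfit(np.log(edges), np.log(error), deg=1)
a, b = np.exp(intercept), -slope
print(f"error ≈ {a:.3f} * edges^(-{b:.3f})")

# Extrapolate to a 10x larger pre-training corpus.
print("predicted error at 1e7 edges:", a * 1e7 ** (-b))
```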
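
For the link prediction entry above, the following sketch instantiates the three proximity factors with simple stand-ins: common neighbors for local structural proximity, inverse shortest-path distance for global structural proximity, and cosine similarity for feature proximity. These concrete choices are illustrative assumptions; the paper's analysis treats the factors more generally.

```python
# A minimal sketch of the three link prediction factors on a toy graph.
import networkx as nx
import numpy as np

G = nx.karate_club_graph()
# Toy node features; real text-attributed graphs would use learned embeddings.
feat = {v: np.random.default_rng(v).normal(size=16) for v in G}

def proximity(u, v):
    local = len(list(nx.common_neighbors(G, u, v)))        # local structural proximity
    global_ = 1.0 / nx.shortest_path_length(G, u, v)       # global structural proximity
    fu, fv = feat[u], feat[v]
    feature = float(fu @ fv / (np.linalg.norm(fu) * np.linalg.norm(fv)))  # feature proximity
    return local, global_, feature

print(proximity(0, 33))  # a candidate link between the two community hubs
```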
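
For the structural disparity entry above, the sketch below computes an edge homophily ratio on a toy graph; a per-node variant of the same idea distinguishes homophilic from heterophilic nodes within a single graph. The graph and labels here are only illustrative.

```python
# A minimal sketch of an edge homophily ratio: the fraction of edges whose
# endpoints share a label (close to 1 = homophilous, close to 0 = heterophilous).
import networkx as nx

G = nx.karate_club_graph()
labels = {v: G.nodes[v]["club"] for v in G}  # karate-club factions as toy labels

def edge_homophily(G, labels):
    same = sum(labels[u] == labels[v] for u, v in G.edges())
    return same / G.number_of_edges()

print(f"edge homophily: {edge_homophily(G, labels):.2f}")
```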

LLMs on Graphs

  • Label-free Node Classification on Graphs with Large Language Models (LLMs)
    Zhikai Chen, Haitao Mao, Hongzhi Wen, Haoyu Han, Wei Jin, Haiyang Zhang, Hui Liu, Jiliang Tang;
    ICLR, 2024
    Details
    • Graph Neural Networks (GNNs) have achieved remarkable advances in node classification, but they require abundant high-quality labels to ensure promising performance. In contrast, Large Language Models (LLMs) exhibit impressive zero-shot proficiency on text-attributed graphs, yet they face challenges in efficiently processing structural data and suffer from high inference costs. In light of these observations, this work introduces LLM-GNN, a pipeline for label-free node classification on graphs with LLMs that amalgamates the strengths of both GNNs and LLMs while mitigating their limitations (a minimal sketch of the pipeline follows this list).
  • Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs
    Zhikai Chen, Haitao Mao, Hang Li, Wei Jin, Hongzhi Wen, Xiaochi Wei, Shuaiqiang Wang, Dawei Yin, Wenqi Fan, Hui Liu, Jiliang Tang;
    SIGKDD Explorations and NeurIPS GLFrontiers 2023 [codes]
    Details
    • In this paper, we study how LLMs can be used to empower graph machine learning. For node classification, we propose two pipelines: LLMs-as-Enhancers and LLMs-as-Predictors. LLMs-as-Enhancers uses LLMs to enhance node text features, which improves GNN performance. LLMs-as-Predictors adopts LLMs directly for inference, presenting feature information together with inductive biases in natural language, and achieves promising zero-shot performance (a minimal sketch of both pipelines follows this list).
  • Graph Machine Learning in the Era of Large Language Models (LLMs)
    Wenqi Fan, Shijie Wang, Jiani Huang, Zhikai Chen, Yu Song, Wenzhuo Tang, Haitao Mao, Hui Liu, Xiaorui Liu, Dawei Yin, Qing Li;
    preprint, 2024
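
For the LLM-GNN entry above, here is a minimal sketch of the pipeline, assuming the LLM annotates a small, carefully selected node set and an ordinary GNN is then trained on those pseudo-labels; all callables are hypothetical stand-ins rather than the paper's actual interfaces.

```python
# A minimal sketch of the LLM-GNN idea with hypothetical helper callables.
from typing import Callable, Dict, List

def llm_gnn_pipeline(node_texts: Dict[int, str],
                     select_nodes: Callable[[Dict[int, str]], List[int]],
                     llm_annotate: Callable[[str], str],
                     train_gnn: Callable[[Dict[int, str]], object]) -> object:
    # 1) pick nodes whose LLM annotations are likely to be reliable
    candidates = select_nodes(node_texts)
    # 2) query the LLM once per selected node; no human labels are needed
    pseudo_labels = {v: llm_annotate(node_texts[v]) for v in candidates}
    # 3) train a GNN on the pseudo-labels; full-graph inference then stays cheap
    return train_gnn(pseudo_labels)
```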
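
For the entry on exploring LLMs in learning on graphs, the sketch below contrasts the two pipelines at the interface level; `llm_embed`, `llm_classify`, `neighbor_summary`, and `train_gnn` are hypothetical stand-ins, not interfaces from the paper.

```python
# A minimal sketch contrasting LLMs-as-Enhancers and LLMs-as-Predictors.
from typing import Callable, Dict, List

def llms_as_enhancers(node_texts: Dict[int, str],
                      llm_embed: Callable[[str], List[float]],
                      train_gnn: Callable[[Dict[int, List[float]]], object]) -> object:
    # The LLM only enhances node text features; a GNN still makes the predictions.
    enhanced = {v: llm_embed(text) for v, text in node_texts.items()}
    return train_gnn(enhanced)

def llms_as_predictors(node_texts: Dict[int, str],
                       neighbor_summary: Callable[[int], str],
                       llm_classify: Callable[[str], str]) -> Dict[int, str]:
    # The LLM predicts directly from a natural-language description of each node
    # (optionally with its neighborhood); no GNN is trained, enabling zero-shot use.
    return {v: llm_classify(text + "\n" + neighbor_summary(v))
            for v, text in node_texts.items()}
```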

Benchmarks

Others

Paper Lists

Acknowledgement

We sincerely thank the people below for their guidance and collaboration in our research work.

Advisory: Neil Shah, Tong Zhao, Yao Ma, Wei Jin, Michael Galkin, Jian Tang, Michael Bronstein, Xavier Bresson, Bryan Hooi, Haiyang Zhang, Xiafeng Tang, Chen Luo.

Students: Harry Shomer, Juanhui Li, Guangliang Liu, Jianan Zhao, Xiaoxin He, Qian Huang, Xinyu Yuan, Zhaocheng Zhu.

Sponsors