Official source code for the paper "Adaptive Graph Enhancement for Imbalanced Multi-relation Graph Learning" (WSDM'25)

Fig. 1: The overall framework of AD-GSMOTE: (a) it first leverages the degree filter to select tail nodes (blue) for minority classes (black nodes) and then generates the adaptive center node (brown node 10) based on the correctly classified nodes from the previous training process. It then designs the adaptive node generator to generate synthetic nodes (brown) for tail nodes (e.g., node 1' is generated from node 1, the center node 10, and the similar node 5) and the triadic edge generator to create connections for the synthetic nodes.
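The node generation step in (a) can be pictured as interpolation in the embedding space. The snippet below is a conceptual sketch only, with an assumed random weighting; the exact formulation is given in the paper and the actual implementation lives in src/models:

```python
import torch

def generate_synthetic(x_tail, x_center, x_similar):
    """Conceptual sketch of the adaptive node generator: a synthetic
    embedding is interpolated among a tail node, the adaptive class
    center, and a similar same-class node (e.g., nodes 1, 10, and 5
    in Fig. 1). The weighting scheme here is an assumption."""
    w = torch.rand(2)  # illustrative random interpolation coefficients
    return x_tail + w[0] * (x_center - x_tail) + w[1] * (x_similar - x_tail)

# Example: node 1' from node 1, center node 10, and similar node 5
x1, x10, x5 = torch.randn(3, 64)
x1_prime = generate_synthetic(x1, x10, x5)
```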
We use conda for environment setup. Please run the following command to install the required packages.
conda create -n AD-GSMOTE python=3.11
conda activate AD-GSMOTE
pip install -r requirements.txt
pip install pyg_lib torch_scatter torch_sparse -f https://data.pyg.org/whl/torch-2.1.2+cu121.html
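After installation, you can sanity-check that the pinned wheels resolved correctly. A minimal check, assuming the torch 2.1.2 + cu121 build from the wheel index above:

```python
import torch
import torch_scatter  # installed from the PyG wheel index above

print(torch.__version__)          # expected: 2.1.2 (matching the wheel URL)
print(torch.cuda.is_available())  # True if the cu121 build matches your driver
print(torch_scatter.__version__)
```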
We introduce a new multi-relation graph dataset, called Twitter-Drug, adapted from our previous studies HyGCL-DC and HyGCL-AdT. While those prior works modeled the data as hypergraphs, here we adapt it into a multi-relation graph that models real-world social media interactions through pairwise relationships. Our focus is on identifying drug-related user roles in a highly imbalanced setting, where drug users represent a very small minority of the total user population.
The dataset captures both class imbalance (few positive drug users) and topology imbalance challenges. It consists of three types of pairwise relations between users:
- User-Tweet-User (U-T-U): Connections formed through tweet interactions (replies, quotes, retweets, likes)
- User-Follow-User (U-F-U): Connections based on following/follower relationships
- User-Keyword-User (U-K-U): Connections between users whose tweets share common keywords
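For illustration, the three relations can be stored as separate edge index tensors over the same user set. The snippet below is a hypothetical construction using PyTorch Geometric's HeteroData (the PyG wheel installs above suggest this stack); the real loader in src/utils may organize the data differently:

```python
import torch
from torch_geometric.data import HeteroData

# Hypothetical container for the three-relation user graph; sizes are
# taken from Table 1 below, features and labels are random placeholders.
data = HeteroData()
data['user'].x = torch.randn(27945, 64)         # feature dim 64 is illustrative
data['user'].y = torch.randint(0, 4, (27945,))  # 4 drug-related role classes

# One edge_index of shape [2, num_edges] per relation
data['user', 'U-T-U', 'user'].edge_index = torch.randint(0, 27945, (2, 392190))
data['user', 'U-F-U', 'user'].edge_index = torch.randint(0, 27945, (2, 69675))
data['user', 'U-K-U', 'user'].edge_index = torch.randint(0, 27945, (2, 253602))
```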
More details about the dataset collection and annotation can be found in data/README.md.
You can obtain our data from the links in the table below, where we summarize the Twitter series datasets.
Dataset | Graph Type | Label Type | Link |
---|---|---|---|
Twitter-Drug | Multi-relation Graph | Drug User Roles | Link |
Twitter-HyDrug-Comm | Hypergraph | Drug Communities | Link |
Twitter-HyDrug-Role | Hypergraph | Drug User Roles | Link |
cd AD-GSMOTE
python src/main.py --dataset twitter_drug --train_ratio 0.4 --load_best_params
We provide the best parameters for each dataset {twitter_drug, yelpchi, amazon} with train ratios {0.05, 0.4} in config/dataset_params.yaml. You can directly load the best parameters by setting the --load_best_params flag, as sketched below.
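For reference, here is a minimal sketch of how the YAML could be read; the key layout shown is an assumption, and the actual parsing happens inside src/main.py:

```python
import yaml

# Load the tuned hyperparameters shipped with the repo (assumed layout:
# top-level dataset keys, then train-ratio keys).
with open('config/dataset_params.yaml') as f:
    params = yaml.safe_load(f)

best = params['twitter_drug'][0.4]  # hypothetical access pattern
print(best)
```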
The overall structure of the project is as follows:
├── config/              # Configuration files
├── data/                # Dataset files and README
│   ├── twitter_drug/
│   ├── yelpchi/
│   ├── amazon/
│   └── README.md
├── src/                 # Source code
│   ├── models/          # Model implementations
│   ├── utils/           # Utility functions
│   └── main.py          # Main training script
├── requirements.txt     # Dependencies
└── README.md            # Project documentation
To download the dataset, please download the files [here] and unzip them into the data folder.
The statistics of the datasets employed in our paper are shown in Table 1, where CIR and TIR denote the class imbalance ratio and the topology imbalance ratio, respectively.
Table 1: Statistics of employed datasets.
Dataset | # of nodes | CIR | TIR | Relation | # of relations |
---|---|---|---|---|---|
Twitter-Drug | 27,945 | 56.3 : 4.4 : 1.2 : 1.0 | 1.0 : 37.4 : 29.1 : 21.6 | U-T-U | 392,190 |
 | | | | U-F-U | 69,675 |
 | | | | U-K-U | 253,602 |
YelpChi | 45,954 | 5.9 : 1.0 | 1.1 : 1.0 | R-U-R | 49,315 |
 | | | | R-S-R | 3,402,743 |
 | | | | R-T-R | 573,616 |
Amazon | 11,944 | 13.5 : 1.0 | 2.9 : 1.0 | U-P-U | 175,608 |
 | | | | U-S-U | 3,566,479 |
 | | | | U-V-U | 1,036,737 |
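As a sanity check on the CIR column, the ratio can be reproduced from a label vector by normalizing per-class counts by the smallest class (our assumed convention here):

```python
import torch

def class_imbalance_ratio(y: torch.Tensor) -> str:
    """Per-class counts divided by the smallest class, sorted descending."""
    counts = torch.bincount(y).float()
    ratios = (counts / counts.min()).sort(descending=True).values
    return ' : '.join(f'{r.item():.1f}' for r in ratios)

# Toy labels with the same proportions as Twitter-Drug in Table 1
y = torch.tensor([0] * 563 + [1] * 44 + [2] * 12 + [3] * 10)
print(class_imbalance_ratio(y))  # 56.3 : 4.4 : 1.2 : 1.0
```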
Below is a sample training log, which records the output and model performance on the Twitter-Drug dataset with a training split ratio of 0.4:
Epoch: 0021, Train Loss: 2.30, Val Loss: 2.25, Train F1: 52.54%, Val F1: 58.93%, Train GMean: 34.36%, Val GMean: 40.34%, Test F1: 55.40%, Test GMean: 35.74%
Epoch: 0041, Train Loss: 1.80, Val Loss: 1.73, Train F1: 64.70%, Val F1: 66.34%, Train GMean: 52.36%, Val GMean: 52.95%, Test F1: 62.33%, Test GMean: 47.77%
Epoch: 0061, Train Loss: 1.41, Val Loss: 1.35, Train F1: 69.95%, Val F1: 69.18%, Train GMean: 59.12%, Val GMean: 57.87%, Test F1: 67.18%, Test GMean: 54.81%
Epoch: 0081, Train Loss: 1.20, Val Loss: 1.11, Train F1: 73.37%, Val F1: 71.94%, Train GMean: 63.29%, Val GMean: 63.06%, Test F1: 69.47%, Test GMean: 58.22%
Epoch: 0101, Train Loss: 1.01, Val Loss: 0.96, Train F1: 77.07%, Val F1: 73.38%, Train GMean: 68.43%, Val GMean: 64.59%, Test F1: 70.77%, Test GMean: 60.12%
Epoch: 0121, Train Loss: 0.92, Val Loss: 0.86, Train F1: 78.99%, Val F1: 74.75%, Train GMean: 70.48%, Val GMean: 66.14%, Test F1: 72.79%, Test GMean: 63.85%
Epoch: 0141, Train Loss: 0.84, Val Loss: 0.78, Train F1: 81.69%, Val F1: 73.41%, Train GMean: 75.07%, Val GMean: 63.03%, Test F1: 71.54%, Test GMean: 59.95%
Epoch: 0161, Train Loss: 0.79, Val Loss: 0.73, Train F1: 82.40%, Val F1: 71.62%, Train GMean: 75.49%, Val GMean: 60.10%, Test F1: 71.99%, Test GMean: 60.29%
Epoch: 0181, Train Loss: 0.75, Val Loss: 0.68, Train F1: 83.86%, Val F1: 73.00%, Train GMean: 77.41%, Val GMean: 63.67%, Test F1: 73.36%, Test GMean: 63.94%
Epoch: 0201, Train Loss: 0.71, Val Loss: 0.64, Train F1: 84.54%, Val F1: 72.49%, Train GMean: 78.29%, Val GMean: 62.59%, Test F1: 73.25%, Test GMean: 63.46%
Epoch: 0221, Train Loss: 0.69, Val Loss: 0.61, Train F1: 84.11%, Val F1: 72.58%, Train GMean: 78.34%, Val GMean: 61.81%, Test F1: 73.33%, Test GMean: 63.58%
Epoch: 0241, Train Loss: 0.65, Val Loss: 0.59, Train F1: 86.45%, Val F1: 72.98%, Train GMean: 80.41%, Val GMean: 62.62%, Test F1: 73.80%, Test GMean: 64.05%
Epoch: 0261, Train Loss: 0.63, Val Loss: 0.56, Train F1: 86.00%, Val F1: 73.47%, Train GMean: 80.36%, Val GMean: 64.71%, Test F1: 74.25%, Test GMean: 65.27%
Epoch: 0281, Train Loss: 0.60, Val Loss: 0.54, Train F1: 87.31%, Val F1: 71.49%, Train GMean: 81.93%, Val GMean: 60.14%, Test F1: 73.35%, Test GMean: 63.28%
Epoch: 0301, Train Loss: 0.59, Val Loss: 0.52, Train F1: 87.27%, Val F1: 72.47%, Train GMean: 81.67%, Val GMean: 63.50%, Test F1: 74.87%, Test GMean: 66.84%
Epoch: 0321, Train Loss: 0.57, Val Loss: 0.50, Train F1: 88.95%, Val F1: 73.20%, Train GMean: 84.63%, Val GMean: 63.93%, Test F1: 74.74%, Test GMean: 66.44%
Epoch: 0341, Train Loss: 0.55, Val Loss: 0.48, Train F1: 89.91%, Val F1: 73.40%, Train GMean: 85.23%, Val GMean: 63.66%, Test F1: 75.47%, Test GMean: 66.65%
Epoch: 0361, Train Loss: 0.53, Val Loss: 0.47, Train F1: 90.39%, Val F1: 72.43%, Train GMean: 86.73%, Val GMean: 61.99%, Test F1: 73.88%, Test GMean: 64.50%
Epoch: 0381, Train Loss: 0.53, Val Loss: 0.45, Train F1: 89.71%, Val F1: 72.58%, Train GMean: 85.00%, Val GMean: 63.52%, Test F1: 74.46%, Test GMean: 65.32%
Epoch: 0401, Train Loss: 0.50, Val Loss: 0.44, Train F1: 91.15%, Val F1: 71.97%, Train GMean: 86.75%, Val GMean: 62.12%, Test F1: 74.28%, Test GMean: 65.32%
Epoch: 0421, Train Loss: 0.49, Val Loss: 0.43, Train F1: 91.45%, Val F1: 71.38%, Train GMean: 88.07%, Val GMean: 61.44%, Test F1: 73.86%, Test GMean: 64.78%
Epoch: 0441, Train Loss: 0.47, Val Loss: 0.41, Train F1: 92.27%, Val F1: 72.65%, Train GMean: 88.91%, Val GMean: 63.39%, Test F1: 74.35%, Test GMean: 65.51%
Epoch: 0461, Train Loss: 0.47, Val Loss: 0.41, Train F1: 92.43%, Val F1: 72.85%, Train GMean: 89.86%, Val GMean: 60.91%, Test F1: 73.10%, Test GMean: 61.71%
Epoch: 0481, Train Loss: 0.46, Val Loss: 0.41, Train F1: 91.73%, Val F1: 74.18%, Train GMean: 87.81%, Val GMean: 66.91%, Test F1: 75.06%, Test GMean: 67.75%
Best Test F1: 75.54%, Best Test GMean: 68.92%
Total time: 37.63 seconds
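The two reported metrics can be reproduced with scikit-learn as below; note this assumes F1 is macro-averaged and GMean is the geometric mean of per-class recalls, which may differ from the exact definitions in src/utils:

```python
import numpy as np
from sklearn.metrics import f1_score, recall_score

def f1_and_gmean(y_true, y_pred):
    """Macro F1 and geometric mean of per-class recalls (assumed definitions)."""
    f1 = f1_score(y_true, y_pred, average='macro')
    recalls = recall_score(y_true, y_pred, average=None)
    gmean = float(np.prod(recalls) ** (1.0 / len(recalls)))
    return f1, gmean
```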
Yiyue Qian - yyqian5@gmail.com
Tianyi (Billy) Ma - tma2@nd.edu
Discussions, suggestions, and questions are always welcome!
@inproceedings{qian2025adgsmote,
title={Adaptive Graph Enhancement for Imbalanced Multi-relation Graph Learning},
author={Qian, Yiyue and Ma, Tianyi and Zhang, Chuxu and Ye, Yanfang},
booktitle={WSDM},
year={2025}
}