Rules for Leaderboard Submissions

Here we present the rules for how we expect the community to use the train/validation/test labels in leaderboard submissions. The rules are designed to enforce a standardized experimental protocol for easy and direct model comparison. We acknowledge that these rules are by no means perfect, but they are a starting point for standardized comparison in temporal graph learning. Some settings and use cases in temporal graph learning are not currently covered by this work. We hope to continue expanding TGB and improving its evaluation settings and procedures. Feedback from the community is highly welcome.

In the TGB paper, we mainly focus on the streaming setting. Please note that in this setting, test set information may be used only to update the memory module (if any) of a temporal graph learning method. Thus, no back-propagation is allowed based on the test set information. For further details, please check Appendix C in the paper.
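
As a rough illustration of the streaming setting, the sketch below predicts each test edge first and only afterwards reveals it to the memory, while the learnable parameter stays untouched. ToyMemoryModel, evaluate_streaming, and the toy edges are hypothetical names and data made up for this example. They are not part of the TGB package, and a real method would use its own memory module and scoring function.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMemoryModel:
    """Hypothetical stand-in for a memory-based temporal graph model."""
    weight: float = 0.5                         # learnable parameter, frozen at test time
    memory: dict = field(default_factory=dict)  # per-node count of interactions seen so far

    def predict(self, src, dst):
        # Score an edge from the parameter and the memory built from past edges only.
        return self.weight * (self.memory.get(src, 0) + self.memory.get(dst, 0))

    def update_memory(self, src, dst):
        # State update only: no loss, no gradient, no parameter change.
        self.memory[src] = self.memory.get(src, 0) + 1
        self.memory[dst] = self.memory.get(dst, 0) + 1


def evaluate_streaming(model, test_edges):
    """Predict each test edge first, then reveal it to the memory."""
    scores = []
    for src, dst, _timestamp in test_edges:
        scores.append(model.predict(src, dst))  # sees only edges before this one
        model.update_memory(src, dst)           # allowed: memory update with the observed edge
    return scores


model = ToyMemoryModel()
test_edges = [(0, 1, 10), (0, 2, 11), (1, 2, 12)]  # (source, destination, timestamp), toy data
scores = evaluate_streaming(model, test_edges)
```

The ordering inside the loop is the point of the sketch: the score for an edge is computed before that edge enters the memory, and nothing in the loop modifies `weight`.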


The general rules are as follows.

  • Training Split: All data, including edges, nodes, and labels, can be used in any way to train the model parameters, e.g. for gradient descent, model tuning, or as model input.
  • Validation Split: Meant for standard hyper-parameter tuning (not allowed: gradient-based search, gradient descent). Can be used to update the memory module.
  • Test Split: Final model evaluation only; no hyper-parameter tuning allowed. Can be used to update the memory module.
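
One possible way to keep these rules in view while writing an experiment script is to encode them as a small lookup table. The sketch below is only an illustration: the ALLOWED table, the action names, and is_allowed are hypothetical and not part of the TGB package.

```python
# Hypothetical encoding of the split rules above; all names are illustrative only.
ALLOWED = {
    "train": {"gradient_descent", "hyperparameter_tuning", "memory_update"},
    "val": {"hyperparameter_tuning", "memory_update"},
    "test": {"final_evaluation", "memory_update"},
}


def is_allowed(split, action):
    """Return True if the given action is permitted on the given split."""
    return action in ALLOWED[split]


# Example checks against the rules.
can_backprop_on_val = is_allowed("val", "gradient_descent")
can_tune_on_test = is_allowed("test", "hyperparameter_tuning")
can_update_memory_on_test = is_allowed("test", "memory_update")
```

Under this encoding, the first two checks evaluate to False and the third to True, matching the rules listed above.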