In this section, we provide a theoretical justification for the effectiveness of GraphArc in enhancing Out-of-Distribution (OOD) detection. Our analysis focuses on two key properties that have been shown to correlate strongly with OOD detection robustness: intra-class variation and inter-class separation (Ming et al., 2022). We demonstrate that GraphArc explicitly minimizes intra-class variation and enlarges inter-class separation through its angular optimization and scaling design.
Proposition 1 (Reduced Intra-Class Variation).
Given normalized feature vectors and angular-based optimization, GraphArc minimizes intra-class variation under a cosine similarity metric.
Proof.
The normalized feature is given by
$$
\tilde{h}_i^{(k)} = \frac{h_i^{(k)}}{\|h_i^{(k)}\|},
$$
and the normalized class center is
$$
\tilde{W}_{y_i} = \frac{W_{y_i}}{\|W_{y_i}\|}.
$$
The angular softmax loss is defined as
$$
\mathcal{L}_{\mathrm{arc}} = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{e^{\,s\cos(\theta_{y_i} + m)}}{e^{\,s\cos(\theta_{y_i} + m)} + \sum_{j \neq y_i} e^{\,s\cos\theta_j}},
$$
where \( \theta_{y_i} \) is the angle between \( \tilde{h}_i^{(k)} \) and \( \tilde{W}_{y_i} \), \( m \) is the angular margin, and \( s \) is the scaling factor. This loss maximizes \( \cos(\theta_{y_i} + m) \), driving \( \theta_{y_i} \) toward zero for every sample of class \( y_i \), so same-class features concentrate around the shared class-center direction \( \tilde{W}_{y_i} \).
This reduction in intra-class angular variance improves feature compactness and the reliability of confidence calibration under distribution shift.
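As a concrete sketch (not the authors' implementation), the angular softmax loss above can be written in a few lines of NumPy; `scale` and `margin` stand in for the hyperparameters \( s \) and \( m \), whose values here are illustrative assumptions:

```python
import numpy as np

def angular_softmax_loss(features, weights, labels, scale=30.0, margin=0.5):
    """ArcFace-style angular softmax loss on L2-normalized features.

    features: (N, d) node embeddings, weights: (C, d) class centers,
    labels: (N,) integer class ids. Sketch only; hyperparameters assumed.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = f @ w.T                                  # (N, C) cosine similarities
    theta = np.arccos(np.clip(cos, -1.0, 1.0))     # angles to each class center
    idx = np.arange(len(labels))
    # Add the angular margin only to the target-class angle, then scale.
    logits = scale * cos
    logits[idx, labels] = scale * np.cos(theta[idx, labels] + margin)
    # Numerically stable log-softmax cross-entropy.
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[idx, labels].mean()
```

Because the margin penalizes the target angle, minimizing this loss pulls each feature toward its class-center direction, which is exactly the intra-class compactness the proposition describes.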
Proposition 2 (Increased Inter-Class Separation).
Given a Lipschitz continuous feature mapping $\phi$, GraphArc increases the inter-class separation through its use of weight normalization and a scaling factor $s$.
Proof Sketch.
Consider the final-layer representation of a GNN. With normalized weights and features, the logit of a node \( x \) for class \( j \) is
$$
z_j(x) = s\,\tilde{W}_j^{\top}\,\tilde{\phi}(x) = s\cos\theta_j,
$$
where \( \tilde{\phi}(x) = \phi(x)/\|\phi(x)\| \). For two node embeddings \( \phi(x_i), \phi(x_j) \) from different classes \( y_i \neq y_j \), their target-class logits are
$$
z_{y_i}(x_i) = s\cos\theta_{y_i}, \qquad z_{y_j}(x_j) = s\cos\theta_{y_j}.
$$
Then the angular margin between them becomes
$$
\Delta z = s\,\bigl|\cos\theta_{y_i} - \cos\theta_{y_j}\bigr|.
$$
As \( s \) increases, the separation in logit space is amplified. Assuming \( \phi \) is Lipschitz continuous with constant \( L \),
$$
\|\phi(x_i) - \phi(x_j)\| \le L\,\|x_i - x_j\|,
$$
so the angular positions of the normalized embeddings vary smoothly with the inputs, and the amplified margin \( s\,|\cos\theta_{y_i} - \cos\theta_{y_j}| \) is stable under bounded input perturbations.
Thus, GraphArc provides a theoretical lower bound on the inter-class margin, proportional to the scaling factor \( s \).
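A small numeric check makes the role of \( s \) tangible: for a fixed set of cosine scores (the values below are made up for illustration), scaling the logits sharpens the softmax toward the top class, which is the amplified separation the proof sketch refers to:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Made-up cosine similarities to three class centers; class 0 is closest.
cos_scores = np.array([0.9, 0.7, 0.2])

for s in (1.0, 10.0, 30.0):
    p = softmax(s * cos_scores)
    print(s, p[0])  # top-class probability grows as s amplifies the gaps
```

The fixed angular gap of 0.2 between the top two cosines becomes a logit gap of \( 0.2\,s \), so the same geometry yields increasingly confident, well-separated predictions as \( s \) grows.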
According to Ming et al. (2022), robust OOD detection is facilitated by minimizing intra-class variation \( \mathcal{V}(\phi, \mathcal{E}) \) and maximizing inter-class separation \( \mathcal{I}_\rho(\phi, \mathcal{E}) \). GraphArc satisfies both:
- Normalization aligns same-class features to a shared direction, minimizing \( \mathcal{V}(\phi, \mathcal{E}) \).
- Scaling widens angular margins, improving \( \mathcal{I}_\rho(\phi, \mathcal{E}) \).
These properties help GNNs better distinguish OOD samples and generate more calibrated confidence scores.
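One common way such compact, well-separated angular features are exploited at test time is to score a sample by its maximum cosine similarity to any class center; low scores flag likely OOD inputs. This is an illustrative scoring rule under the geometry above, not necessarily the detector used with GraphArc:

```python
import numpy as np

def max_cosine_score(feature, class_centers):
    """OOD score: maximum cosine similarity to any normalized class center.

    A low score means the sample lies far from every in-distribution
    class direction. Illustrative rule; the paper's detector may differ.
    """
    f = feature / np.linalg.norm(feature)
    w = class_centers / np.linalg.norm(class_centers, axis=1, keepdims=True)
    return float((w @ f).max())

centers = np.eye(3)                       # three orthogonal toy class centers
id_sample = np.array([0.9, 0.1, 0.0])     # close to class 0's direction
ood_sample = np.array([1.0, 1.0, 1.0])    # equally far from all centers
print(max_cosine_score(id_sample, centers) > max_cosine_score(ood_sample, centers))
```

When intra-class variation is small and inter-class angles are wide, in-distribution samples score near 1 for exactly one center while OOD samples cannot, which is why the two properties above translate into a usable detection signal.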