
nobugpal/ACDC


ACDC

Unified Alert Correlation for Cyber-Physical Power Systems via Prototype-Guided Differentiable Clustering


This repository contains the reference implementation of ACDC, a graph-free framework that couples multi-view alert representation learning with prototype-guided differentiable clustering to correct temporal-semantic bias in cyber-physical SOC alert streams.

When using this code, please cite our paper (citation will be added upon publication).

  • 3 public benchmarks
  • 14.95M SGCC alerts processed
  • 231 incident clusters
  • 0.847 ms per-window latency
  • 1,247 windows/s throughput

Problem Motivation

Two-stage encode-then-cluster pipelines overfit temporal adjacency under heavy interleaving, causing semantically distinct campaigns to collapse into mixed clusters. ACDC addresses this temporal-semantic bias by making clustering part of the learning objective instead of a post-hoc step on frozen embeddings.

Temporal-semantic bias under interleaved attack campaigns

Method Overview

ACDC aligns latent geometry with incident semantics rather than letting clustering inherit a prediction-shaped embedding space. The pipeline combines four complementary alert views, an unrolled Iterative Cross-Attention mechanism for differentiable prototype assignment, and jointly optimized heads for clustering-aware representation learning.

Overview of the ACDC pipeline from multi-view encoding through differentiable clustering and dual heads
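The paragraph above describes an unrolled Iterative Cross-Attention step for differentiable prototype assignment. To make that idea concrete, here is a minimal NumPy sketch of unrolled cross-attention between alert embeddings and K prototypes. It illustrates the general technique under stated assumptions and is not the ACDC implementation. The function name `iterative_cross_attention`, the iteration count, the scaled dot-product scoring, and the prototype update rule (assignment-weighted mean of alerts) are all hypothetical choices.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def iterative_cross_attention(h, prototypes, n_iters=3, tau=1.0):
    """Hypothetical sketch: soft-assign N alert embeddings to K prototypes.

    h: (N, d) alert embeddings; prototypes: (K, d) initial prototypes.
    Returns an (N, K) assignment matrix whose rows sum to 1, plus refined prototypes.
    """
    scale = np.sqrt(h.shape[1]) * tau
    p = prototypes.copy()
    for _ in range(n_iters):
        # Alerts act as queries, prototypes as keys: scaled dot-product scores.
        assign = softmax(h @ p.T / scale, axis=1)                # (N, K)
        # Normalize each prototype's column so its weights sum to 1 ...
        weights = assign / (assign.sum(axis=0, keepdims=True) + 1e-8)
        # ... then move the prototype to the assignment-weighted mean of alerts.
        p = weights.T @ h                                         # (K, d)
    return softmax(h @ p.T / scale, axis=1), p

# Toy usage: two well-separated groups of alert embeddings.
rng = np.random.default_rng(0)
h = np.vstack([rng.normal(3.0, 0.1, (5, 4)), rng.normal(-3.0, 0.1, (5, 4))])
assign, protos = iterative_cross_attention(h, rng.normal(size=(2, 4)))
print(assign.argmax(axis=1))
```

Every operation in the loop is smooth in `h`. If the same loop is written in an autodiff framework, a clustering loss computed on `assign` can therefore send gradients back into the encoder that produced `h`. That property is what makes clustering part of the training objective rather than a post-hoc step.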

Key Contributions

  • A unified alert-correlation framework that embeds prototype-guided differentiable clustering directly into representation learning.
  • A churn-robust multi-view representation that jointly models semantic intent, temporal context, host behavior, and frequency cues under mixed OT and IT telemetry.
  • An evidence stack that separates primary scientific evidence (public benchmarks), mechanistic evidence (ablation, failure-mode analysis), and operational evidence (deployment-scale case study).
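One way to picture the multi-view representation is as four per-view encoders whose outputs are normalized and then fused into a single alert embedding. The sketch below is hypothetical. The view names mirror the four cues listed above. The feature dimensions, the fixed random projections standing in for learned encoders, and the fusion rule (concatenation followed by L2 normalization) are all assumptions and do not describe the released model.

```python
import numpy as np

VIEWS = ("semantic", "temporal", "host", "frequency")

def l2_normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

class MultiViewFusion:
    """Hypothetical fusion: one projection per view, then normalized concatenation."""

    def __init__(self, input_dims, view_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        # Random matrices stand in for trained per-view encoders.
        self.proj = {
            v: rng.normal(scale=1 / np.sqrt(input_dims[v]), size=(input_dims[v], view_dim))
            for v in VIEWS
        }

    def __call__(self, features):
        # Normalize each view separately so that no view dominates by scale alone.
        parts = [l2_normalize(np.tanh(features[v] @ self.proj[v])) for v in VIEWS]
        return l2_normalize(np.concatenate(parts, axis=-1))

dims = {"semantic": 32, "temporal": 8, "host": 12, "frequency": 4}
rng = np.random.default_rng(1)
batch = {v: rng.normal(size=(6, d)) for v, d in dims.items()}
z = MultiViewFusion(dims)(batch)
print(z.shape)  # (6, 64)
```

The sketch normalizes each view before concatenation. This keeps a high-dimensional view, such as the semantic one, from drowning out low-dimensional cues like frequency.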

Main Results

Results are reported as means over five random seeds on public benchmarks.

| Dataset | ARI | NMI | Coverage (%) |
|---|---|---|---|
| Alert-SOC-2024 | 0.618 | 0.634 | 98.10 |
| HDFS | 0.607 | 0.626 | 94.60 |
| ISCX-IDS-2012 | 0.692 | 0.755 | 99.40 |

Note: Alert-SOC-2024 corresponds to the 55WD naming used in the released configs.
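ARI and NMI compare predicted cluster labels against ground-truth incident labels. Coverage is read here as the share of windows that are assigned to a cluster rather than left unassigned. The pure-Python sketch below implements the standard adjusted Rand index and NMI with arithmetic-mean normalization. Two details are assumptions about how the repository computes these metrics: that unassigned windows carry the label `-1`, and that NMI uses arithmetic-mean normalization.

```python
import math
from collections import Counter

def _comb2(n):
    return n * (n - 1) // 2

def adjusted_rand_index(truth, pred):
    """ARI from pair counts over the contingency table of (truth, pred) labels."""
    n = len(truth)
    if n < 2:
        return 1.0
    index = sum(_comb2(c) for c in Counter(zip(truth, pred)).values())
    a = sum(_comb2(c) for c in Counter(truth).values())
    b = sum(_comb2(c) for c in Counter(pred).values())
    expected = a * b / _comb2(n)
    max_index = (a + b) / 2
    if max_index == expected:  # both partitions trivial (all-in-one or all-singletons)
        return 1.0
    return (index - expected) / (max_index - expected)

def _entropy(counts, n):
    return -sum(c / n * math.log(c / n) for c in counts)

def normalized_mutual_info(truth, pred):
    """Mutual information divided by the arithmetic mean of the two entropies."""
    n = len(truth)
    ct, cp = Counter(truth), Counter(pred)
    mi = sum(c / n * math.log(n * c / (ct[t] * cp[p]))
             for (t, p), c in Counter(zip(truth, pred)).items())
    h_t, h_p = _entropy(ct.values(), n), _entropy(cp.values(), n)
    if h_t == 0 and h_p == 0:
        return 1.0
    return mi / ((h_t + h_p) / 2)

def coverage(pred, noise_label=-1):
    """Percentage of windows that received a cluster label."""
    return 100.0 * sum(p != noise_label for p in pred) / len(pred)

truth = [0, 0, 0, 1, 1, 1]
pred = [1, 1, 1, 0, 0, 0]  # same grouping, different label ids
print(adjusted_rand_index(truth, pred), normalized_mutual_info(truth, pred))  # 1.0 1.0
print(coverage([0, 0, -1, 1]))  # 75.0
```

Both scores ignore label identities, so a perfect grouping with permuted cluster ids still scores 1.0.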

Public benchmark summary metrics

Ablation Ladder

| Variant | Prediction Accuracy (%) | ARI | NMI | What It Shows |
|---|---|---|---|---|
| Prediction-only baseline | 91.32 | 0.000 | 0.000 | High pretext score alone does not recover usable incident grouping |
| + ICA without joint optimization | 93.15 | 0.439 | 0.482 | Prototype assignments start to shape the geometry |
| + Joint optimization without multi-view | 93.98 | 0.512 | 0.572 | Coupling improves semantic separability further |
| + Multi-view without temperature schedule | 94.41 | 0.546 | 0.610 | Additional robustness under interleaving and churn |
| Full ACDC | 94.85 | 0.618 | 0.634 | Best temporal-semantic balance |
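The final ablation step removes a temperature schedule. A common way to implement such a schedule is to anneal the softmax temperature during training, so that prototype assignments start soft and gradually sharpen. The sketch below illustrates that pattern. It is an assumption, not the released setting: the exponential decay form, the start and end temperatures, and the helper names are all hypothetical.

```python
import math

def temperature_at(step, total_steps, tau_start=1.0, tau_end=0.1):
    """Hypothetical exponential decay from tau_start to tau_end over training."""
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return tau_start * (tau_end / tau_start) ** frac

def soft_assign(scores, tau):
    """Softmax over one alert's prototype scores at temperature tau."""
    m = max(scores)
    exps = [math.exp((s - m) / tau) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.5]  # similarity of one alert to three prototypes
for step in (0, 500, 1000):
    tau = temperature_at(step, total_steps=1000)
    print(step, round(tau, 3), [round(p, 3) for p in soft_assign(scores, tau)])
```

At a high temperature, gradients reach every prototype. As the temperature falls, each alert commits more firmly to one prototype.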

Structural Failure Modes of Simpler Alternatives

  • Alert-SOC-2024 diagnostic slice: ISH-13 collapses into DBSCAN noise, discarding 82.95% of windows.
  • HDFS subset: ISH-13 reaches a Silhouette score of 0.9737 yet an ARI of 0.0000, fragmenting the data into 1,015 micro-clusters.

These are not cosmetic degradations; they are structural failures that make downstream triage harder.

Failure modes of heuristic two-stage alert correlation baselines
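Both failure modes can be reproduced in miniature. A partition that puts every window in its own micro-cluster scores an ARI of exactly 0 against any ground truth that groups at least two windows together: no pair of windows is co-clustered, so the Rand index equals its chance-expected value. Silhouette, by contrast, measures only geometric compactness and never looks at incident labels. That is how fragmented output can still score well on it. A DBSCAN-style labeling that marks most windows as noise discards them outright. The toy labels below are illustrative and are not benchmark data. Marking noise as `-1` follows the scikit-learn DBSCAN convention.

```python
from collections import Counter

def comb2(n):
    return n * (n - 1) // 2

def ari(truth, pred):
    """Adjusted Rand index computed from pair counts."""
    index = sum(comb2(c) for c in Counter(zip(truth, pred)).values())
    a = sum(comb2(c) for c in Counter(truth).values())
    b = sum(comb2(c) for c in Counter(pred).values())
    expected = a * b / comb2(len(truth))
    max_index = (a + b) / 2
    return 1.0 if max_index == expected else (index - expected) / (max_index - expected)

truth = [0] * 5 + [1] * 5 + [2] * 5  # three incidents, five windows each

# Failure mode 1: fragmentation into singleton micro-clusters.
singletons = list(range(len(truth)))
print(len(set(singletons)), ari(truth, singletons))  # 15 0.0

# Failure mode 2: DBSCAN-style output where most windows are noise (-1).
noisy = [0, 0, -1, -1, -1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1]
discarded = 100.0 * noisy.count(-1) / len(noisy)
print(f"{discarded:.2f}% of windows discarded")  # 80.00% of windows discarded
```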

Interpretable Multi-View Signals

KernelSHAP analysis on Alert-SOC-2024 shows that semantic and frequency cues dominate high-risk tactic grouping, while temporal periodicity and host-behavior invariants help isolate benign polling and maintenance-like traffic.

Feature-importance attribution for Alert-SOC-2024
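KernelSHAP estimates Shapley values by fitting a weighted linear model over coalitions of features. With only four view groups, the exact Shapley values can be enumerated directly (16 coalitions), which shows what KernelSHAP is approximating. The value function below is entirely hypothetical: a toy additive risk score with one semantic/frequency interaction. In a real analysis, `value` would be the model's output with the absent views replaced by background values.

```python
from itertools import combinations
from math import factorial

VIEWS = ("semantic", "temporal", "host", "frequency")

def shapley_values(value, players):
    """Exact Shapley values by enumerating every coalition (2**n evaluations per player)."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi

def toy_risk(present):
    """Hypothetical risk score for one alert window given the views that are present."""
    score = 0.05
    score += 0.40 * ("semantic" in present) + 0.30 * ("frequency" in present)
    score += 0.10 * ("temporal" in present) + 0.05 * ("host" in present)
    # Interaction term: semantic and frequency cues reinforce each other.
    score += 0.10 * ({"semantic", "frequency"} <= present)
    return score

phi = shapley_values(toy_risk, VIEWS)
for view, contribution in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{view:>9}: {contribution:.3f}")
```

The interaction bonus is split equally between the two views involved. The attributions also sum to the score difference between the full and empty coalitions, which is the efficiency property that KernelSHAP preserves.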

Robustness and Stability

| Stability Measure | Result |
|---|---|
| Bootstrap ARI | 0.617 ± 0.009 |
| Inter-seed ARI | mean 0.618, range 0.612–0.627 |

Bootstrap and inter-seed stability analysis
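The bootstrap row can be read as follows: resample windows with replacement, recompute ARI on each resample, and report the mean ± standard deviation. A minimal sketch of that procedure follows. The number of resamples, the window-level resampling, and the toy labels are assumptions, and the printed numbers have no relation to the reported 0.617 ± 0.009.

```python
import random
import statistics
from collections import Counter

def comb2(n):
    return n * (n - 1) // 2

def ari(truth, pred):
    """Adjusted Rand index computed from pair counts."""
    index = sum(comb2(c) for c in Counter(zip(truth, pred)).values())
    a = sum(comb2(c) for c in Counter(truth).values())
    b = sum(comb2(c) for c in Counter(pred).values())
    expected = a * b / comb2(len(truth))
    max_index = (a + b) / 2
    return 1.0 if max_index == expected else (index - expected) / (max_index - expected)

def bootstrap_ari(truth, pred, n_resamples=200, seed=42):
    """Resample window indices with replacement and recompute ARI on each resample."""
    rng = random.Random(seed)
    n = len(truth)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(ari([truth[i] for i in idx], [pred[i] for i in idx]))
    return statistics.mean(scores), statistics.stdev(scores)

rng = random.Random(0)
truth = [i % 4 for i in range(400)]
# Toy prediction: keeps the true label 80% of the time, random label otherwise.
pred = [t if rng.random() < 0.8 else rng.randrange(4) for t in truth]
mean, std = bootstrap_ari(truth, pred)
print(f"bootstrap ARI {mean:.3f} ± {std:.3f}")
```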

Workload Reduction at Analyst Granularity

The compression from raw events to analyst-facing incident clusters:

| Dataset | Raw Events (M) | Incident Clusters | Reduction Ratio (raw events per cluster) |
|---|---|---|---|
| Alert-SOC-2024 | 0.56 | 84 | 6,619 |
| HDFS | 5.33 | 185 | 28,812 |
| ISCX-IDS-2012 | 2.07 | 97 | 21,340 |
| SGCC | 14.95 | 231 | 64,719 |
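The reduction ratio is the number of raw events per analyst-facing incident cluster. Recomputing it from the rounded event counts in the table gives values within 1% of the reported ratios. The small gaps, largest for Alert-SOC-2024, are consistent with the published ratios having been computed from unrounded counts, which this check does not have.

```python
# Raw events (millions, rounded as in the table), incident clusters, reported ratio.
rows = {
    "Alert-SOC-2024": (0.56, 84, 6_619),
    "HDFS": (5.33, 185, 28_812),
    "ISCX-IDS-2012": (2.07, 97, 21_340),
    "SGCC": (14.95, 231, 64_719),
}

for name, (events_m, clusters, reported) in rows.items():
    ratio = events_m * 1e6 / clusters  # raw events per incident cluster
    gap = abs(ratio - reported) / reported
    print(f"{name:<15} {ratio:>10,.0f}  reported {reported:>7,}  gap {gap:.2%}")
```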

Real-World Deployment Case Study

| Metric | ACDC | BiCAM (best two-stage baseline) |
|---|---|---|
| Latency (ms/window) | 0.847 | 1.276 |
| Throughput (windows/s) | 1,247 | 798 |
| Training memory (GB) | 4.38 | 6.95 |
| Inference memory (GB) | 3.02 | 4.95 |

ACDC reduces per-window latency by ~34%, inference memory by 39%, and increases throughput by 56% relative to the strongest two-stage baseline.

Privacy-preserving deployment case card
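The deployment percentages follow directly from the table. Relative reductions are computed as (baseline − ACDC) / baseline, and the throughput gain as ACDC / baseline − 1. This short check recomputes them, along with the training-memory reduction (about 37%), which the summary sentence does not quote.

```python
acdc = {"latency_ms": 0.847, "throughput_wps": 1247, "train_mem_gb": 4.38, "infer_mem_gb": 3.02}
bicam = {"latency_ms": 1.276, "throughput_wps": 798, "train_mem_gb": 6.95, "infer_mem_gb": 4.95}

def reduction(ours, base):
    """Relative reduction for lower-is-better metrics."""
    return (base - ours) / base

def gain(ours, base):
    """Relative increase for higher-is-better metrics."""
    return ours / base - 1

print(f"latency    -{reduction(acdc['latency_ms'], bicam['latency_ms']):.1%}")      # -33.6%
print(f"infer mem  -{reduction(acdc['infer_mem_gb'], bicam['infer_mem_gb']):.1%}")  # -39.0%
print(f"train mem  -{reduction(acdc['train_mem_gb'], bicam['train_mem_gb']):.1%}")  # -37.0%
print(f"throughput +{gain(acdc['throughput_wps'], bicam['throughput_wps']):.1%}")   # +56.3%
```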

Verification

For a full reproducibility walkthrough, see CONTRIBUTING.md. One-command Track-A claim verification:

```bash
pip install -r requirements.txt
python tools/repro/verify_track_a_claims.py --gpu 0
```

This writes outputs/repro/track_a_claim_manifest.json with SHA-256 digests, parsed metrics, and the manuscript rows each output supports.
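To re-check a manifest by hand, recompute the SHA-256 digest of each output and compare it with the recorded value. The manifest schema used below is a hypothetical stand-in: a top-level `outputs` list whose entries carry `path` and `sha256`. Inspect the generated `track_a_claim_manifest.json` for its actual field names before adapting this sketch.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large outputs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_manifest(manifest_path):
    """Return paths whose digest does not match (assumes a hypothetical schema)."""
    manifest = json.loads(Path(manifest_path).read_text())
    return [e["path"] for e in manifest["outputs"] if sha256_of(e["path"]) != e["sha256"]]

# Self-check on throwaway files: an intact manifest yields no mismatches.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "metrics.json"  # hypothetical output file
    out.write_text('{"ari": 0.618}')
    manifest = Path(tmp) / "manifest.json"
    manifest.write_text(json.dumps({"outputs": [{"path": str(out), "sha256": sha256_of(out)}]}))
    intact = verify_manifest(manifest)
    out.write_text('{"ari": 0.700}')  # simulate an edited output
    tampered = verify_manifest(manifest)
print(intact, len(tampered))  # [] 1
```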

Fixed Controls

  • All released configs fix data.random_seed: 42.
  • main.py applies that seed at both train and evaluate entrypoints.
  • Public reruns are anchored to the same split/seed path and should closely match reported means.
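Applying a seed at both entrypoints usually means seeding every random number generator the run touches. A minimal helper in that spirit is sketched below. The function name `seed_everything`, the config dictionary shape, and the optional NumPy and PyTorch calls are assumptions about a typical setup, not a copy of `main.py`.

```python
import random

def seed_everything(seed):
    """Hypothetical helper: seed Python, and NumPy/PyTorch if they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call even without a GPU
    except ImportError:
        pass

config = {"data": {"random_seed": 42}}  # mirrors data.random_seed in the released configs
seed_everything(config["data"]["random_seed"])
first = [random.random() for _ in range(3)]
seed_everything(config["data"]["random_seed"])
second = [random.random() for _ in range(3)]
print(first == second)  # True
```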

Quick Start

Install dependencies:

```bash
pip install -r requirements.txt
```

Train:

```bash
python main.py --train --config configs/HDFS/hdfs_config.yaml --gpu 0
```

Evaluate:

```bash
python main.py --evaluate --config configs/HDFS/hdfs_config.yaml --gpu 0
```

Artifacts are written to outputs/.

Dataset Preparation

Datasets are not distributed in this repository. Place local copies under datasets/ following the layout in datasets/README.md.

Repository Structure

| Path | Description |
|---|---|
| `core/` | Model, RCA, and loss components for ACDC |
| `training/` | Trainer implementation |
| `utils/` | Data loading, logging, and runtime helpers |
| `configs/` | Experiment configs with relative dataset placeholders |
| `tools/repro/` | Reproducibility and claim-verification helpers |
| `tests/` | Config sanity checks |
| `assets/` | Paper-consistent figures |

License

Licensed under the GNU General Public License v3.0.
