ACDC

Unified Alert Correlation for Cyber-Physical Power Systems via Prototype-Guided Differentiable Clustering

This repository contains the reference implementation of ACDC, a graph-free framework that couples multi-view alert representation learning with prototype-guided differentiable clustering to correct temporal-semantic bias in cyber-physical SOC alert streams.

When using this code, please cite our paper (citation will be added upon publication).

3 public benchmarks
14.95M SGCC alerts processed
231 incident clusters
0.847 ms per-window latency
1,247 windows/s throughput

Problem Motivation

Two-stage encode-then-cluster pipelines overfit to temporal adjacency under heavy interleaving, so semantically distinct campaigns collapse into mixed clusters. ACDC addresses this temporal-semantic bias by making clustering part of the learning objective instead of a post-hoc step on frozen embeddings.

Method Overview

ACDC aligns latent geometry with incident semantics rather than letting clustering inherit a prediction-shaped embedding space. The pipeline combines four complementary alert views (semantic intent, temporal context, host behavior, and frequency cues), an unrolled Iterative Cross-Attention (ICA) mechanism for differentiable prototype assignment, and jointly optimized heads for clustering-aware representation learning.
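To make "differentiable prototype assignment" more concrete, the sketch below runs a few unrolled rounds of softmax attention between alert embeddings and cluster prototypes. It is not the code in `core/`: the function names, the dot-product similarity, the temperature value, and the weighted-mean prototype update are all assumptions chosen for brevity, and it uses plain Python lists rather than a tensor library.

```python
import math

def softmax(scores, temperature):
    """Numerically stable softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(embeddings, protos, temperature):
    """Soft assignment of every embedding to every prototype."""
    return [softmax([dot(z, p) for p in protos], temperature) for z in embeddings]

def soft_assign(embeddings, prototypes, temperature=0.1, steps=3):
    """Unrolled soft assignment (hypothetical sketch, not the ACDC implementation).

    Each step attends from embeddings to prototypes, then moves every
    prototype to the attention-weighted mean of the embeddings. All
    operations are smooth, so the same loop written with tensors could
    be trained end to end.
    """
    protos = [list(p) for p in prototypes]
    dim = len(embeddings[0])
    for _ in range(steps):
        weights = attend(embeddings, protos, temperature)
        for k in range(len(protos)):
            mass = sum(w[k] for w in weights)
            if mass > 1e-12:  # leave an unused prototype where it is
                protos[k] = [
                    sum(w[k] * z[d] for w, z in zip(weights, embeddings)) / mass
                    for d in range(dim)
                ]
    # Final assignment against the updated prototypes.
    return attend(embeddings, protos, temperature), protos

# Toy data: two well-separated groups in 2-D.
embeddings = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
prototypes = [[1.0, 0.0], [0.0, 1.0]]
weights, protos = soft_assign(embeddings, prototypes)
hard = [max(range(len(w)), key=w.__getitem__) for w in weights]
print(hard)
```

A lower temperature sharpens the assignments toward one-hot vectors; a higher one keeps them diffuse, which is presumably why the ablation treats the temperature schedule as a separate component.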

Key Contributions

A unified alert-correlation framework that embeds prototype-guided differentiable clustering directly into representation learning.
A churn-robust multi-view representation that jointly models semantic intent, temporal context, host behavior, and frequency cues under mixed OT and IT telemetry.
An evidence stack that separates primary scientific evidence (public benchmarks), mechanistic evidence (ablation, failure-mode analysis), and operational evidence (deployment-scale case study).

Main Results {#main-results}

Results are reported as means over five random seeds on public benchmarks.

Dataset	ARI	NMI	Coverage (%)
Alert-SOC-2024	0.618	0.634	98.10
HDFS	0.607	0.626	94.60
ISCX-IDS-2012	0.692	0.755	99.40

Note: Alert-SOC-2024 corresponds to the 55WD naming used in the released configs.
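For readers unfamiliar with ARI, the following is a small pure-Python version of the standard Adjusted Rand Index, computed from pair counts. It is only meant to show what the metric measures; the repository's own evaluation code may well rely on a library implementation instead, and the toy label lists are made up.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand Index from two flat label lists of equal length."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(true_labels), 2)
    if total_pairs == 0:
        return 1.0
    # Pairs placed together in both partitions, in truth, and in prediction.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    pairs_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = pairs_true * pairs_pred / total_pairs  # agreement expected by chance
    max_index = 0.5 * (pairs_true + pairs_pred)
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (pairs_both - expected) / (max_index - expected)

truth = [0, 0, 0, 1, 1, 1]
relabeled = [5, 5, 5, 2, 2, 2]    # same grouping, different cluster ids
singletons = list(range(6))       # every item in its own micro-cluster
print(adjusted_rand_index(truth, relabeled))
print(adjusted_rand_index(truth, singletons))
```

Because ARI is corrected for chance and ignores cluster ids, a relabeled but identical grouping scores 1.0, while extreme fragmentation into singletons scores 0.0 regardless of how compact each cluster is.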

Ablation Ladder

Variant	Prediction Accuracy (%)	ARI	NMI	What It Shows
Prediction-only baseline	91.32	0.000	0.000	High pretext score alone does not recover usable incident grouping
+ ICA without joint optimization	93.15	0.439	0.482	Prototype assignments start to shape the geometry
+ Joint optimization without multi-view	93.98	0.512	0.572	Coupling improves semantic separability further
+ Multi-view without temperature schedule	94.41	0.546	0.610	Additional robustness under interleaving and churn
Full ACDC	94.85	0.618	0.634	Best temporal-semantic balance

Structural Failure Modes of Simpler Alternatives

Alert-SOC-2024 diagnostic slice: ISH-13 collapses into DBSCAN noise, discarding 82.95% of windows.
HDFS subset: ISH-13 reaches Silhouette 0.9737 while ARI stays at 0.0000 and fragments into 1,015 micro-clusters.

These are not cosmetic degradations; they are structural failures that make downstream triage harder.

Interpretable Multi-View Signals

KernelSHAP analysis on Alert-SOC-2024 shows that semantic and frequency cues dominate high-risk tactic grouping, while temporal periodicity and host-behavior invariants help isolate benign polling and maintenance-like traffic.

Robustness and Stability

Stability Measure	Result
Bootstrap ARI	0.617 ± 0.009
Inter-seed ARI	mean 0.618, range 0.612–0.627
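The table does not say how the bootstrap estimate was produced, so the sketch below shows one common recipe under stated assumptions: resample windows with replacement, recompute the metric on each resample, and report the mean and standard deviation. The resample count, the seed, the toy labels, and the use of an unadjusted Rand index as the stand-in metric are all illustrative choices, not details taken from the paper.

```python
import random
import statistics

def rand_index(a, b):
    """Unadjusted Rand index: fraction of item pairs on which two labelings agree."""
    n = len(a)
    agree = sum(
        (a[i] == a[j]) == (b[i] == b[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return agree / (n * (n - 1) / 2)

def bootstrap_metric(true_labels, pred_labels, metric, n_resamples=200, seed=42):
    """Mean and standard deviation of `metric` over bootstrap resamples."""
    rng = random.Random(seed)  # local RNG so global state is untouched
    n = len(true_labels)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        scores.append(metric([true_labels[i] for i in idx],
                             [pred_labels[i] for i in idx]))
    return statistics.mean(scores), statistics.stdev(scores)

# Toy labels: two incidents of 20 windows each, with three windows misassigned.
truth = [0] * 20 + [1] * 20
pred = list(truth)
for i in (3, 17, 25):
    pred[i] = 1 - pred[i]

mean, spread = bootstrap_metric(truth, pred, rand_index)
print(f"{mean:.3f} +/- {spread:.3f}")
```

Any metric with the signature `metric(true_labels, pred_labels)` can be passed in, including an ARI implementation.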

Workload Reduction at Analyst Granularity {#workload-reduction}

The compression from raw events to analyst-facing incident clusters:

Dataset	Raw Events (M)	Incident Clusters	Reduction Ratio (events per cluster)
Alert-SOC-2024	0.56	84	6,619
HDFS	5.33	185	28,812
ISCX-IDS-2012	2.07	97	21,340
SGCC	14.95	231	64,719
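The reduction ratio appears to be raw events divided by incident clusters. The snippet below recomputes it from the table as a sanity check. Since the event counts are rounded to two decimals (in millions), the recomputed values only match the reported ratios approximately, most visibly for Alert-SOC-2024; that interpretation of the small gap is an assumption.

```python
# (raw events in millions, incident clusters, reported reduction ratio)
rows = {
    "Alert-SOC-2024": (0.56, 84, 6619),
    "HDFS": (5.33, 185, 28812),
    "ISCX-IDS-2012": (2.07, 97, 21340),
    "SGCC": (14.95, 231, 64719),
}

def reduction_ratio(raw_events_millions, clusters):
    """Average number of raw events folded into one incident cluster."""
    return raw_events_millions * 1_000_000 / clusters

for name, (events_m, clusters, reported) in rows.items():
    recomputed = reduction_ratio(events_m, clusters)
    gap = abs(recomputed - reported) / reported
    print(f"{name}: recomputed {recomputed:,.0f}, reported {reported:,}, gap {gap:.2%}")
```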

Real-World Deployment Case Study

Metric	ACDC	BiCAM (best 2-stage baseline)
Latency (ms/window)	0.847	1.276
Throughput (windows/s)	1,247	798
Training memory (GB)	4.38	6.95
Inference memory (GB)	3.02	4.95

Relative to the strongest two-stage baseline, ACDC reduces per-window latency by about 34%, training memory by about 37%, and inference memory by about 39%, and it increases throughput by about 56%.
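The percentages can be re-derived from the table as relative change against the BiCAM baseline. The dictionary keys below are illustrative names, not identifiers from the repository.

```python
acdc = {"latency_ms": 0.847, "throughput_wps": 1247, "train_mem_gb": 4.38, "infer_mem_gb": 3.02}
bicam = {"latency_ms": 1.276, "throughput_wps": 798, "train_mem_gb": 6.95, "infer_mem_gb": 4.95}

def relative_change(new, old):
    """Signed relative change of `new` with respect to `old`."""
    return (new - old) / old

# Negative values are reductions, positive values are increases.
changes = {key: relative_change(acdc[key], bicam[key]) for key in acdc}
for key, value in changes.items():
    print(f"{key}: {value:+.1%}")
```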

Verification {#verification}

For a full reproducibility walkthrough, see CONTRIBUTING.md. One-command Track-A claim verification:

pip install -r requirements.txt
python tools/repro/verify_track_a_claims.py --gpu 0

This writes outputs/repro/track_a_claim_manifest.json with SHA-256 digests, parsed metrics, and the manuscript rows each output supports.
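The exact schema of the manifest is not shown here, so the following is only a sketch of how a digest manifest of this kind can be built with the standard library. The field names (`files`, `path`, `sha256`, `bytes`) and the helper names are hypothetical, and the real script additionally records parsed metrics and manuscript rows, which this sketch omits.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(output_dir):
    """Hypothetical manifest layout: one entry per file under `output_dir`."""
    root = Path(output_dir)
    entries = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        entries.append({
            "path": path.relative_to(root).as_posix(),
            "sha256": sha256_of(path),
            "bytes": path.stat().st_size,
        })
    return {"files": entries}

# Demo on a throwaway directory with a made-up metrics file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "metrics.json").write_text('{"ari": 0.618}', encoding="utf-8")
    manifest = build_manifest(tmp)

print(json.dumps(manifest, indent=2))
```

Comparing digests from two runs is a quick way to tell whether a rerun produced byte-identical outputs or merely similar metrics.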

Fixed Controls

All released configs fix data.random_seed: 42.
main.py applies that seed at both train and evaluate entrypoints.
Public reruns are anchored to the same split/seed path and should closely match reported means.
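How `main.py` applies the seed is not reproduced here. A typical seeding helper looks roughly like the sketch below; the function name is hypothetical, and the NumPy and PyTorch branches assume those libraries are what the project uses, so they are skipped when the import fails.

```python
import os
import random

def seed_everything(seed=42):
    """Hypothetical helper: seed the common sources of randomness."""
    # Only affects Python processes started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch not installed; nothing to seed

seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
print(first == second)
```

Even with fixed seeds, some GPU kernels are nondeterministic, which is consistent with the statement that reruns should closely match, rather than exactly reproduce, the reported means.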

Quick Start

Install dependencies:

pip install -r requirements.txt

Train:

python main.py --train --config configs/HDFS/hdfs_config.yaml --gpu 0

Evaluate:

python main.py --evaluate --config configs/HDFS/hdfs_config.yaml --gpu 0

Artifacts are written to outputs/.

Dataset Preparation

Datasets are not distributed in this repository. Place local copies under datasets/ following the layout in datasets/README.md.

Repository Structure

Path	Description
`core/`	Model, RCA, and loss components for ACDC
`training/`	Trainer implementation
`utils/`	Data loading, logging, and runtime helpers
`configs/`	Experiment configs with relative dataset placeholders
`tools/repro/`	Reproducibility and claim-verification helpers
`tests/`	Config sanity checks
`assets/`	Paper-consistent figures

License

Licensed under the GNU General Public License v3.0.
