Generating Synthetic CPS Datasets Using a Fine-Tuned DoppelGANger Model

By Dr. Yaa Acquaah

The growing reliance on cyber-physical systems (CPS) in critical infrastructure makes anomaly detection a priority, but labeled real-world attack data is extremely hard to collect. To address this gap, I fine-tuned the DoppelGANger model to generate high-fidelity synthetic datasets for CPS research.

Why Synthetic Data Matters

Publicly available CPS datasets are few, cover a narrow range of domains, and often lack diversity in attack types. By generating synthetic data that closely mimics real operational and attack behavior, we can train and benchmark anomaly detection models more effectively and more ethically.

Fine-Tuning DoppelGANger for CPS Domains

I adapted the DoppelGANger architecture to CPS-specific time-series patterns so that it captures both temporal dependencies and interactions among multivariate signals. I then used the fine-tuned model to generate synthetic versions of four well-known datasets.
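As a rough illustration of the data preparation a model like this needs, the sketch below cuts a long multivariate sensor recording into fixed-length windows and pairs each window with a static attribute, mirroring DoppelGANger's split between time-series features and per-sample attributes. The function names, the min-max scaling, and the choice of a per-window attack flag as the attribute are assumptions made for this example, not the exact pipeline from the paper.

```python
from typing import List, Sequence, Tuple

Row = List[float]  # one timestep: one reading per sensor channel


def min_max_scale(series: Sequence[Row]) -> List[Row]:
    """Scale each sensor channel to [0, 1] independently (hypothetical helper)."""
    n_channels = len(series[0])
    lows = [min(row[c] for row in series) for c in range(n_channels)]
    highs = [max(row[c] for row in series) for c in range(n_channels)]
    return [
        [
            # A constant channel carries no information; map it to 0.0.
            (row[c] - lows[c]) / (highs[c] - lows[c]) if highs[c] > lows[c] else 0.0
            for c in range(n_channels)
        ]
        for row in series
    ]


def make_windows(
    series: Sequence[Row], labels: Sequence[int], window: int, stride: int
) -> Tuple[List[List[Row]], List[int]]:
    """Cut a recording into fixed-length windows (hypothetical helper).

    Returns (features, attributes): each feature is a list of `window`
    timesteps, and each attribute is 1 if any timestep inside that window
    is labeled as an attack, else 0. Using the attack flag as the
    attribute is an assumption of this sketch.
    """
    features, attributes = [], []
    for start in range(0, len(series) - window + 1, stride):
        features.append(list(series[start:start + window]))
        attributes.append(int(any(labels[start:start + window])))
    return features, attributes


# Toy recording: 10 timesteps, 2 sensor channels, one attack at t = 5.
raw = [[float(t), 100.0 - 2.0 * t] for t in range(10)]
attack = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]

scaled = min_max_scale(raw)
features, attributes = make_windows(scaled, attack, window=4, stride=2)
print(len(features), attributes)
```

Windows and attributes in this shape can then be passed to whichever DoppelGANger implementation is in use. The window length and stride are tuning choices that depend on the sampling rate and on how long attacks last in the dataset.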

Introducing a CPS Data Generation Workflow

I developed a full workflow for evaluating the quality of synthetic CPS datasets.

This workflow lets researchers generate and validate datasets that align with real-world behavior, reducing the risk that detection models overfit to unrealistic simulations.
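To make the idea of validation concrete, here is a minimal sketch of two fidelity checks that an evaluation of this kind might include: a two-sample Kolmogorov-Smirnov statistic to compare the value distribution of one sensor channel, and a lag autocorrelation to compare temporal structure. Both metrics and the function names are illustrative assumptions, not necessarily the measures used in the published evaluation.

```python
from bisect import bisect_right
from statistics import fmean
from typing import Sequence


def ks_statistic(real: Sequence[float], synthetic: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic.

    This is the largest gap between the two empirical CDFs.
    0.0 means identical empirical distributions; 1.0 means no overlap.
    """
    a, b = sorted(real), sorted(synthetic)
    gap = 0.0
    # Empirical CDFs only change at sample points, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def autocorrelation(signal: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of one channel at the given lag."""
    mean = fmean(signal)
    variance = sum((x - mean) ** 2 for x in signal)
    if variance == 0:
        return 0.0  # a constant signal has no defined correlation structure
    covariance = sum(
        (signal[t] - mean) * (signal[t + lag] - mean)
        for t in range(len(signal) - lag)
    )
    return covariance / variance


# Toy comparison of one real channel against its synthetic counterpart.
real_channel = [0.1, 0.4, 0.9, 0.4, 0.1, 0.4, 0.9, 0.4]
synthetic_channel = [0.2, 0.5, 0.8, 0.5, 0.2, 0.5, 0.8, 0.5]

distribution_gap = ks_statistic(real_channel, synthetic_channel)
temporal_gap = abs(
    autocorrelation(real_channel, lag=1) - autocorrelation(synthetic_channel, lag=1)
)
print(f"KS gap: {distribution_gap:.3f}, lag-1 autocorrelation gap: {temporal_gap:.3f}")
```

Lower values on both measures suggest the synthetic channel is closer to the real one. In practice these per-channel checks are usually paired with a downstream test, such as training a detector on synthetic data and scoring it on real data.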

What's Next: LLMs for Synthetic Data

I'm currently exploring how Large Language Models (LLMs) can generate or perturb CPS telemetry data, especially to simulate rare or complex failure scenarios that GANs struggle with. Combining the statistical strength of GANs with the semantic richness of LLMs could extend synthetic data modeling to scenarios that neither approach covers well on its own.

This work bridges deep learning, simulation, and cybersecurity.

References:
Lin, Z., Jain, A., Wang, C., Fanti, G., & Sekar, V. (2019). Generating high-fidelity, synthetic time series datasets with DoppelGANger. arXiv preprint arXiv:1909.13403.
Acquaah, Y., & Roy, K. (2025). Realistic synthetic dataset generation for cyber-physical systems: a performance evaluation. Discover Applied Sciences, 7(7), 719.
