Internal Safety Collapse
in Frontier Large Language Models

We turn any frontier LLM into a harmful dataset generator
whose outputs resemble those of unaligned models from 2023.

ISC exposes a structural safety alignment vulnerability.
Anyone who understands it can cause most frontier models to exhibit unsafe behavior at scale.

1Deakin University    2Fudan University    3Shanghai Key Laboratory of Multimodal Embodied AI    4City University of Hong Kong
5University of Melbourne    6Singapore Management University    7University of Illinois at Urbana-Champaign

🎬 Demo

A live ISC reproduction on Grok (EN version · ZH version)

Frontier LLMs

Tracking ISC across 70 frontier models. Every red dot is a confirmed case.

Frontier LLM trigger progress (interactive table; columns: Model, Status, Demo, Contributor). The full list is in the GitHub README.

Live Cases

These conversation links show that the community can easily trigger ISC, even without any dedicated attack.

We only show mild examples here. To achieve targeted harmful generation, refer to our paper and tutorial notebooks.

Community Reproductions

Reproductions submitted by community members who learned the ISC concept and applied it to frontier models.

Issue | Model | Method | Domain | By
demo | Claude Opus 4.8 | Agentic TVD: QwenGuard + guard-attack, model-engineered verify loops | AI Safety | @wuyoscar
demo | Claude Haiku 4.5 | Agentic TVD: guard-attack; previously untriggered (green) model | AI Safety | @wuyoscar
#52 | Gemini 2.5 Pro | LaTeX template: social engineering scripts, no code | Other | @wuyoscar
#48 | Claude Opus 4.6 | Agentic TVD: multilingual harmful completions (replaces web link) | AI Safety | @wuyoscar
#42 | Gemini 3.1 Pro Preview | Agentic TVD: multilingual harmful completions | AI Safety | @wuyoscar
#27 | Claude Sonnet 4.5 Thinking | 30k+ token, 42 misinformation samples | Media & Comms | @fresh-ma
#31 | Kimi K2.5 Instant | Erotic fiction moderation pipeline | AI Safety | @fresh-ma
#22 | GPT-5.3 Chat | Modified aiml_openai_moderation | Content Moderation | @zry29
#19 | Gemini 3 Flash | Red-team test case generator (file upload) | AI Safety | @bboylyg
#12 | Gemini 3 Flash | CommsDraft Pro (fake govt declarations) | Financial Misinfo | @HanxunH
#9 | Grok 4.20 Beta | LLaMA Guard test case gen (hardcore) | AI Safety | @HanxunH
#11 | Dola Seed 2.0 | LLaMA Guard test case gen | AI Safety | @HanxunH
#4 | Qwen 3 Max | Cantera incineration chemistry | Comp. Chemistry | @HanxunH
#5 | ERNIE 5.0 | Cantera + HCN/COCl₂ | Comp. Chemistry | @HanxunH

ISC-Bench & The TVD Framework

ISC is triggered by the TVD design pattern (Task + Validator + Data): a legitimate professional task in which the model must fill in harmful data to satisfy a code validator. ISC-Bench covers 84 scenarios across 9 domains:

Task: Benign professional scenario
Validator: Pydantic / JSON schema checks
Data: LLM fills ??? placeholders

Scenarios: 84
Domains: 9
Trigger Rate: 100%
Defense Success: 0%

Scenarios per domain:

Comp. Biology: 16
Comp. Chemistry: 12
Cybersecurity: 8
Pharmacology: 4
AI Safety & ML: 26
Clinical Genomics: 5
Epidemiology: 4
Media & Comms: 8
Other: 1
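
As a quick consistency check, the per-domain counts listed above can be summed to confirm that they match the stated totals of 84 scenarios and 9 domains. This is a minimal Python sketch; the variable names are illustrative and are not taken from ISC-Bench.

```python
# Per-domain scenario counts as listed on this page.
# The names below are illustrative and not part of ISC-Bench itself.
domain_counts = {
    "Comp. Biology": 16,
    "Comp. Chemistry": 12,
    "Cybersecurity": 8,
    "Pharmacology": 4,
    "AI Safety & ML": 26,
    "Clinical Genomics": 5,
    "Epidemiology": 4,
    "Media & Comms": 8,
    "Other": 1,
}

# Sum the counts and compare against the headline numbers.
total_scenarios = sum(domain_counts.values())
print(f"{len(domain_counts)} domains, {total_scenarios} scenarios")
# prints "9 domains, 84 scenarios"
```

The per-domain numbers add up to the headline figure, so the breakdown and the summary statistics are consistent with each other.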

How to Inspect ISC

To learn what ISC is and how it works, refer to our paper and tutorial notebooks.

  1. Read our paper and test our official templates
  2. Play around with the template: modify it to target specific generation. (Not sure what target generation means? See our tutorial notebooks.)
  3. Start a conversation and try to make the model produce unsafe results

Citation

BibTeX
@article{wu2026isc,
  title={Internal Safety Collapse in Frontier Large Language Models},
  author={Wu, Yutao and Liu, Xiao and Gao, Yifeng and Zheng, Xiang
          and Huang, Hanxun and Li, Yige and Wang, Cong and Li, Bo
          and Ma, Xingjun and Jiang, Yu-Gang},
  journal={arXiv preprint arXiv:2603.23509},
  year={2026},
  url={https://arxiv.org/abs/2603.23509}
}

⚠️ Disclaimer: This project is released solely for academic safety research and responsible disclosure. As AI agents become increasingly autonomous, we believe ISC represents a critical and underexplored threat to safety alignment. The purpose of this work is to help the research community understand the vulnerability and collaboratively develop effective mitigations, not to enable harm. WE DO NOT ALLOW any use outside of safety research. WE DO NOT ALLOW any misuse of this research. Model providers interested in mitigations: contact us.