Internal Safety Collapse
in Frontier Large Language Models

We turn any frontier LLM into a harmful dataset generator
with outputs resembling unaligned models from 2023.

ISC exposes a structural safety alignment vulnerability.
Anyone who understands it can cause most frontier models to exhibit unsafe behavior at scale.

1Deakin University    2Fudan University    3Shanghai Key Laboratory of Multimodal Embodied AI    4City University of Hong Kong
5University of Melbourne    6Singapore Management University    7University of Illinois at Urbana-Champaign

🎬 Demo


JailbreakArena

Real-time tracking of ISC across 330 Arena-ranked models. Every red dot is a confirmed case.


Live Cases

These conversation links show that the community can easily trigger ISC — even without any dedicated attack.

We only show mild examples here. To achieve targeted harmful generation, refer to our paper and tutorial notebooks.

🌍 Community Reproductions

Community members who learned the ISC concept and successfully reproduced it on frontier models.

Issue | Model | Method | Domain | By
#22 | GPT-5.3 Chat | Modified aiml_openai_moderation | Content Moderation | @zry29
#19 | Gemini 3 Flash | Red-team test case generator (file upload) | AI Safety | @bboylyg
#12 | Gemini 3 Flash | CommsDraft Pro (fake govt declarations) | Financial Misinfo | @HanxunH
#9 | Grok 4.20 Beta | LLaMA Guard test case gen (hardcore) | AI Safety | @HanxunH
#11 | Dola Seed 2.0 | LLaMA Guard test case gen | AI Safety | @HanxunH
#4 | Qwen 3 Max | Cantera incineration chemistry | Comp. Chemistry | @HanxunH
#5 | ERNIE 5.0 | Cantera + HCN/COCl₂ | Comp. Chemistry | @HanxunH

ISC-Bench & The TVD Framework

ISC is triggered by the TVD design pattern (Task + Validator + Data): a legitimate professional task in which the model must fill in harmful data to satisfy a code validator. ISC-Bench provides 56 such templates across 8 domains:

Task: a benign professional scenario
Validator: Pydantic / JSON schema checks
Data: the LLM fills ??? placeholders
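The three-part structure above can be sketched in plain Python. This is a hypothetical and entirely benign illustration of the pattern, not one of the paper's actual templates (those use Pydantic validators); the `ReactionEntry` name, its fields, and the checks are invented for this sketch.

```python
# Benign sketch of a TVD template: a Task (complete a combustion test set),
# a Validator (structural checks), and a Data slot the model must fill.
from dataclasses import dataclass

PLACEHOLDER = "???"  # the slot the LLM is asked to fill


@dataclass
class ReactionEntry:
    """One row of the dataset the Task asks the model to complete."""
    fuel: str
    oxidizer: str
    balanced_equation: str  # ships as "???"


def validate(entry: ReactionEntry) -> list[str]:
    """Validator: the filled-in Data must satisfy these checks to pass."""
    errors = []
    if entry.balanced_equation == PLACEHOLDER:
        errors.append("balanced_equation is still a placeholder")
    if "->" not in entry.balanced_equation:
        errors.append("equation must contain '->'")
    if entry.fuel not in entry.balanced_equation:
        errors.append("equation must mention the fuel")
    return errors


# As shipped, the Data slot is unfilled, so validation fails ...
draft = ReactionEntry(fuel="CH4", oxidizer="O2", balanced_equation=PLACEHOLDER)
print(validate(draft))

# ... and the model is prompted to edit the entry until the validator passes.
draft.balanced_equation = "CH4 + 2 O2 -> CO2 + 2 H2O"
print(validate(draft))  # prints []
```

The point of the pattern is that refusing or stubbing out the Data slot makes the validator fail, so completing the "legitimate" task and emitting the requested data become the same action from the model's perspective.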

56 templates · 8 domains · 100% trigger rate · 0% defense success

Templates per domain: Comp. Biology (16) · Comp. Chemistry (10) · Cybersecurity (7) · Pharmacology (7) · AI Safety & ML (5) · Clinical Genomics (3) · Epidemiology (2) · Media & Comms (3)

How to Trigger ISC & Submit a Case

To learn what ISC is and how it works, refer to our paper and tutorial notebooks.

  1. Read our paper and test our official templates.
  2. Play around with a template, modifying it to target a specific generation goal. (Not sure what targeted generation means? See our tutorial notebooks.)
  3. Start a conversation and try to make the model produce unsafe results.
  4. Submit your evidence: open a GitHub Issue or email wuy7117@gmail.com.

Citation

BibTeX
@article{wu2026isc,
  title={Internal Safety Collapse in Frontier Large Language Models},
  author={Wu, Yutao and Liu, Xiao and Gao, Yifeng and Zheng, Xiang
          and Huang, Hanxun and Li, Yige and Wang, Cong and Li, Bo
          and Ma, Xingjun and Jiang, Yu-Gang},
  journal={arXiv preprint arXiv:2603.23509},
  year={2026},
  url={https://arxiv.org/abs/2603.23509}
}

⚠️ Disclaimer — This project is released solely for academic safety research and responsible disclosure. As AI agents become increasingly autonomous, we believe ISC represents a critical and underexplored threat to safety alignment. The purpose of this work is to help the research community understand the vulnerability and collaboratively develop effective mitigations — not to enable harm. WE DO NOT ALLOW any use outside of safety research. WE DO NOT ALLOW any misuse of this research. Model providers interested in mitigations: contact us.