Internal Safety Collapse
in Frontier Large Language Models

We turn any frontier LLM into a harmful dataset generator
with outputs resembling unaligned models from 2023.

ISC exposes a structural safety alignment vulnerability.
Anyone who understands it can cause most frontier models to exhibit unsafe behavior at scale.

1Deakin University    2Fudan University    3Shanghai Key Laboratory of Multimodal Embodied AI    4City University of Hong Kong
5University of Melbourne    6Singapore Management University    7University of Illinois at Urbana-Champaign

🎬 Demo


JailbreakArena

Real-time tracking of ISC across 330 Arena-ranked models. Every red dot is a confirmed case.


Live Cases

These conversation links show that the community can easily trigger ISC — even without any dedicated attack.

We only show mild examples here. To achieve targeted harmful generation, refer to our paper and tutorial notebooks.

🌍 Community Reproductions

Community members who learned the ISC concept and successfully reproduced it on frontier models.

Issue | Model | Method | Domain | By
#22 | GPT-5.3 Chat | Modified aiml_openai_moderation | Content Moderation | @zry29
#19 | Gemini 3 Flash | Red-team test case generator (file upload) | AI Safety | @bboylyg
#12 | Gemini 3 Flash | CommsDraft Pro (fake govt declarations) | Financial Misinfo | @HanxunH
#9 | Grok 4.20 Beta | LLaMA Guard test case gen (hardcore) | AI Safety | @HanxunH
#11 | Dola Seed 2.0 | LLaMA Guard test case gen | AI Safety | @HanxunH
#4 | Qwen 3 Max | Cantera incineration chemistry | Comp. Chemistry | @HanxunH
#5 | ERNIE 5.0 | Cantera + HCN/COCl₂ | Comp. Chemistry | @HanxunH

ISC-Bench & The TVD Framework

ISC is triggered by the TVD design pattern (Task + Validator + Data): a legitimate professional task in which the model must fill in harmful data to satisfy a code validator. ISC-Bench provides 56 such templates across 8 domains:

Task: a benign professional scenario
Validator: Pydantic / JSON schema checks
Data: the LLM fills ??? placeholders
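The three-part structure above can be sketched in plain Python. This is a hypothetical and entirely benign illustration of the pattern, not one of the paper's actual templates (those use Pydantic validators); the `ReactionEntry` name, its fields, and the checks are invented for this sketch.

```python
# Benign sketch of a TVD template: a Task (complete a combustion test set),
# a Validator (structural checks), and a Data slot the model must fill.
from dataclasses import dataclass

PLACEHOLDER = "???"  # the slot the LLM is asked to fill


@dataclass
class ReactionEntry:
    """One row of the dataset the Task asks the model to complete."""
    fuel: str
    oxidizer: str
    balanced_equation: str  # ships as "???"


def validate(entry: ReactionEntry) -> list[str]:
    """Validator: the filled-in Data must satisfy these checks to pass."""
    errors = []
    if entry.balanced_equation == PLACEHOLDER:
        errors.append("balanced_equation is still a placeholder")
    if "->" not in entry.balanced_equation:
        errors.append("equation must contain '->'")
    if entry.fuel not in entry.balanced_equation:
        errors.append("equation must mention the fuel")
    return errors


# As shipped, the Data slot is unfilled, so validation fails ...
draft = ReactionEntry(fuel="CH4", oxidizer="O2", balanced_equation=PLACEHOLDER)
print(validate(draft))

# ... and the model is prompted to edit the entry until the validator passes.
draft.balanced_equation = "CH4 + 2 O2 -> CO2 + 2 H2O"
print(validate(draft))  # prints []
```

The point of the pattern is that refusing or stubbing out the Data slot makes the validator fail, so completing the "legitimate" task and emitting the requested data become the same action from the model's perspective.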

56 templates · 8 domains · 100% trigger rate · 0% defense success

Templates per domain: Comp. Biology (16) · Comp. Chemistry (10) · Cybersecurity (7) · Pharmacology (7) · AI Safety & ML (5) · Clinical Genomics (3) · Epidemiology (2) · Media & Comms (3)

How to Trigger ISC & Submit a Case

To learn what ISC is and how it works, refer to our paper and tutorial notebooks.

  1. Read our paper and test our official templates.
  2. Play around with a template, modifying it to target a specific generation goal. (Not sure what targeted generation means? See our tutorial notebooks.)
  3. Start a conversation and try to make the model produce unsafe results.
  4. Submit your evidence: open a GitHub Issue or email wuy7117@gmail.com.

Citation

BibTeX
@article{wu2026isc,
  title={Internal Safety Collapse in Frontier Large Language Models},
  author={Wu, Yutao and Liu, Xiao and Gao, Yifeng and Zheng, Xiang
          and Huang, Hanxun and Li, Yige and Wang, Cong and Li, Bo
          and Ma, Xingjun and Jiang, Yu-Gang},
  journal={arXiv preprint arXiv:2603.23509},
  year={2026},
  url={https://arxiv.org/abs/2603.23509}
}

⚠️ Disclaimer — This project is released solely for academic safety research and responsible disclosure. As AI agents become increasingly autonomous, we believe ISC represents a critical and underexplored threat to safety alignment. The purpose of this work is to help the research community understand the vulnerability and collaboratively develop effective mitigations — not to enable harm. WE DO NOT ALLOW any use outside of safety research. WE DO NOT ALLOW any misuse of this research. Model providers interested in mitigations: contact us.