Yiwei Chen
(陈一苇)

I am a second-year Ph.D. student in Computer Science at Michigan State University (MSU), focusing on Trustworthy ML and Scalable AI, advised by Prof. Sijia Liu. Before that, I earned my Bachelor's and Master's degrees from Xi'an Jiaotong University (XJTU), which I entered through the Young Gifted Program.

Research

My research interests lie at the intersection of robust and explainable artificial intelligence, with the long-term goal of making AI systems safe and scalable. Currently, I am working on:

  • Large Language Models, Multi-Modal Language Models, Multi-Modality
  • Reasoning, Post-Training, Agentic Systems
  • Alignment, Trustworthy Algorithms, Interpretability, Machine Unlearning
News
  • 2026-05 Joining Amazon as an Applied Scientist Intern this summer!
  • 2026-01 Two first author papers accepted by ICLR'26!
  • 2025-05 Joined Cisco as a PhD Intern!
  • 2025-05 Released Unlearning Isn't Invisible, which detects unlearning traces in LLMs from model outputs!
  • 2025-03 Released Safety Mirage, which applies machine unlearning to VLM safety alignment, the first project of my PhD journey!
  • 2024-08 Started PhD journey at Michigan State University!
  • 2024-06 Graduated from Xi'an Jiaotong University!
Internships
Amazon — Applied Scientist Intern, May 2026 –
Cisco — PhD Intern, May 2025 – Oct. 2025
Selected Publications

(* denotes equal contribution)

VLM Unlearn
Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-Tuning and Can Be Mitigated by Machine Unlearning
Yiwei Chen*, Yuguang Yao*, Yihua Zhang, Bingquan Shen, Gaowen Liu, Sijia Liu
ICLR, 2026
Code / Paper

Conventional safety fine-tuning of MLLMs suffers from a “safety mirage” caused by training bias, which leads to spurious correlations and over-rejections under one-word attacks. Employing unlearning algorithms effectively removes the harmful content and mitigates these issues.

Unlearn Trace
Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs
Yiwei Chen*, Soumyadeep Pal*, Yimeng Zhang, Qing Qu, Sijia Liu
ICLR, 2026. Short version accepted as an oral presentation at MUGen @ ICML'25
Code / Paper

Large language models retain persistent fingerprints in their outputs and hidden activations even after unlearning. These traces make it possible to detect whether a model has been “unlearned,” exposing a new vulnerability: forgotten information can be reverse-engineered.

LLM Lineage
Who Built This Model? Tracing LLM Lineage via Spectral Fingerprints in Weight Space
Yiwei Chen*, Bingqi Shang*, Sijia Liu
In Submission, 2026
Paper

A geometric fingerprinting framework to trace LLM lineage in weight space, combining spectral energy for coarse-grained discrimination and subspace alignment for fine-grained analysis, enabling data-free model provenance identification.

Cybersecurity Exploit Benchmark
Data-Centric Benchmarking of Exploit Generation in LLMs: Understanding the Impact of Fine-Tuning
Yiwei Chen*, Lichi Li*, Kai Cheung, Vinny Parla, Ganesh Sundaram
Manuscript, 2026
Paper

The first data-centric benchmark for CVE-conditioned exploit generation, with a 6-level context hierarchy and an 8-criterion evaluation framework covering 17 LLMs. Qwen3-8B with reasoning-aware fine-tuning achieves a +42.5% improvement, competitive with frontier LLMs.

Backdoor Unlearn
Forgetting to Forget: Attention Sink as A Gateway for Backdooring LLM Unlearning
Bingqi Shang*, Yiwei Chen*, Yihua Zhang, Bingquan Shen, Sijia Liu
Under Review, 2025
Code / Paper

Demonstrates that LLM unlearning can be compromised through attention-sink-guided backdoor unlearning: triggers placed at attention sinks let models recover forgotten knowledge while behaving normally when no trigger is present.

MFTR
Tile Classification Based Viewport Prediction with Multi-modal Fusion Transformer
Zhihao Zhang*, Yiwei Chen*, Weizhan Zhang, Caixia Yan, Qinghua Zheng, Qi Wang, Wangdu Chen
ACM MM, 2023
Code / Paper

Proposes a tile-classification-based viewport prediction method using a Multi-modal Fusion Transformer (MFTR) to improve prediction robustness.