Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs
Yiwei Chen*,
Soumyadeep Pal*,
Yimeng Zhang,
Qing Qu,
Sijia Liu
Under Review, 2025
Short version accepted for Oral at MUGen @ ICML'25
Code / arXiv
Large language models retain persistent fingerprints in their outputs and hidden activations even after unlearning.
These traces make it possible to detect whether a model has been "unlearned," exposing a new vulnerability: forgotten information can potentially be reverse-engineered from model outputs.
Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-tuning
Yiwei Chen*,
Yuguang Yao*,
Yihua Zhang,
Bingquan Shen,
Gaowen Liu,
Sijia Liu
Under Review, 2025
Code / arXiv
Conventional supervised safety fine-tuning of VLMs suffers from the "safety mirage" problem caused by training data bias, which induces spurious correlations and leads to over-rejection under one-word attacks. Applying unlearning algorithms to VLMs effectively removes harmful content and addresses these safety issues.
Tile Classification Based Viewport Prediction with Multi-modal Fusion Transformer
Zhihao Zhang*,
Yiwei Chen*,
Weizhan Zhang,
Caixia Yan,
Qinghua Zheng,
Qi Wang,
Wangdu Chen
ACM MM, 2023
Code / arXiv
We propose a tile-classification-based viewport prediction method with a Multi-modal Fusion Transformer to improve the robustness of viewport prediction.