Research
My research interests lie at the intersection of robust and
explainable artificial intelligence, with the long-term goal of making AI systems safe and scalable.
Currently, I'm working on:
- Post-training of LLMs and MLLMs
- Trustworthy algorithms for LLMs and MLLMs
- Machine unlearning and safety alignment
News
- 2025-03 Completed Safety Mirage, which applies machine unlearning to VLM safety alignment. It is the first project of my PhD journey!
- 2024-08 Started PhD journey at Michigan State University!
- 2024-06 Graduated from Xi'an Jiaotong University!
Publications
(* denotes equal contribution)
Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-tuning
Yiwei Chen*, Yuguang Yao*, Yihua Zhang, Bingquan Shen, Gaowen Liu, Sijia Liu
Under Review, 2025
arXiv
Conventional supervised safety fine-tuning of VLMs suffers from a “safety mirage”
problem: bias in the training data produces spurious correlations, leaving the model susceptible to
one-word attacks and prone to over-rejection. Applying unlearning algorithms to VLMs effectively
removes harmful content and addresses these safety issues.
Tile Classification Based Viewport Prediction with Multi-modal Fusion Transformer
Zhihao Zhang*, Yiwei Chen*, Weizhan Zhang, Caixia Yan, Qinghua Zheng, Qi Wang, Wangdu Chen
ACM MM, 2023
Code / arXiv
We propose a tile-classification-based viewport prediction method built on a Multi-modal Fusion Transformer
to improve the robustness of viewport prediction.