Yiwei Chen*, Soumyadeep Pal*, Yimeng Zhang, Qing Qu, Sijia Liu Under Review, 2025 Short version accepted for Oral at MUGen @ ICML'25 Code / arXiv Large language models retain persistent fingerprints in their outputs and hidden activations even after unlearning. These traces make it possible to detect whether a model has been “unlearned,” exposing a new vulnerability that allows forgotten information to be reverse-engineered.
Yiwei Chen*, Yuguang Yao*, Yihua Zhang, Bingquan Shen, Gaowen Liu, Sijia Liu Under Review, 2025 Code / arXiv Conventional safety fine-tuning of MLLMs suffers from a “safety mirage” caused by training bias, leading to spurious correlations and over-rejections under one-word attacks. Employing unlearning algorithms instead effectively removes harmful content and mitigates these issues.
Bingqi Shang*, Yiwei Chen*, Yihua Zhang, Bingquan Shen, Sijia Liu Under Review, 2025 Code / arXiv We demonstrate that LLM unlearning can be compromised through attention-sink-guided backdoor unlearning: triggers placed at attention sinks enable the model to recover forgotten knowledge, while it behaves normally when no trigger is present.
Zhihao Zhang*, Yiwei Chen*, Weizhan Zhang, Caixia Yan, Qinghua Zheng, Qi Wang, Wangdu Chen ACM MM, 2023 Code / arXiv We propose a tile-classification-based viewport prediction method with a Multi-modal Fusion Transformer to improve the robustness of viewport prediction.