Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs
Yiwei Chen*,
Soumyadeep Pal*,
Yimeng Zhang,
Qing Qu,
Sijia Liu
Under Review, 2025
Short version accepted for Oral at MUGen @ ICML'25
Code / arXiv
Large language models retain persistent fingerprints in their outputs and hidden activations even after unlearning.
These traces make it possible to detect whether a model has been "unlearned," exposing a new vulnerability: forgotten information can potentially be reverse-engineered from model outputs.
Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-tuning
Yiwei Chen*,
Yuguang Yao*,
Yihua Zhang,
Bingquan Shen,
Gaowen Liu,
Sijia Liu
Under Review, 2025
Code / arXiv
Conventional supervised safety fine-tuning of VLMs suffers from the "safety mirage" problem caused by training data bias, which induces spurious correlations and leads to over-rejection under one-word attacks. Applying unlearning algorithms to VLMs effectively removes harmful content and addresses these safety issues.
Tile Classification Based Viewport Prediction with Multi-modal Fusion Transformer
Zhihao Zhang*,
Yiwei Chen*,
Weizhan Zhang,
Caixia Yan,
Qinghua Zheng,
Qi Wang,
Wangdu Chen
ACM MM, 2023
Code / arXiv
We propose a tile-classification-based viewport prediction method with a Multi-modal Fusion Transformer to improve the robustness of viewport prediction.