Holistic Weighted Distillation for Semantic Segmentation

Abstract

Channel-wise distillation for semantic segmentation has proven more effective than spatial-based distillation. By removing redundant information from the teacher model, the student can focus on the pixels relevant to each channel, which can be viewed as a weighting of the pixels. However, standard channel-wise distillation ignores the fact that such importance differences also exist among channels. In this paper, we propose a novel method called Holistic Weighted Distillation (HWD) to address this issue. We calculate the channel divergences between the teacher and the student and convert them into distillation weights, making the student focus more on learning the channels it has not yet mastered, thus improving the final model performance. Moreover, our method introduces no additional network structure or back-propagation process, which improves training efficiency. Experiments on ADE20K, Cityscapes, and COCO-Stuff demonstrate the superiority of our method. The code is available at https://github.com/zju-SWJ/HWD.
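To make the idea concrete, below is a minimal PyTorch sketch of divergence-based channel weighting on top of standard channel-wise distillation. The function name `hwd_loss`, the temperature `tau`, and the softmax-based conversion of divergences into weights are illustrative assumptions, not the paper's exact formulation; see the repository above for the authors' implementation.

```python
import torch
import torch.nn.functional as F

def hwd_loss(student_logits, teacher_logits, tau=4.0):
    """Sketch of holistic weighted channel-wise distillation.

    student_logits, teacher_logits: (B, C, H, W) score maps.
    tau: softmax temperature (hypothetical default).
    """
    b, c, h, w = student_logits.shape

    # Channel-wise distillation: normalize each channel's activations
    # over the spatial dimensions to get a per-channel distribution.
    s = F.log_softmax(student_logits.view(b, c, -1) / tau, dim=-1)
    t = F.softmax(teacher_logits.view(b, c, -1) / tau, dim=-1)

    # Per-channel KL divergence between teacher and student: shape (B, C).
    kl_per_channel = (t * (t.clamp_min(1e-8).log() - s)).sum(dim=-1)

    # Convert divergences into distillation weights: channels the student
    # has mastered least (largest divergence) receive the largest weight.
    # detach() keeps the weights out of back-propagation, so no extra
    # gradient computation or network structure is introduced.
    weights = F.softmax(kl_per_channel.detach(), dim=-1) * c

    return (weights * kl_per_channel).mean() * tau ** 2
```

Because the weights are detached and derived from quantities already computed for the distillation loss itself, this weighting adds essentially no training overhead, which is consistent with the efficiency claim in the abstract.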

Publication
In IEEE International Conference on Multimedia and Expo (ICME) 2023
Wujie Sun
Ph.D. student
I am a fifth-year Ph.D. student, and my supervisors are Prof. Chun Chen and Prof. Can Wang.

Defang Chen
Postdoctoral Researcher

Can Wang
Professor

Yan Feng
Associate Professor

Chun Chen
Academician