Cross-Layer Distillation with Semantic Calibration

Abstract

Knowledge distillation is a technique to enhance the generalization ability of a student model by exploiting outputs from a teacher model. Recently, feature-map-based variants have explored knowledge transfer between manually assigned teacher-student pairs of intermediate layers for further improvement. However, layer semantics may vary across different neural networks, and semantic mismatch in manual layer associations leads to performance degradation due to negative regularization. To address this issue, we propose Semantic Calibration for cross-layer Knowledge Distillation (SemCKD), which automatically assigns proper target layers of the teacher model to each student layer with an attention mechanism. With a learned attention distribution, each student layer distills knowledge contained in multiple teacher layers rather than a single fixed intermediate layer, yielding appropriate cross-layer supervision. We further provide a theoretical analysis of the association weights and conduct extensive experiments to demonstrate the effectiveness of our approach. Code is available at https://github.com/DefangChen/SemCKD.
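
The official implementation is linked above; as a rough, illustrative sketch of the attention-based cross-layer association described in the abstract, the PyTorch snippet below pools each intermediate feature map, computes query-key similarities between student and teacher layers, and turns them into a softmax distribution that weights a pairwise feature-matching loss. The module names, embedding size, pooling choice, and the `projectors` argument are assumptions made for illustration, not the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossLayerAttention(nn.Module):
    """Soft assignment of teacher layers to each student layer (illustrative).

    Pooled student features act as queries and pooled teacher features as keys;
    a softmax over teacher layers yields per-sample association weights.
    """

    def __init__(self, student_channels, teacher_channels, embed_dim=128):
        super().__init__()
        self.query_proj = nn.ModuleList([nn.Linear(c, embed_dim) for c in student_channels])
        self.key_proj = nn.ModuleList([nn.Linear(c, embed_dim) for c in teacher_channels])

    def forward(self, student_feats, teacher_feats):
        # Global average pooling: (B, C, H, W) feature map -> (B, C) vector.
        queries = [p(f.mean(dim=(2, 3))) for p, f in zip(self.query_proj, student_feats)]
        keys = torch.stack(
            [p(f.mean(dim=(2, 3))) for p, f in zip(self.key_proj, teacher_feats)], dim=1
        )  # (B, num_teacher_layers, embed_dim)

        weights = []
        for q in queries:  # one attention distribution per student layer
            logits = torch.bmm(keys, q.unsqueeze(-1)).squeeze(-1)  # (B, num_teacher_layers)
            weights.append(F.softmax(logits, dim=1))
        return weights


def cross_layer_loss(student_feats, teacher_feats, attention, projectors):
    """Feature-matching loss over all (student, teacher) layer pairs, weighted
    by the learned attention instead of a fixed one-to-one layer assignment."""
    weights = attention(student_feats, teacher_feats)
    loss = 0.0
    for i, s in enumerate(student_feats):
        for j, t in enumerate(teacher_feats):
            s_hat = projectors[i][j](s)                        # match channel count
            s_hat = F.adaptive_avg_pool2d(s_hat, t.shape[2:])  # match spatial size
            per_sample = F.mse_loss(s_hat, t, reduction="none").mean(dim=(1, 2, 3))
            loss = loss + (weights[i][:, j] * per_sample).mean()
    return loss
```

Here `projectors[i][j]` is assumed to be a learnable adapter such as `nn.Conv2d(student_channels[i], teacher_channels[j], kernel_size=1)` held in nested `nn.ModuleList`s, so that every student layer can be compared against every teacher layer; the linked repository should be treated as the reference implementation.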

Publication
In Proceedings of the AAAI Conference on Artificial Intelligence
Authors
Defang Chen (Postdoctoral Researcher)
Can Wang (Professor)
Zhe Wang (Student)
Yan Feng (Associate Professor)
Chun Chen (Academician)