Cross-Layer Distillation with Semantic Calibration

Abstract

Knowledge distillation is a technique to enhance the generalization ability of a student model by exploiting outputs from a teacher model. Recently, feature-map-based variants have explored knowledge transfer between manually assigned teacher-student pairs of intermediate layers for further improvement. However, layer semantics may vary across different neural networks, and semantic mismatch in manual layer associations leads to performance degradation due to negative regularization. To address this issue, we propose Semantic Calibration for cross-layer Knowledge Distillation (SemCKD), which automatically assigns proper target layers of the teacher model to each student layer with an attention mechanism. With a learned attention distribution, each student layer distills knowledge contained in multiple teacher layers rather than a single fixed intermediate layer, yielding appropriate cross-layer supervision. We further provide a theoretical analysis of the association weights and conduct extensive experiments to demonstrate the effectiveness of our approach. Code is available at https://github.com/DefangChen/SemCKD.
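
The official implementation is linked above; as a rough, illustrative sketch of the attention-based cross-layer association described in the abstract, the PyTorch snippet below pools each intermediate feature map, computes query-key similarities between student and teacher layers, and turns them into a softmax distribution that weights a pairwise feature-matching loss. The module names, embedding size, pooling choice, and the `projectors` argument are assumptions made for illustration, not the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossLayerAttention(nn.Module):
    """Soft assignment of teacher layers to each student layer (illustrative).

    Pooled student features act as queries and pooled teacher features as keys;
    a softmax over teacher layers yields per-sample association weights.
    """

    def __init__(self, student_channels, teacher_channels, embed_dim=128):
        super().__init__()
        self.query_proj = nn.ModuleList([nn.Linear(c, embed_dim) for c in student_channels])
        self.key_proj = nn.ModuleList([nn.Linear(c, embed_dim) for c in teacher_channels])

    def forward(self, student_feats, teacher_feats):
        # Global average pooling: (B, C, H, W) feature map -> (B, C) vector.
        queries = [p(f.mean(dim=(2, 3))) for p, f in zip(self.query_proj, student_feats)]
        keys = torch.stack(
            [p(f.mean(dim=(2, 3))) for p, f in zip(self.key_proj, teacher_feats)], dim=1
        )  # (B, num_teacher_layers, embed_dim)

        weights = []
        for q in queries:  # one attention distribution per student layer
            logits = torch.bmm(keys, q.unsqueeze(-1)).squeeze(-1)  # (B, num_teacher_layers)
            weights.append(F.softmax(logits, dim=1))
        return weights


def cross_layer_loss(student_feats, teacher_feats, attention, projectors):
    """Feature-matching loss over all (student, teacher) layer pairs, weighted
    by the learned attention instead of a fixed one-to-one layer assignment."""
    weights = attention(student_feats, teacher_feats)
    loss = 0.0
    for i, s in enumerate(student_feats):
        for j, t in enumerate(teacher_feats):
            s_hat = projectors[i][j](s)                        # match channel count
            s_hat = F.adaptive_avg_pool2d(s_hat, t.shape[2:])  # match spatial size
            per_sample = F.mse_loss(s_hat, t, reduction="none").mean(dim=(1, 2, 3))
            loss = loss + (weights[i][:, j] * per_sample).mean()
    return loss
```

Here `projectors[i][j]` is assumed to be a learnable adapter such as `nn.Conv2d(student_channels[i], teacher_channels[j], kernel_size=1)` held in nested `nn.ModuleList`s, so that every student layer can be compared against every teacher layer; the linked repository should be treated as the reference implementation.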

Publication
In Proceedings of the AAAI Conference on Artificial Intelligence
Authors
Defang Chen (Postdoctoral Researcher)
Can Wang (Professor)
Zhe Wang (Student)
Yan Feng (Associate Professor)
Chun Chen (Academician)