Relative Difficulty Distillation for Semantic Segmentation

Dong Liang 1,2, Yue Sun 1, Yun Du 1, Songcan Chen 1, Sheng-Jun Huang 1

1 MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
2 Nanjing University of Aeronautics and Astronautics Shenzhen Research Institute

Abstract


Current knowledge distillation (KD) methods primarily focus on transferring various forms of structured knowledge and designing corresponding optimization objectives to encourage the student network to imitate the output of the teacher network. However, introducing too many additional optimization objectives may cause unstable training, e.g., gradient conflicts. Moreover, these methods ignore the guidance offered by the relative learning difficulty between the teacher and student networks. Inspired by human cognitive science, in this paper we redefine knowledge from a new perspective, namely the relative difficulty of samples as perceived by the student and teacher networks, and propose a pixel-level KD paradigm for semantic segmentation named Relative Difficulty Distillation (RDD). We propose a two-stage RDD framework: Teacher-Full Evaluated RDD (TFE-RDD) and Teacher-Student Evaluated RDD (TSE-RDD). RDD allows the teacher network to provide effective guidance on where the student should focus without introducing additional optimization goals, thus avoiding the need to tune learning weights for multiple losses. Extensive experimental evaluations using a general distillation loss function on popular datasets such as Cityscapes, CamVid, Pascal VOC, and ADE20k demonstrate the effectiveness of RDD against state-of-the-art KD methods. Additionally, our research shows that RDD can integrate with existing KD methods to raise their performance upper bound. Source code: https://github.com/sunyueue/RDD.git.

Method

Overall architecture of our proposed RDD. It includes two stages: Teacher-Full Evaluated RDD (TFE-RDD) in the early learning stage and Teacher-Student Evaluated RDD (TSE-RDD) in the later learning stage.
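
To make the two-stage idea concrete, below is a minimal PyTorch sketch of how each stage could produce a per-pixel weight map. The use of per-pixel cross-entropy as the difficulty proxy, the normalization, and the function names are illustrative assumptions for this page, not the paper's exact formulation.

import torch.nn.functional as F

def pixel_difficulty(logits, labels, ignore_index=255):
    # Per-pixel cross-entropy, used here as a proxy for learning difficulty.
    # logits: (N, C, H, W), labels: (N, H, W) -> returns (N, H, W)
    return F.cross_entropy(logits, labels, ignore_index=ignore_index, reduction="none")

def tfe_rdd_weights(teacher_logits, labels):
    # Early stage (TFE-RDD): difficulty is judged by the teacher alone.
    d_t = pixel_difficulty(teacher_logits, labels)
    return d_t / (d_t.mean() + 1e-6)  # normalize so the weights average to ~1

def tse_rdd_weights(teacher_logits, student_logits, labels):
    # Later stage (TSE-RDD): relative difficulty between student and teacher.
    d_t = pixel_difficulty(teacher_logits, labels)
    d_s = pixel_difficulty(student_logits, labels)
    rel = (d_s - d_t).clamp(min=0.0)  # emphasize pixels the student still finds hard
    return rel / (rel.mean() + 1e-6)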

Contributions

  1. We propose a new knowledge distillation paradigm for semantic segmentation named Relative Difficulty Distillation (RDD). RDD avoids additional optimization objectives and can seamlessly integrate with other feature/response-based KD methods to raise their performance upper bound (a usage sketch follows this list).

  2. We devise two specific RDD methods, TFE-RDD and TSE-RDD, tailored for the early and later learning stages, respectively. The teacher network, together with the student network, generates relative-difficulty cues that guide the student to focus on the most valuable pixels at each learning stage.

  3. RDD achieves the best distillation performance among the state-of-the-art methods on four popular semantic segmentation datasets with various semantic segmentation architectures.
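
As a hypothetical illustration of the integration mentioned in the first contribution, the weight maps from the sketch above can modulate a standard pixel-wise response distillation loss. The temperature value and the exact way the weights scale the KL term are our assumptions, not the paper's formulation.

def weighted_kd_loss(student_logits, teacher_logits, weights, T=4.0):
    # Standard softened KL distillation, weighted per pixel by relative difficulty.
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)
    kl = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=1)  # (N, H, W)
    return (weights * kl).mean() * (T * T)

# Early epochs:  weights = tfe_rdd_weights(teacher_logits, labels)
# Later epochs:  weights = tse_rdd_weights(teacher_logits, student_logits, labels)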

Results

Visual quality comparison


Performance comparison



Ablation Studies

The effect of components in the proposed method

Integration with other KD approaches

Materials


PDF | Code | Homepage

Citation

@article{liang2024relative,
  title={Relative difficulty distillation for semantic segmentation},
  author={Liang, Dong and Sun, Yue and Du, Yun and Chen, Songcan and Huang, Sheng-Jun},
  journal={Science China Information Sciences},
  volume={67},
  number={9},
  pages={192105},
  year={2024},
  publisher={Springer}
}

Contact

If you have any questions, please contact Dong Liang at liangdong@nuaa.edu.cn.