Semantic segmentation and uncertainty quantification with vision transformers for industrial applications
Kira Wursthorn,
Lili Gao,
Steven Landgraf,
Markus Ulrich
Chapter/contribution from the book: Längle T. & Heizmann M. 2024. Forum Bildverarbeitung 2024.
Vision Transformers (ViTs) have recently achieved state-of-the-art performance in semantic segmentation tasks. However, deploying them in critical applications requires reliable uncertainty quantification to assess model confidence. To tackle this challenge, we combine a state-of-the-art ViT with the popular uncertainty quantification method Monte Carlo Dropout (MCD) to predict both segmentation and uncertainty maps. We focus on an industrial machine vision setting and carry out our experiments on the T-LESS dataset. We evaluate both the segmentation accuracy and the predicted uncertainties using appropriate metrics.
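To illustrate the general idea behind MCD, the following minimal sketch keeps dropout active at inference, averages the class probabilities of several stochastic forward passes, and derives a segmentation map (argmax of the mean) and an uncertainty map from the result. It is not the implementation used in this work: `toy_stochastic_model` is a hypothetical per-pixel linear classifier standing in for the ViT, the names `mc_dropout_predict`, `num_samples`, and `drop_prob` are assumptions, and predictive entropy is only one common choice of uncertainty measure, which the abstract does not specify.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_stochastic_model(pixel_features, weights, drop_prob, rng):
    """Hypothetical stand-in for a ViT whose dropout stays active at inference.

    Applies inverted dropout to each pixel's features, then a linear
    classifier, and returns per-pixel class probabilities.
    """
    keep = 1.0 - drop_prob
    probs = []
    for feats in pixel_features:
        dropped = [f / keep if rng.random() < keep else 0.0 for f in feats]
        logits = [sum(w * f for w, f in zip(row, dropped)) for row in weights]
        probs.append(softmax(logits))
    return probs


def mc_dropout_predict(pixel_features, weights, num_samples=20, drop_prob=0.2, seed=0):
    """Average num_samples stochastic passes; return labels and entropy per pixel."""
    rng = random.Random(seed)
    num_pixels = len(pixel_features)
    num_classes = len(weights)
    mean_probs = [[0.0] * num_classes for _ in range(num_pixels)]
    for _ in range(num_samples):
        sample = toy_stochastic_model(pixel_features, weights, drop_prob, rng)
        for i in range(num_pixels):
            for c in range(num_classes):
                mean_probs[i][c] += sample[i][c] / num_samples
    # Segmentation map: most probable class under the averaged prediction.
    labels = [max(range(num_classes), key=lambda c: p[c]) for p in mean_probs]
    # Uncertainty map: predictive entropy of the averaged prediction (in nats).
    entropy = [-sum(q * math.log(q) for q in p if q > 0.0) for p in mean_probs]
    return labels, entropy


# Two "pixels" with two features each; the first has strong evidence for class 0,
# the second is ambiguous and should receive the higher uncertainty.
features = [[5.0, 0.0], [0.1, 0.1]]
weights = [[1.0, 0.0], [0.0, 1.0]]
labels, entropy = mc_dropout_predict(features, weights)
```

In a real setting, the stochastic model would be the segmentation network with its dropout layers left enabled, and the per-pixel lists would be image-shaped tensors.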