Shifting Focus: From Global Semantics to Local Prominent Features in Swin Transformers for Knee Osteoarthritis Severity Assessment
Published in 32nd European Signal Processing Conference (EUSIPCO 2024), 2024
Conventional imaging diagnosis is often bottlenecked by manual inspection, which can introduce delays and inconsistencies. Although deep learning offers a pathway to automation and improved accuracy, foundation models in computer vision tend to emphasise global context at the expense of the local structures on which clinicians rely.
We harness the Swin Transformer’s capacity to capture long-range spatial dependencies while refining the representation of clinically salient regions. The proposed localisation-aware refinement module aligns hierarchical transformer tokens with the final classifier distribution, ensuring that local cues remain discriminative across all scales. Extensive validation on two public benchmarks demonstrates improved robustness and precision for Kellgren–Lawrence severity prediction, underlining the method’s potential for clinical decision support.
