Similarity-Based Positional Encoding for Enhanced Classification in Medical Images

EasyChair Preprint 15458

7 pages • Date: November 24, 2024

Giorgio Leonardi, Luigi Portinale and Andrea Santomauro

Abstract

This paper introduces a novel similarity-based positional encoding method aimed at improving the classification of medical images using Vision Transformers (ViTs). Traditional positional encoding methods focus primarily on spatial information, but they may not adequately capture the complex geometric patterns characteristic of medical images. To address this, we propose a method that utilizes convolution operations to extract geometric features, followed by a similarity matrix based on cosine similarity between image patches. This encoding is then incorporated into the ViT model, enabling it to learn more meaningful relationships beyond basic spatial positioning. The effectiveness of this method is demonstrated through experiments on six medical imaging datasets from MedMNIST, where our approach consistently outperforms the conventional learned positional encoding. This is particularly true in datasets with prominent geometric structures like PneumoniaMNIST and BloodMNIST. The results indicate that similarity-based encoding can significantly enhance medical image classification accuracy.
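
As a rough illustration of the pipeline described in the abstract (convolution to extract geometric features, then cosine similarity between image patches), the following is a minimal sketch and not the authors' implementation. The Sobel kernel, the patch size of 7, and all function names are assumptions chosen for illustration. The abstract does not specify how the similarity matrix is incorporated into the ViT, so that step is left out.

```python
import math

# Assumption: a horizontal Sobel kernel stands in for the (unspecified)
# convolution used to extract geometric features.
SOBEL_X = [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]]


def convolve_same(image, kernel):
    """Zero-padded 2D cross-correlation; output has the same size as the input."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, x = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[a][b]
            out[i][j] = acc
    return out


def split_patches(feature_map, patch_size):
    """Cut the feature map into non-overlapping square patches, each flattened."""
    h, w = len(feature_map), len(feature_map[0])
    patches = []
    for top in range(0, h - patch_size + 1, patch_size):
        for left in range(0, w - patch_size + 1, patch_size):
            patches.append([feature_map[top + r][left + c]
                            for r in range(patch_size)
                            for c in range(patch_size)])
    return patches


def cosine_similarity_matrix(vectors, eps=1e-8):
    """Pairwise cosine similarity; eps guards against all-zero patches."""
    norms = [max(math.sqrt(sum(v * v for v in vec)), eps) for vec in vectors]
    n = len(vectors)
    return [[sum(a * b for a, b in zip(vectors[i], vectors[j]))
             / (norms[i] * norms[j])
             for j in range(n)]
            for i in range(n)]


def patch_similarity(image, patch_size=7, kernel=SOBEL_X):
    """Convolve, split into patches, and compare every patch with every other."""
    features = convolve_same(image, kernel)
    return cosine_similarity_matrix(split_patches(features, patch_size))
```

For a 28x28 input, the image size used by MedMNIST, a patch size of 7 gives 16 patches and therefore a 16x16 similarity matrix. Unlike a learned positional encoding, this matrix depends on the content of each image rather than only on patch position.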

Keyphrases: Vision Transformers, medical image classification, positional encoding

Links:

https://easychair.org/publications/preprint/vrtF

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:15458,
  author    = {Giorgio Leonardi and Luigi Portinale and Andrea Santomauro},
  title     = {Similarity-Based Positional Encoding for Enhanced Classification in Medical Images},
  howpublished = {EasyChair Preprint 15458},
  year      = {EasyChair, 2024}}
