Diabetic Retinopathy Classification Using Vision Transformer: A Strategy for Small Dataset Challenges

Mr K. Appala Raju; M. Pravallika; M. Bhumika; M.V.K.S. Harika

doi:10.17148/IJARCCE.2026.15478

Volume 15, Issue 4, April 2026

Abstract: Early detection of diabetic retinopathy (DR), a complication of diabetes that causes vision loss in its advanced stages, is essential to prevent permanent visual impairment. However, automatic detection of DR through medical image processing requires a large amount of training data to build a model that performs well. This poses a challenge when only small datasets are available, because such models need large datasets to generalize to unseen data. Convolutional Neural Networks (CNNs) often fall short in capturing long-range dependencies and global pathological features across high-resolution retinal images, leading to suboptimal performance in early-stage diagnosis. To address these limitations, this study proposes a Vision Transformer (ViT) model designed to improve DR severity classification (ranging from No DR to Severe DR) by leveraging the self-attention mechanism of the transformer architecture. ViT is a deep learning architecture that generally requires large datasets for effective training; in this work, however, a smaller dataset is used because large medical datasets are difficult to access due to privacy and data-sharing restrictions. The proposed approach uses a hierarchical structure in which retinal fundus images from a public dataset (APTOS 2019, from Kaggle) and a private dataset (FGADR, obtained from the FGADR website) are divided into non-overlapping patches, embedded, and enriched with positional information. On three-class classification, the proposed model achieves accuracies of 90% on FGADR and 86% on APTOS 2019, with quadratic weighted kappa (QWK) scores of 0.93 on FGADR and 0.86 on APTOS 2019. These results indicate that the proposed model can perform multi-class DR classification well using a limited number of images.
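The abstract states that fundus images are divided into non-overlapping patches, embedded, and enriched with positional information before reaching the transformer. The following is a minimal, dependency-free sketch of that front end under stated assumptions, not the authors' code: the image is assumed to be nested lists of shape H × W × C, the projection weights are random placeholders for trained parameters, and fixed sinusoidal encodings stand in for the learned position embeddings a standard ViT uses. All function names (`extract_patches`, `linear_embed`, `sinusoidal_position`, `embed_image`) are hypothetical.

```python
import math
import random


def extract_patches(image, patch_size):
    """Split an H x W x C image (nested lists) into non-overlapping, flattened patches.

    Patches are produced in raster order (left to right, then top to bottom).
    Assumes H and W are both divisible by patch_size.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            flat = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    flat.extend(image[row][col])  # append every channel of this pixel
            patches.append(flat)
    return patches


def linear_embed(patch, weights, bias):
    """Project a flattened patch to embed_dim values; weights has shape embed_dim x patch_len."""
    return [sum(w * x for w, x in zip(row, patch)) + b for row, b in zip(weights, bias)]


def sinusoidal_position(pos, dim):
    """Fixed sinusoidal position encoding (assumed stand-in for ViT's learned embeddings)."""
    enc = []
    for i in range(dim):
        angle = pos / (10000 ** (2 * (i // 2) / dim))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc


def embed_image(image, patch_size, embed_dim, seed=0):
    """Patchify an image, embed each patch linearly, and add its position encoding."""
    patches = extract_patches(image, patch_size)
    patch_len = len(patches[0])
    # Hypothetical placeholder: random weights stand in for trained projection parameters
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.02) for _ in range(patch_len)] for _ in range(embed_dim)]
    bias = [0.0] * embed_dim
    tokens = []
    for pos, patch in enumerate(patches):
        emb = linear_embed(patch, weights, bias)
        pe = sinusoidal_position(pos, embed_dim)
        tokens.append([e + p for e, p in zip(emb, pe)])  # patch content + position
    return tokens
```

For a 224 × 224 RGB image with `patch_size=16`, this yields 14 × 14 = 196 tokens, each built from a 16 × 16 × 3 = 768-value patch. A standard ViT also prepends a learnable class token before the transformer encoder; that step is omitted from this sketch.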
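The QWK scores reported in the abstract measure agreement between predicted and true severity grades while penalizing each disagreement by the squared distance between grades, so confusing No DR with Severe DR costs more than confusing adjacent grades. The abstract does not say how QWK was computed; the sketch below follows the standard definition of Cohen's kappa with quadratic weights, κ = 1 − Σ w_ij O_ij / Σ w_ij E_ij with w_ij = (i − j)² / (K − 1)², assuming integer labels 0 to K − 1. The function name `quadratic_weighted_kappa` is hypothetical.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """Cohen's kappa with quadratic weights for integer labels 0..num_classes-1."""
    if num_classes < 2:
        raise ValueError("need at least two classes")
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n = len(y_true)

    # Observed matrix: observed[i][j] counts samples with true grade i and predicted grade j
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    # Marginal counts of the true and predicted grades
    true_counts = [sum(row) for row in observed]
    pred_counts = [sum(observed[i][j] for i in range(num_classes)) for j in range(num_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            # Quadratic penalty: grows with the squared distance between grades
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            # Expected count if predictions were independent of the true grades
            expected = true_counts[i] * pred_counts[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when all true and predicted labels are one grade")
    return 1.0 - weighted_observed / weighted_expected
```

For example, `quadratic_weighted_kappa([0, 1, 2, 2], [0, 2, 2, 1], 3)` returns 7/11 ≈ 0.636. scikit-learn's `cohen_kappa_score(y_true, y_pred, weights="quadratic")` computes the same quantity; it uses unnormalized (i − j)² weights, but the normalization cancels in the ratio.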

Keywords: Diabetic Retinopathy (DR), Blindness/Vision Loss Detection, Disease Severity Classification, Convolutional Neural Networks (CNNs), Vision Transformer (ViT).

How to Cite:

[1] Mr K. Appala Raju, M. Pravallika, M. Bhumika, M.V.K.S. Harika, “Diabetic Retinopathy Classification Using Vision Transformer: A Strategy for Small Dataset Challenges,” International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2026.15478

This work is licensed under a Creative Commons Attribution 4.0 International License.