Volume 15, Issue 5, May 2026
This work is licensed under a Creative Commons Attribution 4.0 International License.
Generative AI: A Comprehensive Survey on Transformer-Based Models
Abstract: The landscape of artificial intelligence has undergone a profound transformation with the emergence of generative models capable of producing coherent text, realistic images, synthesized audio, and functional code. Central to this shift is the Transformer architecture, introduced by Vaswani et al. in their landmark 2017 contribution “Attention Is All You Need,” which fundamentally redefined how sequential data is modeled by replacing recurrence with parallelizable, attention-based processing [1]. This survey provides a structured and thorough examination of Transformer-based generative AI systems, tracing their development from early recurrent sequence architectures through to the most advanced multimodal and agentic AI frameworks. A four-tier taxonomic model is introduced to systematically classify these systems: basic recurrent sequence models, Transformer-based NLP architectures, large-scale language models, and multimodal autonomous AI agents. A comparative analysis is conducted across these tiers, evaluating performance, scalability, and contextual modeling capability. Through a review of more than a dozen pivotal studies, spanning bidirectional pretraining via BERT [2], autoregressive generation through the GPT series [3][4][5], and visual representation learning via Vision Transformers [6], this paper maps out the major inflection points in the field’s evolution. Critical open challenges are also examined, including the substantial computational overhead of large-scale training, the persistent problem of model hallucination, opacity in decision-making, systemic data biases, and the inability to adapt to real-time information. The paper concludes with a forward-looking perspective on next-generation generative AI, emphasizing efficiency, trustworthiness, and ethical design.
Keywords: Generative AI, Transformer Architecture, Self-Attention, BERT, GPT, Large Language Models, Vision Transformer, Multimodal AI, Natural Language Processing, Deep Learning.
How to Cite:
[1] J Hemanth, Harsha B, Ashish Kumar, Anurag N, Dr. Muhibur Rahman T.R, “Generative AI: A Comprehensive Survey on Transformer-Based Models,” International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2026.15541
