Abstract: This paper focuses on unraveling the inner workings of the transformer architecture, a cornerstone of modern large language models (LLMs) that enables parallel processing and long-range dependency capture. From this seminal work, we adopt the core attention mechanism formula (Q × Kᵀ)/√dₖ and the multi-head attention mechanism. While transformers have driven breakthroughs in natural language processing through self-attention mechanisms, their internal operations remain complex and opaque. Using GPT-2 as an illustrative case study, we develop an interactive visualization framework to map information flow, display attention patterns, and illustrate token embeddings and layer interactions. These visualizations aim to deepen comprehension of transformer mechanics, enhance model transparency, and guide future advancements in AI design.
Keywords: Transformer Architecture (TA): Neural network architecture based on self-attention mechanisms; Large Language Models (LLMs): Advanced AI models trained on vast text datasets; Natural Language Processing (NLP): AI technology for understanding and processing human language; Self-Attention Mechanism (SAM): Method allowing models to weigh importance of different input elements.
DOI: 10.17148/IJARCCE.2025.14442
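The attention patterns the abstract refers to are the per-head weight matrices produced by the scaled dot-product formula softmax((Q × Kᵀ)/√dₖ). As a minimal sketch of how such weights can be pulled out of GPT-2 for visualization, the snippet below uses the Hugging Face transformers library; this tooling is an assumption for illustration only and is not necessarily the framework developed in the paper.

```python
# Minimal sketch (assumed tooling, not the authors' framework): extract
# GPT-2 attention weights, i.e., the softmax((Q x K^T)/sqrt(d_k)) matrices
# that an attention-pattern visualization would display.
import torch
from transformers import GPT2TokenizerFast, GPT2Model

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_attentions=True)
model.eval()

inputs = tokenizer("The transformer attends to every token", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer (12 for GPT-2 small),
# each shaped (batch, heads, seq_len, seq_len); every row is a softmax
# distribution over the preceding tokens.
layer0_head0 = outputs.attentions[0][0, 0]
print(layer0_head0.shape)        # (seq_len, seq_len)
print(layer0_head0.sum(dim=-1))  # each row sums to ~1.0
```

Heatmapping a matrix like layer0_head0 against the token strings is the kind of layer-by-layer, head-by-head view the abstract describes for mapping information flow through the model.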