Abstract: The development of Large Language Models (LLMs) has significantly transformed the field of artificial intelligence, enabling machines to understand and generate human-like language. While most LLMs are trained on dominant global languages such as English, there is a growing need to include regional languages such as Kannada to ensure linguistic inclusivity and cultural representation. This research focuses on the transformation and application of LLMs for the Kannada language. It explores data collection, preprocessing, tokenization, and fine-tuning strategies for adapting LLMs effectively, and addresses challenges such as limited datasets, script complexity, and semantic nuances unique to Kannada. By building or adapting Kannada LLMs, this work aims to enhance natural language processing (NLP) capabilities for Kannada speakers, supporting applications such as translation, chatbots, sentiment analysis, and digital education. This transformation is a step towards democratizing AI access across linguistic boundaries.
DOI: 10.17148/IJARCCE.2025.14697