Multi-Modal AI Agent for Intelligent Email Categorization and Auto-Reply

RANJINI; J.LIN EBY CHANDRA

doi:10.17148/IJARCCE.2026.15687

Volume 15, Issue 6, June 2026

Abstract: The dramatic increase in daily email volumes across enterprise, healthcare, and e-governance sectors has created an urgent need for intelligent systems capable of autonomous email understanding, classification, and response generation. This paper proposes MMEA-Net (Multi-Modal Email Agent Network), a novel deep learning framework that integrates transformer-based language models, visual document encoders, and metadata-driven contextual reasoning to perform fine-grained email categorization and context-aware auto-reply generation. Unlike prior work relying solely on email body text, MMEA-Net processes three complementary modalities: textual content encoded via DeBERTa-v3-Large, visual layout of attached documents processed through LayoutLMv3, and structural metadata including sender reputation scores, thread depth, and temporal patterns encoded by a dedicated MLP module. The three modality streams are fused through a Gated Cross-Modal Attention (GCMA) mechanism that dynamically weights each modality's contribution based on input context. A reinforcement-learning-based Auto-Reply Generator (ARG) then produces professional, intent-aligned responses conditioned on the predicted category and a domain-specific policy knowledge base. Experiments on the Enron Email Dataset, TREC 2007, and a newly constructed Healthcare Email Corpus demonstrate that MMEA-Net achieves 95.3% overall accuracy, 94.1% macro-F1, BLEU-4 of 41.2, and human acceptability of 89.6%, outperforming all evaluated baselines by statistically significant margins.
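The gating idea behind the fusion step can be illustrated with a minimal sketch. The abstract does not specify the internals of GCMA, so everything below is an assumption made for illustration: the function name `gated_fusion`, the use of a single linear scoring vector per modality, the softmax normalization, and the toy two-dimensional embeddings are all hypothetical. The sketch also omits the cross-modal attention itself and shows only how input-dependent gates could weight the text, layout, and metadata streams.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def gated_fusion(
    modalities: Dict[str, Vector],
    gate_weights: Dict[str, Vector],
) -> Tuple[Vector, Dict[str, float]]:
    """Fuse equal-sized modality embeddings with input-dependent gates.

    Hypothetical stand-in for a gated fusion layer: each modality gets a
    scalar score computed from the concatenation of all embeddings, the
    scores are normalized with softmax, and the fused vector is the
    gate-weighted sum of the embeddings.
    """
    names = sorted(modalities)  # fixed order: layout, metadata, text
    # Every gate sees all modalities, so the weighting depends on context.
    context = [x for name in names for x in modalities[name]]
    scores = [dot(gate_weights[name], context) for name in names]
    gates = softmax(scores)
    dim = len(modalities[names[0]])
    fused = [
        sum(g * modalities[name][i] for g, name in zip(gates, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, gates))


if __name__ == "__main__":
    # Toy 2-d embeddings; real encoders would produce much larger vectors.
    embeddings = {
        "text": [1.0, 0.0],
        "layout": [0.0, 1.0],
        "metadata": [1.0, 1.0],
    }
    # Untrained (all-zero) gate parameters give every modality equal weight.
    weights = {name: [0.0] * 6 for name in embeddings}
    fused, gates = gated_fusion(embeddings, weights)
    print(gates)
    print(fused)
```

With all-zero gate parameters the three gates are equal, so the fused vector is the plain average of the embeddings. In a trained model the gate parameters would be learned, letting an email with no attachment, for example, shift weight away from the layout stream.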

Keywords: Multi-Modal Learning; Email Categorization; Auto-Reply Generation; Transformer; Gated Cross-Modal Attention; DeBERTa; Reinforcement Learning from Human Feedback

How to Cite:

[1] RANJINI, J.LIN EBY CHANDRA, “Multi-Modal AI Agent for Intelligent Email Categorization and Auto-Reply,” International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2026.15687

This work is licensed under a Creative Commons Attribution 4.0 International License.