Abstract: This paper addresses the generation of images from the content of a conversation using a visual foundation model. The aim is to develop a system that generates images aligned with the context of a conversation in an intuitive and creative way. We propose a method that uses a pre-trained visual foundation model to extract features from the input text and generate an image that reflects the meaning of the conversation. The model is trained on a large-scale image dataset and a text dataset relevant to the target domain. Experimental results show that the proposed method outperforms existing methods in image quality and in content alignment with the conversation. The system has potential applications in areas such as e-commerce, social media, and entertainment, where generating images from text can improve user engagement and experience.
Keywords: Visual Foundation Model, AI, Large Language Models (LLMs).
DOI: 10.17148/IJARCCE.2023.124116
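
The abstract describes a pipeline that conditions image generation on conversation text. As a minimal illustrative sketch only (the paper does not name its models or conditioning scheme), the following assumes a pre-trained Stable Diffusion text-to-image pipeline from Hugging Face diffusers as a stand-in for the visual foundation model, and a hypothetical conversation_to_prompt helper that collapses recent turns into a prompt.

```python
import torch
from diffusers import StableDiffusionPipeline


def conversation_to_prompt(turns, max_turns=4):
    """Hypothetical helper: collapse the most recent conversation turns
    into a single text prompt for the image generator."""
    return " ".join(turns[-max_turns:])


def generate_image(turns, model_id="runwayml/stable-diffusion-v1-5"):
    # Assumed stand-in model; the paper does not specify which
    # pre-trained visual foundation model is used.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    pipe = StableDiffusionPipeline.from_pretrained(model_id).to(device)
    prompt = conversation_to_prompt(turns)
    # Generate one image conditioned on the conversation-derived prompt.
    return pipe(prompt, num_inference_steps=30).images[0]


if __name__ == "__main__":
    chat = [
        "I'm planning a picnic by the lake this weekend.",
        "Sounds lovely, maybe under the old willow tree?",
    ]
    generate_image(chat).save("conversation_image.png")
```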