Interactive Visual Foundation Models: Talking and Generating

Siddharth Singh Chouhan; Sujal Jadhav; Vanshita Singh; Pratik Gaikwad

doi:10.17148/IJARCCE.2023.124116

Volume 12, Issue 4, April 2023





Abstract: This paper addresses the generation of images from the content of a conversation using a visual foundation model. The aim is to develop a system that generates images aligned with the context of a conversation in a more intuitive and creative way. We propose a method that uses a pre-trained visual foundation model to extract features from the input text and generate an image that reflects the meaning of the conversation. The model is trained on a large-scale image dataset and on a text dataset relevant to the target domain. Experimental results show that the proposed method outperforms existing methods in both image quality and alignment of the generated content with the conversation. The system has potential applications in areas such as e-commerce, social media, and entertainment, where generating images from text can improve user engagement and experience.
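The abstract does not specify how conversation context is turned into model input. The following is a minimal sketch, not the authors' method, of one simple way a conversation could be condensed into a single text prompt before it is passed to a text-to-image model. The names `Turn`, `ImageGenerator`, `build_prompt`, `EchoGenerator`, and `image_from_conversation`, the "last N turns" window heuristic, and the stub generator are all hypothetical illustrations; a real system would substitute an actual visual foundation model behind the `generate` interface.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass
class Turn:
    """One utterance in a conversation (hypothetical data structure)."""
    speaker: str
    text: str


class ImageGenerator(Protocol):
    """Hypothetical interface for any text-conditioned image generator."""
    def generate(self, prompt: str) -> bytes: ...


def build_prompt(history: Sequence[Turn], window: int = 3) -> str:
    """Condense the last `window` turns into one prompt (assumed heuristic)."""
    # Guard window <= 0 explicitly: history[-0:] would return the whole list.
    recent = list(history[-window:]) if window > 0 else []
    # Keep only non-empty utterances, stripped of surrounding whitespace.
    parts = [t.text.strip() for t in recent if t.text.strip()]
    return "; ".join(parts)


class EchoGenerator:
    """Stand-in generator that returns the prompt as bytes, for testing only."""
    def generate(self, prompt: str) -> bytes:
        return prompt.encode("utf-8")


def image_from_conversation(
    history: Sequence[Turn], gen: ImageGenerator, window: int = 3
) -> bytes:
    """Build a prompt from recent turns and hand it to the generator."""
    prompt = build_prompt(history, window)
    if not prompt:
        raise ValueError("conversation has no usable text")
    return gen.generate(prompt)


if __name__ == "__main__":
    chat = [
        Turn("user", "Let's plan a beach trip"),
        Turn("bot", "Sounds fun!"),
        Turn("user", "Show me a sunset over the sea"),
    ]
    print(build_prompt(chat, window=2))  # Sounds fun!; Show me a sunset over the sea
```

Keeping the generator behind a small interface means the context-condensing step can be tested independently of any particular image model; a richer system might weight or summarize turns instead of simply concatenating the most recent ones.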

Keywords: Visual Foundation Models, AI, Large Language Models (LLMs).

How to Cite:

[1] Siddharth Singh Chouhan, Sujal Jadhav, Vanshita Singh, Pratik Gaikwad, "Interactive Visual Foundation Models: Talking and Generating," International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), vol. 12, no. 4, April 2023, DOI: 10.17148/IJARCCE.2023.124116

This work is licensed under a Creative Commons Attribution 4.0 International License.