GitHub topics: research-based-approaches
raj-tyagi/4CLIP-Image-Captioning
This repository presents 4CLIP, an image captioning approach that extends standard whole-image captioning by dividing each image into four quadrants and processing them individually. Using a pretrained ViT-GPT2 model from Hugging Face, 4CLIP aims to produce more detailed and comprehensive captions than a single whole-image pass, which makes it suited to fine-grained visual tasks.
Language: Python - Size: 288 KB - Last synced at: 1 day ago - Pushed at: 7 months ago - Stars: 0 - Forks: 1
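The quadrant idea in the description can be illustrated with a minimal, dependency-free Python sketch. It is not the repository's code. The image is assumed to be a plain 2D list of pixel values, `captioner` stands in for any per-region captioning model, and joining the four captions with labels is a hypothetical combination step, since the listing does not say how 4CLIP merges them. With a real model, `captioner` could wrap something like a Hugging Face `pipeline("image-to-text", model=...)` call, but which checkpoint 4CLIP uses is not stated here.

```python
from typing import Callable, List, Sequence, Tuple

# A box is (left, upper, right, lower), the same convention PIL's Image.crop uses.
Box = Tuple[int, int, int, int]


def quadrant_boxes(width: int, height: int) -> List[Box]:
    """Split a width x height image into top-left, top-right, bottom-left,
    bottom-right boxes. Odd sizes give the extra pixel to the right/bottom."""
    mid_x, mid_y = width // 2, height // 2
    return [
        (0, 0, mid_x, mid_y),           # top-left
        (mid_x, 0, width, mid_y),       # top-right
        (0, mid_y, mid_x, height),      # bottom-left
        (mid_x, mid_y, width, height),  # bottom-right
    ]


def crop(pixels: Sequence[Sequence[int]], box: Box) -> List[List[int]]:
    """Return the sub-grid of pixels inside box (right/lower are exclusive)."""
    left, upper, right, lower = box
    return [list(row[left:right]) for row in pixels[upper:lower]]


def caption_by_quadrant(
    pixels: Sequence[Sequence[int]],
    captioner: Callable[[List[List[int]]], str],
) -> str:
    """Caption each quadrant separately, then join the results.
    The labeled '; '-joined output is a hypothetical merge strategy."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    labels = ["top-left", "top-right", "bottom-left", "bottom-right"]
    parts = [
        f"{label}: {captioner(crop(pixels, box))}"
        for label, box in zip(labels, quadrant_boxes(width, height))
    ]
    return "; ".join(parts)


def toy_captioner(region: List[List[int]]) -> str:
    """Hypothetical stand-in for a real model: labels a region by mean brightness."""
    values = [v for row in region for v in row]
    return "bright" if values and sum(values) / len(values) > 127 else "dark"


image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [255, 255, 0, 0],
    [255, 255, 0, 0],
]
print(caption_by_quadrant(image, toy_captioner))
# top-left: dark; top-right: bright; bottom-left: bright; bottom-right: dark
```

Keeping the captioner as a plain callable separates the quadrant logic from any particular model, so the same splitting code works whether each region is described by a toy function or by a pretrained vision-language model.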
