GitHub topics: vilm

Repositories

SJ9VRF/Fine-tune-Vision-Language-Model

This repository contains an implementation of the Vision-and-Language Transformer (ViLT) fine-tuned for Visual Question Answering (VQA). The project is structured for easy setup and makes it straightforward to experiment with different configurations and datasets.

Language: Python - Size: 29.3 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0