Topic: "visual-language-learning"
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA), built towards GPT-4V-level capabilities and beyond.
Language: Python - Size: 13.4 MB - Last synced at: 1 day ago - Pushed at: 9 months ago - Stars: 22,343 - Forks: 2,456
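Visual instruction tuning trains the model on image-grounded conversations. As a minimal sketch (field names and paths are illustrative, not LLaVA's exact schema), one training sample pairs an image with a human/assistant dialogue, using an `<image>` placeholder where visual tokens are spliced into the text:

```python
import json

# Hypothetical single training sample in a conversation-style JSON layout
# commonly used for visual instruction tuning (all values illustrative).
sample = {
    "id": "000001",
    "image": "images/example.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\nWhat is unusual about this image?"},
        {"from": "gpt", "value": "The scene shows an object in an unexpected place."},
    ],
}

def render_prompt(sample):
    """Flatten the conversation into one training string, keeping the
    <image> placeholder where the vision encoder's tokens are inserted."""
    return "\n".join(f"{t['from'].upper()}: {t['value']}" for t in sample["conversations"])

print(render_prompt(sample))
```

Serializing samples this way (`json.dumps(sample)`) keeps the dataset human-readable while letting the training loop locate the image slot by searching for the placeholder token.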

NExT-GPT/NExT-GPT
Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model
Language: Python - Size: 127 MB - Last synced at: 21 days ago - Pushed at: 6 months ago - Stars: 3,480 - Forks: 349

EvolvingLMMs-Lab/Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
Language: Python - Size: 7.39 MB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 3,248 - Forks: 214

InternLM/InternLM-XComposer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Language: Python - Size: 199 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2,815 - Forks: 173

RLHF-V/RLHF-V
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Language: Python - Size: 70.6 MB - Last synced at: 1 day ago - Pushed at: 8 months ago - Stars: 276 - Forks: 8

mlpc-ucsd/BLIVA
(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
Language: Python - Size: 12.3 MB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 256 - Forks: 23

thomas-yanxin/KarmaVLM
🧘🏻♂️ KarmaVLM (相生): A family of high-efficiency, powerful visual language models.
Language: Python - Size: 2.68 MB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 88 - Forks: 3

AdrianBZG/llama-multimodal-vqa
Multimodal Instruction Tuning for Llama 3
Language: Python - Size: 31.3 KB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 48 - Forks: 10

Skyline-9/Shotluck-Holmes
[ACM MMGR '24] 🔍 Shotluck Holmes: A family of small-scale LLVMs for shot-level video understanding
Language: Python - Size: 26.3 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 11 - Forks: 0

xinyanghuang7/Basic-Visual-Language-Model
Build a simple, basic multimodal large model from scratch. 🤖
Language: Python - Size: 34.7 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 3 - Forks: 0

ashleykleynhans/llava-docker
Docker image for LLaVA: Large Language and Vision Assistant
Language: Shell - Size: 126 KB - Last synced at: 1 day ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

MuhammadAliS/CLIP
PyTorch implementation of OpenAI's CLIP model for image classification, visual search, and visual question answering (VQA).
Language: Jupyter Notebook - Size: 15.7 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0
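CLIP's zero-shot classification and visual search both reduce to scaled cosine similarity between image and text embeddings. A minimal NumPy sketch of that scoring step, using random vectors as stand-ins for encoder outputs (no model download; `logit_scale` and the 512-dim embeddings are assumptions for illustration):

```python
import numpy as np

def clip_scores(image_embs, text_embs, logit_scale=100.0):
    # L2-normalize both sets of embeddings, as CLIP does before the dot product
    image_embs = image_embs / np.linalg.norm(image_embs, axis=-1, keepdims=True)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=-1, keepdims=True)
    # Scaled cosine similarities: one row per image, one column per text label
    logits = logit_scale * image_embs @ text_embs.T
    # Softmax over labels yields zero-shot classification probabilities
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
img = rng.normal(size=(1, 512))    # stand-in for one encoded image
txt = rng.normal(size=(3, 512))    # stand-ins for prompts like "a photo of a cat"
probs = clip_scores(img, txt)
print(probs.shape)  # (1, 3); each row sums to 1
```

Visual search is the same computation transposed: score one text query against many image embeddings and rank by similarity.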

ecoxial2007/EffVideoQA
Efficient Video Question Answering
Language: Python - Size: 3.74 MB - Last synced at: 9 months ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0
