Topic: "vision-language-learning"

AIDC-AI/Ovis

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

Language: Python - Size: 5.56 MB - Last synced at: 6 days ago - Pushed at: about 2 months ago - Stars: 904 - Forks: 56

RLHF-V/RLAIF-V

[CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness

Language: Python - Size: 60 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 361 - Forks: 14

shikiw/OPERA

[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Language: Python - Size: 15.7 MB - Last synced at: 25 days ago - Pushed at: 9 months ago - Stars: 332 - Forks: 28

shikiw/Modality-Integration-Rate

The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".

Language: Python - Size: 17.7 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 97 - Forks: 3

YunzeMan/Situation3D

[CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning

Language: Python - Size: 63.3 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 28 - Forks: 2

LooperXX/ManagerTower

Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning

Language: Python - Size: 6.71 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 1

yubin1219/CrossVLT

Cross-aware Early Fusion with Stage-divided Vision and Language Transformer Encoders for Referring Image Segmentation (Published in IEEE TMM 2023)

Language: Python - Size: 5.29 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 5 - Forks: 0

fork123aniket/Multi-Round-VLM-powered-Multimodal-Conversational-AI-Navigation-Bot

Streamlit App Combining Vision, Language, and Audio AI Models

Language: Python - Size: 18.6 KB - Last synced at: 25 days ago - Pushed at: 4 months ago - Stars: 3 - Forks: 0

lyuchenyang/Dialogue-to-Video-Retrieval

Code for ECIR 2023 paper "Dialogue-to-Video Retrieval"

Language: Python - Size: 34.2 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 1

fork123aniket/Agentic-RAG-Story-Generation-with-Multimodal-GenAI

Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling

Language: Python - Size: 94.7 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

Ravi-Teja-konda/TunedLlavaDelights

Explore the rich flavors of Indian desserts with TunedLlavaDelights. Using LLaVA fine-tuning, the project provides detailed nutritional profiles, taste notes, and optimal consumption times for beloved sweets. Dive into a fusion of AI innovation and culinary tradition.

Language: Python - Size: 43.3 MB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

abhinav-neil/socratic-models (fork of milenakapralova/socraticmodels)

Socratic models for multimodal reasoning & image captioning

Language: Jupyter Notebook - Size: 48.8 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0
