OpenGVLab | GitHub owners | Ecosyste.ms: Repos

OpenGVLab/LLaMA-Adapter

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

Language: Python - Size: 33.6 MB - Last synced at: about 5 hours ago - Pushed at: about 1 year ago - Stars: 5,870 - Forks: 382

OpenGVLab/PIIP

[NeurIPS 2024 Spotlight ⭐️] Parameter-Inverted Image Pyramid Networks (PIIP)

Language: Python - Size: 11.7 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 89 - Forks: 2

OpenGVLab/VideoChat-R1

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning

Language: Python - Size: 5.99 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 116 - Forks: 2

OpenGVLab/VideoMamba

[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding

Language: Python - Size: 2.18 MB - Last synced at: 3 days ago - Pushed at: 10 months ago - Stars: 944 - Forks: 72

OpenGVLab/InternGPT

InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It currently supports DragGAN, ChatGPT, ImageBind, GPT-4-style multimodal chat, SAM, interactive image editing, and more. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).

Language: Python - Size: 41.9 MB - Last synced at: about 3 hours ago - Pushed at: 9 months ago - Stars: 3,214 - Forks: 230

OpenGVLab/Instruct2Act

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model

Language: Python - Size: 30 MB - Last synced at: 4 days ago - Pushed at: 11 months ago - Stars: 360 - Forks: 23

OpenGVLab/VideoChat-Flash

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling

Language: Python - Size: 6.72 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 402 - Forks: 11

OpenGVLab/OmniCorpus

[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Language: Python - Size: 5.48 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 343 - Forks: 6

OpenGVLab/OmniQuant

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

Language: Python - Size: 8.13 MB - Last synced at: 3 days ago - Pushed at: 7 months ago - Stars: 805 - Forks: 62

OpenGVLab/EfficientQAT

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

Language: Python - Size: 82 KB - Last synced at: 3 days ago - Pushed at: 7 months ago - Stars: 265 - Forks: 19

OpenGVLab/FluxViT

Make Your Training Flexible: Towards Deployment-Efficient Video Models

Language: Python - Size: 1.35 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 24 - Forks: 0

OpenGVLab/InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Language: Python - Size: 53.2 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 1,830 - Forks: 111

OpenGVLab/InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.

Language: Python - Size: 38.5 MB - Last synced at: 17 days ago - Pushed at: 23 days ago - Stars: 7,817 - Forks: 591

OpenGVLab/Multi-Modality-Arena

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

Language: Python - Size: 21.5 MB - Last synced at: 20 days ago - Pushed at: about 1 year ago - Stars: 517 - Forks: 38

OpenGVLab/MMT-Bench

ICML'2024 | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

Language: Python - Size: 9.82 MB - Last synced at: 23 days ago - Pushed at: 10 months ago - Stars: 107 - Forks: 3

OpenGVLab/V2PE

[ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding

Language: Python - Size: 776 KB - Last synced at: 22 days ago - Pushed at: 5 months ago - Stars: 42 - Forks: 2

OpenGVLab/PonderV2

[T-PAMI 2025] PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

Language: Python - Size: 873 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 338 - Forks: 8

OpenGVLab/MM-NIAH

[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents.

Language: Python - Size: 2.83 MB - Last synced at: 23 days ago - Pushed at: 6 months ago - Stars: 115 - Forks: 6

OpenGVLab/DragGAN

Unofficial Implementation of DragGAN - "Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold" (full-featured DragGAN implementation with an online demo and local deployment; all code and models are open source; supports Windows, macOS, and Linux)

Language: Python - Size: 7.84 MB - Last synced at: 29 days ago - Pushed at: almost 2 years ago - Stars: 4,988 - Forks: 488

OpenGVLab/InternImage

[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Language: Python - Size: 26.5 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 2,631 - Forks: 244

OpenGVLab/Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! Also supports many more LMs, such as miniGPT4, StableLM, and MOSS.

Language: Python - Size: 20.7 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 3,213 - Forks: 261

OpenGVLab/TimeSuite

[ICLR 2025] TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning

Language: Python - Size: 12.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 30 - Forks: 1

OpenGVLab/SAM-Med2D

Official implementation of SAM-Med2D

Language: Jupyter Notebook - Size: 28.7 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 955 - Forks: 90

OpenGVLab/Diffree

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model

Language: Python - Size: 66.7 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 238 - Forks: 14

OpenGVLab/M3I-Pretraining

[CVPR 2023] Implementation of Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information.

Size: 600 KB - Last synced at: 21 days ago - Pushed at: almost 2 years ago - Stars: 91 - Forks: 5

OpenGVLab/.github

Size: 1.17 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 1

OpenGVLab/Mono-InternVL

[CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training

Language: Python - Size: 1.43 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 25 - Forks: 0

OpenGVLab/PhyGenBench

Code and data for the paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

Language: Python - Size: 17.9 MB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 92 - Forks: 1

OpenGVLab/DCNv4

[CVPR 2024] Deformable Convolution v4

Language: Python - Size: 615 KB - Last synced at: about 2 months ago - Pushed at: 12 months ago - Stars: 603 - Forks: 33

OpenGVLab/VideoMAEv2

[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking

Language: Python - Size: 935 KB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 594 - Forks: 68

OpenGVLab/DriveMLM

Size: 2.68 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 173 - Forks: 4

OpenGVLab/EgoVideo

[CVPR 2024 Champions][ICLR 2025] Solutions for the EgoVis Challenges at CVPR 2024

Language: Jupyter Notebook - Size: 47.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 124 - Forks: 3

OpenGVLab/CaFo

[CVPR 2023] Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners

Language: Python - Size: 7.16 MB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 365 - Forks: 19

OpenGVLab/STM-Evaluation

Language: Python - Size: 5.49 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 70 - Forks: 6

OpenGVLab/UniFormerV2

[ICCV2023] UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer

Language: Python - Size: 1.78 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 306 - Forks: 21

OpenGVLab/Awesome-LLM4Tool

A curated list of papers, repositories, tutorials, and anything else related to large language models for tools

Size: 43 KB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 67 - Forks: 9

OpenGVLab/PVC

[CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models

Language: Python - Size: 2.74 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 28 - Forks: 0

OpenGVLab/Vision-RWKV

[ICLR 2025 Spotlight] Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

Language: Python - Size: 903 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 404 - Forks: 17

OpenGVLab/LCL

[NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

Language: Python - Size: 2.59 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 68 - Forks: 3

OpenGVLab/vinci

Language: Python - Size: 2.49 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 28 - Forks: 2

OpenGVLab/TPO

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

Language: Python - Size: 538 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

OpenGVLab/MMIU

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

Language: Python - Size: 1.3 MB - Last synced at: 4 months ago - Pushed at: 8 months ago - Stars: 58 - Forks: 2

OpenGVLab/VLMEvalKit_InternVL2_5 (fork of open-compass/VLMEvalKit)

Open-source evaluation toolkit for large vision-language models (LVLMs); supports 160+ VLMs and 50+ benchmarks

Language: Python - Size: 2.57 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

OpenGVLab/Hulk

An official implementation of "Hulk: A Universal Knowledge Translator for Human-Centric Tasks"

Language: Python - Size: 224 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 101 - Forks: 4

OpenGVLab/all-seeing

[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of the Open World

Language: Python - Size: 57.5 MB - Last synced at: 5 months ago - Pushed at: 9 months ago - Stars: 463 - Forks: 16

OpenGVLab/LAMM

[NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents

Language: Python - Size: 16.6 MB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 303 - Forks: 17

OpenGVLab/GUI-Odyssey

GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes from 6 mobile devices, spanning 6 types of cross-app tasks, 201 apps, and 1.4K app combos.

Language: Python - Size: 8.16 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 65 - Forks: 2

OpenGVLab/InternVL-MMDetSeg

Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed

Language: Jupyter Notebook - Size: 29.1 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 51 - Forks: 4

OpenGVLab/De-focus-Attention-Networks

Learning 1D Causal Visual Representation with De-focus Attention Networks

Language: Python - Size: 8.82 MB - Last synced at: 6 months ago - Pushed at: 11 months ago - Stars: 29 - Forks: 0

OpenGVLab/gv-benchmark

General Vision Benchmark, GV-B, a project from OpenGVLab

Language: Python - Size: 85.9 KB - Last synced at: 6 months ago - Pushed at: about 3 years ago - Stars: 189 - Forks: 12

OpenGVLab/ChartAst

ChartAssistant is a chart-based vision-language model for universal chart comprehension and reasoning.

Language: Python - Size: 15.5 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 100 - Forks: 8

OpenGVLab/VisionLLM

VisionLLM Series

Language: Python - Size: 17.4 MB - Last synced at: 9 months ago - Pushed at: 10 months ago - Stars: 778 - Forks: 16

OpenGVLab/ControlLLM

ControlLLM: Augment Language Models with Tools by Searching on Graphs

Language: Python - Size: 19.3 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 174 - Forks: 9

OpenGVLab/HumanBench

This repo is the official implementation of HumanBench (CVPR 2023)

Language: Python - Size: 27.8 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 217 - Forks: 9

OpenGVLab/PhyBench

The official repo of PhyBench

Language: Python - Size: 1.49 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 10 - Forks: 1

OpenGVLab/Siamese-Image-Modeling

[CVPR 2023] Implementation of Siamese Image Modeling for Self-Supervised Vision Representation Learning

Language: Python - Size: 171 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 32 - Forks: 5

OpenGVLab/EgoExoLearn

Data and benchmark code for the EgoExoLearn dataset

Language: Python - Size: 16.8 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 27 - Forks: 0

OpenGVLab/video-mamba-suite

A suite for modeling video with Mamba

Language: Python - Size: 114 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 144 - Forks: 14

OpenGVLab/DiffAgent

[CVPR 2024] DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model

Size: 8.79 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 9 - Forks: 0

OpenGVLab/MM-Interleaved

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

Language: Python - Size: 3.6 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 154 - Forks: 9

OpenGVLab/GITM

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory

Size: 65.7 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 566 - Forks: 16

OpenGVLab/Multitask-Model-Selector

Implementation of Foundation Model is Efficient Multimodal Multitask Model Selector

Language: Python - Size: 371 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 27 - Forks: 1

OpenGVLab/efficient-video-recognition

Language: Python - Size: 1.33 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 155 - Forks: 14

OpenGVLab/DiffRate

[ICCV 23] An approach that improves the efficiency of Vision Transformers (ViTs) by applying token pruning and token merging together, with a differentiable compression rate.

Language: Jupyter Notebook - Size: 1.4 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 72 - Forks: 7

OpenGVLab/LORIS

Long-Term Rhythmic Video Soundtracker, ICML2023

Language: Python - Size: 857 KB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 50 - Forks: 1

OpenGVLab/unmasked_teacher

[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models

Language: Python - Size: 3.04 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 241 - Forks: 11

OpenGVLab/InternLMM

Size: 8.95 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 17 - Forks: 0

OpenGVLab/UniHCP

Official PyTorch implementation of UniHCP

Language: Python - Size: 9 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 135 - Forks: 6

OpenGVLab/MUTR

[AAAI 2024] Referred by Multi-Modality: A Unified Temporal Transformers for Video Object Segmentation

Language: Python - Size: 8.3 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 51 - Forks: 3

OpenGVLab/DDPS

Official Implementation of "Denoising Diffusion Semantic Segmentation with Mask Prior Modeling"

Language: Python - Size: 417 KB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 62 - Forks: 3

OpenGVLab/Awesome-DragGAN

Awesome-DragGAN: A curated list of papers, tutorials, repositories related to DragGAN

Size: 30.3 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 80 - Forks: 2

OpenGVLab/LLMPrune-BESA

BESA is a differentiable weight pruning technique for large language models.

Language: Python - Size: 30.3 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 1

OpenGVLab/perception_test_iccv2023

Repository of champion solutions for the Perception Test challenges at the ICCV 2023 workshop.

Language: Python - Size: 16.9 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 0

OpenGVLab/MovieMind

Size: 1000 Bytes - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 12 - Forks: 0

OpenGVLab/EmbodiedGPT

Size: 104 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 29 - Forks: 2

OpenGVLab/MoVQA

Size: 1000 Bytes - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

OpenGVLab/LTVU-LLM

Size: 0 Bytes - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

OpenGVLab/Official-ConvMAE-Det

Language: Python - Size: 1.04 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 15 - Forks: 2

OpenGVLab/opengvlab.github.io

Size: 27.3 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 15 - Forks: 3

GitHub / OpenGVLab 58 Repositories
