GitHub / OpenGVLab 58 Repositories
General Vision Team of Shanghai AI Laboratory
OpenGVLab/LLaMA-Adapter
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
Language: Python - Size: 33.6 MB - Last synced at: about 5 hours ago - Pushed at: about 1 year ago - Stars: 5,870 - Forks: 382

OpenGVLab/PIIP
[NeurIPS 2024 Spotlight ⭐️] Parameter-Inverted Image Pyramid Networks (PIIP)
Language: Python - Size: 11.7 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 89 - Forks: 2

OpenGVLab/VideoChat-R1
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning
Language: Python - Size: 5.99 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 116 - Forks: 2

OpenGVLab/VideoMamba
[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding
Language: Python - Size: 2.18 MB - Last synced at: 3 days ago - Pushed at: 10 months ago - Stars: 944 - Forks: 72

OpenGVLab/InternGPT
InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)
Language: Python - Size: 41.9 MB - Last synced at: about 3 hours ago - Pushed at: 9 months ago - Stars: 3,214 - Forks: 230

OpenGVLab/Instruct2Act
Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model
Language: Python - Size: 30 MB - Last synced at: 4 days ago - Pushed at: 11 months ago - Stars: 360 - Forks: 23

OpenGVLab/VideoChat-Flash
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
Language: Python - Size: 6.72 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 402 - Forks: 11

OpenGVLab/OmniCorpus
[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Language: Python - Size: 5.48 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 343 - Forks: 6

OpenGVLab/OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
Language: Python - Size: 8.13 MB - Last synced at: 3 days ago - Pushed at: 7 months ago - Stars: 805 - Forks: 62

OpenGVLab/EfficientQAT
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
Language: Python - Size: 82 KB - Last synced at: 3 days ago - Pushed at: 7 months ago - Stars: 265 - Forks: 19

OpenGVLab/FluxViT
Make Your Training Flexible: Towards Deployment-Efficient Video Models
Language: Python - Size: 1.35 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 24 - Forks: 0

OpenGVLab/InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Language: Python - Size: 53.2 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 1,830 - Forks: 111

OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
Language: Python - Size: 38.5 MB - Last synced at: 17 days ago - Pushed at: 23 days ago - Stars: 7,817 - Forks: 591

OpenGVLab/Multi-Modality-Arena
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
Language: Python - Size: 21.5 MB - Last synced at: 20 days ago - Pushed at: about 1 year ago - Stars: 517 - Forks: 38

OpenGVLab/MMT-Bench
ICML'2024 | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Language: Python - Size: 9.82 MB - Last synced at: 23 days ago - Pushed at: 10 months ago - Stars: 107 - Forks: 3

OpenGVLab/V2PE
[ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
Language: Python - Size: 776 KB - Last synced at: 22 days ago - Pushed at: 5 months ago - Stars: 42 - Forks: 2

OpenGVLab/PonderV2
[T-PAMI 2025] PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm
Language: Python - Size: 873 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 338 - Forks: 8

OpenGVLab/MM-NIAH
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents.
Language: Python - Size: 2.83 MB - Last synced at: 23 days ago - Pushed at: 6 months ago - Stars: 115 - Forks: 6

OpenGVLab/DragGAN
Unofficial Implementation of DragGAN - "Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold" (full-featured DragGAN implementation with an online demo and local deployment; code and models are fully open source; supports Windows, macOS, and Linux)
Language: Python - Size: 7.84 MB - Last synced at: 29 days ago - Pushed at: almost 2 years ago - Stars: 4,988 - Forks: 488

OpenGVLab/InternImage
[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Language: Python - Size: 26.5 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 2,631 - Forks: 244

OpenGVLab/Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
Language: Python - Size: 20.7 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 3,213 - Forks: 261

OpenGVLab/TimeSuite
[ICLR 2025] TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
Language: Python - Size: 12.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 30 - Forks: 1

OpenGVLab/SAM-Med2D
Official implementation of SAM-Med2D
Language: Jupyter Notebook - Size: 28.7 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 955 - Forks: 90

OpenGVLab/Diffree
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model
Language: Python - Size: 66.7 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 238 - Forks: 14

OpenGVLab/M3I-Pretraining
[CVPR 2023] Implementation of "Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information".
Size: 600 KB - Last synced at: 21 days ago - Pushed at: almost 2 years ago - Stars: 91 - Forks: 5

OpenGVLab/.github
Size: 1.17 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 1

OpenGVLab/Mono-InternVL
[CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Language: Python - Size: 1.43 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 25 - Forks: 0

OpenGVLab/PhyGenBench
Code and data for the paper "Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation"
Language: Python - Size: 17.9 MB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 92 - Forks: 1

OpenGVLab/DCNv4
[CVPR 2024] Deformable Convolution v4
Language: Python - Size: 615 KB - Last synced at: about 2 months ago - Pushed at: 12 months ago - Stars: 603 - Forks: 33

OpenGVLab/VideoMAEv2
[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Language: Python - Size: 935 KB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 594 - Forks: 68

OpenGVLab/DriveMLM
Size: 2.68 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 173 - Forks: 4

OpenGVLab/EgoVideo
[CVPR 2024 Champions][ICLR 2025] Solutions for EgoVis Challenges in CVPR 2024
Language: Jupyter Notebook - Size: 47.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 124 - Forks: 3

OpenGVLab/CaFo
[CVPR 2023] Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners
Language: Python - Size: 7.16 MB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 365 - Forks: 19

OpenGVLab/STM-Evaluation
Language: Python - Size: 5.49 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 70 - Forks: 6

OpenGVLab/UniFormerV2
[ICCV2023] UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
Language: Python - Size: 1.78 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 306 - Forks: 21

OpenGVLab/Awesome-LLM4Tool
A curated list of papers, repositories, tutorials, and anything else related to large language models for tools
Size: 43 KB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 67 - Forks: 9

OpenGVLab/PVC
[CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Language: Python - Size: 2.74 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 28 - Forks: 0

OpenGVLab/Vision-RWKV
[ICLR 2025 Spotlight] Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Language: Python - Size: 903 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 404 - Forks: 17

OpenGVLab/LCL
[NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
Language: Python - Size: 2.59 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 68 - Forks: 3

OpenGVLab/vinci
Language: Python - Size: 2.49 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 28 - Forks: 2

OpenGVLab/TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
Language: Python - Size: 538 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

OpenGVLab/MMIU
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Language: Python - Size: 1.3 MB - Last synced at: 4 months ago - Pushed at: 8 months ago - Stars: 58 - Forks: 2

OpenGVLab/VLMEvalKit_InternVL2_5 Fork of open-compass/VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
Language: Python - Size: 2.57 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

OpenGVLab/Hulk
An official implementation of "Hulk: A Universal Knowledge Translator for Human-Centric Tasks"
Language: Python - Size: 224 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 101 - Forks: 4

OpenGVLab/all-seeing
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of the Open World
Language: Python - Size: 57.5 MB - Last synced at: 5 months ago - Pushed at: 9 months ago - Stars: 463 - Forks: 16

OpenGVLab/LAMM
[NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents
Language: Python - Size: 16.6 MB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 303 - Forks: 17

OpenGVLab/GUI-Odyssey
GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes from 6 mobile devices, spanning 6 types of cross-app tasks, 201 apps, and 1.4K app combos.
Language: Python - Size: 8.16 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 65 - Forks: 2

OpenGVLab/InternVL-MMDetSeg
Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed
Language: Jupyter Notebook - Size: 29.1 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 51 - Forks: 4

OpenGVLab/De-focus-Attention-Networks
Learning 1D Causal Visual Representation with De-focus Attention Networks
Language: Python - Size: 8.82 MB - Last synced at: 6 months ago - Pushed at: 11 months ago - Stars: 29 - Forks: 0

OpenGVLab/gv-benchmark
General Vision Benchmark, GV-B, a project from OpenGVLab
Language: Python - Size: 85.9 KB - Last synced at: 6 months ago - Pushed at: about 3 years ago - Stars: 189 - Forks: 12

OpenGVLab/ChartAst
ChartAssistant is a chart-based vision-language model for universal chart comprehension and reasoning.
Language: Python - Size: 15.5 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 100 - Forks: 8

OpenGVLab/VisionLLM
VisionLLM Series
Language: Python - Size: 17.4 MB - Last synced at: 9 months ago - Pushed at: 10 months ago - Stars: 778 - Forks: 16

OpenGVLab/ControlLLM
ControlLLM: Augment Language Models with Tools by Searching on Graphs
Language: Python - Size: 19.3 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 174 - Forks: 9

OpenGVLab/HumanBench
This repo is the official implementation of HumanBench (CVPR 2023)
Language: Python - Size: 27.8 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 217 - Forks: 9

OpenGVLab/PhyBench
The official repo of PhyBench
Language: Python - Size: 1.49 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 10 - Forks: 1

OpenGVLab/Siamese-Image-Modeling
[CVPR 2023] Implementation of Siamese Image Modeling for Self-Supervised Vision Representation Learning
Language: Python - Size: 171 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 32 - Forks: 5

OpenGVLab/EgoExoLearn
Data and benchmark code for the EgoExoLearn dataset
Language: Python - Size: 16.8 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 27 - Forks: 0

OpenGVLab/video-mamba-suite
The suite of modeling video with Mamba
Language: Python - Size: 114 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 144 - Forks: 14

OpenGVLab/DiffAgent
[CVPR 2024] DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
Size: 8.79 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 9 - Forks: 0

OpenGVLab/MM-Interleaved
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
Language: Python - Size: 3.6 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 154 - Forks: 9

OpenGVLab/GITM
Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory
Size: 65.7 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 566 - Forks: 16

OpenGVLab/Multitask-Model-Selector
Implementation of Foundation Model is Efficient Multimodal Multitask Model Selector
Language: Python - Size: 371 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 27 - Forks: 1

OpenGVLab/efficient-video-recognition
Language: Python - Size: 1.33 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 155 - Forks: 14

OpenGVLab/DiffRate
[ICCV 2023] An approach to improving the efficiency of Vision Transformers (ViT) by applying token pruning and token merging concurrently, with a differentiable compression rate.
Language: Jupyter Notebook - Size: 1.4 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 72 - Forks: 7

OpenGVLab/LORIS
Long-Term Rhythmic Video Soundtracker, ICML2023
Language: Python - Size: 857 KB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 50 - Forks: 1

OpenGVLab/unmasked_teacher
[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Language: Python - Size: 3.04 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 241 - Forks: 11

OpenGVLab/InternLMM
Size: 8.95 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 17 - Forks: 0

OpenGVLab/UniHCP
Official PyTorch implementation of UniHCP
Language: Python - Size: 9 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 135 - Forks: 6

OpenGVLab/MUTR
[AAAI 2024] Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
Language: Python - Size: 8.3 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 51 - Forks: 3

OpenGVLab/DDPS
Official Implementation of "Denoising Diffusion Semantic Segmentation with Mask Prior Modeling"
Language: Python - Size: 417 KB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 62 - Forks: 3

OpenGVLab/Awesome-DragGAN
Awesome-DragGAN: A curated list of papers, tutorials, repositories related to DragGAN
Size: 30.3 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 80 - Forks: 2

OpenGVLab/LLMPrune-BESA
BESA is a differentiable weight pruning technique for large language models.
Language: Python - Size: 30.3 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 1

OpenGVLab/perception_test_iccv2023
Champion Solutions repository for Perception Test challenges in ICCV2023 workshop.
Language: Python - Size: 16.9 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 0

OpenGVLab/MovieMind
Size: 1000 Bytes - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 12 - Forks: 0

OpenGVLab/EmbodiedGPT
Size: 104 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 29 - Forks: 2

OpenGVLab/MoVQA
Size: 1000 Bytes - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

OpenGVLab/LTVU-LLM
Size: 0 Bytes - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

OpenGVLab/Official-ConvMAE-Det
Language: Python - Size: 1.04 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 15 - Forks: 2

OpenGVLab/opengvlab.github.io
Size: 27.3 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 15 - Forks: 3
