GitHub topics: qwen2-vl
PaddlePaddle/PaddleMIX
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrained models and a diffusion model toolbox. Equipped with high performance and flexibility.
Language: Python - Size: 179 MB - Last synced at: about 13 hours ago - Pushed at: 2 days ago - Stars: 645 - Forks: 214

2U1/Qwen2-VL-Finetune
An open-source implementation for fine-tuning the Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.
Language: Python - Size: 102 KB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 742 - Forks: 92

modelscope/ms-swift
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4v, Phi4, ...) (AAAI 2025).
Language: Python - Size: 61.7 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 7,631 - Forks: 646

roboflow/maestro
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
Language: Python - Size: 10.6 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 2,562 - Forks: 205

arcstep/illufly
✨🦋 illufly - [Phantom Butterfly] A self-evolving agent based on memory distillation and document retrieval
Language: Python - Size: 26.2 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 62 - Forks: 9

NetEase-Media/grps_trtllm
Higher performance OpenAI LLM service than vLLM serve: A pure C++ high-performance OpenAI LLM service implemented with GRPS+TensorRT-LLM+Tokenizers.cpp, supporting chat and function call, AI agents, distributed multi-GPU inference, multimodal capabilities, and a Gradio chat interface.
Language: Python - Size: 135 MB - Last synced at: 7 days ago - Pushed at: 11 days ago - Stars: 134 - Forks: 10

Younis-Ahmed/qwen-ai-provider
Community-built Qwen AI Provider for Vercel AI SDK - Integrate Alibaba Cloud's Qwen models with Vercel's AI application framework
Language: TypeScript - Size: 372 KB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 8 - Forks: 2

Dishu-Bansal/Documatic
An AI-powered document organizer tool. It displays a small, cute robot on the screen. Give it any file and an optional short description; it will analyze the contents and description and save the file to the cloud. When needed, just double-click it and enter a description or keywords for the file you are looking for; it will open the best-matching file.
Language: C - Size: 242 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

drive-bench/toolkit
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
Language: Python - Size: 14.4 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 66 - Forks: 1

polymathbenchmark/polymathbenchmark.github.io
A Challenging Multi-Modal Mathematical Reasoning Benchmark
Language: JavaScript - Size: 2.01 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

aws-samples/sample-for-multi-modal-document-to-json-with-sagemaker-ai
This open-source project delivers a complete pipeline for converting multi-page documents (PDFs/images) into structured JSON using Vision LLMs on Amazon SageMaker. The solution leverages the SWIFT Framework to fine-tune models specifically for document understanding tasks.
Language: Jupyter Notebook - Size: 3.18 MB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 3 - Forks: 0

soulteary/dify-with-qwen-vl
Video understanding: Qwen video multimodal model & Dify
Language: Python - Size: 9.77 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 47 - Forks: 9

PRITHIVSAKTHIUR/Aya-Vision-Ocr-vs-Qwen2VL-Ocr
Messy Handwriting OCR Comparison Between Aya-Vision-8B and Qwen2VL-OCR-2B
Language: Python - Size: 18.6 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

fireicewolf/wd-llm-caption-cli
A Python-based CLI tool for captioning images with WD series, Joy-caption-pre-alpha, Meta Llama 3.2 Vision Instruct, and Qwen2 VL Instruct models.
Language: Python - Size: 1.92 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 34 - Forks: 8

Pavansomisetty21/Qwen2-Vision-Finetuning-Unsloth---Maths-OCR-Formulae-Extraction-
We fine-tune an Unsloth Llama model to extract mathematical formulas from images with optical character recognition (OCR).
Language: Jupyter Notebook - Size: 43 KB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 4 - Forks: 0

SimonGino/repoicon
Use AI to generate beautiful, minimalist icons for your GitHub repositories.
Language: TypeScript - Size: 77.1 KB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

PRITHIVSAKTHIUR/Multimodal-OCR
OCR Vision Language Model
Language: Python - Size: 4.41 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

BUAADreamer/Qwen2-VL-History
A LLaMA-Factory fine-tuning case study of Qwen2-VL in the culture and tourism domain (historical literature and museums)
Size: 73.8 MB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 6 - Forks: 2

aws-samples/multi-modal-examples-for-amazon-sagemaker
A workshop for collections of multi-modal LLM examples, samples, reference architecture and demos on Amazon SageMaker.
Language: Jupyter Notebook - Size: 33.7 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 4 - Forks: 2

HemantM29/Multimodal-Document-Analysis-and-Query-Retrieval
This project performs multimodal document analysis and query retrieval by downloading PDFs, converting pages to images, indexing them for semantic search, and analyzing retrieved images using visual-language models like Qwen2VL and Blip2.
Language: Jupyter Notebook - Size: 1.9 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Valdanitooooo/chat_with_qwen2_vl_test
Language: Python - Size: 16.6 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 5 - Forks: 0

Yatish54321/Flipkart_Grid_6.0_Robotics_level2_model
"Smart Vision Technology for Quality Control" uses computer vision to automate product inspections, extracting details like product name, quantity, expiry date, and freshness from images. Built for Flipkart Grid 6.0, it enhances accuracy and efficiency in quality control, minimizing manual checks.
Language: Jupyter Notebook - Size: 123 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

tatsuya-fukuoka/Qwen2-VL-demo
Demo notebook for Qwen2-VL
Language: Jupyter Notebook - Size: 2.74 MB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

ArchismwanChatterjee/OCR-and-Document-Search-Web-Application-Prototype
OCR and Document Search Web Application
Language: Jupyter Notebook - Size: 463 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

Kazuhito00/Qwen2-VL-Colaboratory-Sample
A sample for trying out QwenLM/Qwen2-VL on Colaboratory
Language: Jupyter Notebook - Size: 18.5 MB - Last synced at: 10 days ago - Pushed at: 9 months ago - Stars: 7 - Forks: 0
