GitHub topics: qwen2-vl

Repositories

PaddlePaddle/PaddleMIX

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrained models and a diffusion model toolbox. Equipped with high performance and flexibility.

Language: Python - Size: 179 MB - Last synced at: about 13 hours ago - Pushed at: 2 days ago - Stars: 645 - Forks: 214

2U1/Qwen2-VL-Finetune

An open-source implementation for fine-tuning the Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.

Language: Python - Size: 102 KB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 742 - Forks: 92

modelscope/ms-swift

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4v, Phi4, ...) (AAAI 2025).

Language: Python - Size: 61.7 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 7,631 - Forks: 646

roboflow/maestro

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

Language: Python - Size: 10.6 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 2,562 - Forks: 205

arcstep/illufly

✨🦋 illufly (幻蝶): a self-evolving agent based on memory distillation and document retrieval

Language: Python - Size: 26.2 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 62 - Forks: 9

NetEase-Media/grps_trtllm

Higher performance OpenAI LLM service than vLLM serve: A pure C++ high-performance OpenAI LLM service implemented with GRPS+TensorRT-LLM+Tokenizers.cpp, supporting chat and function call, AI agents, distributed multi-GPU inference, multimodal capabilities, and a Gradio chat interface.

Language: Python - Size: 135 MB - Last synced at: 7 days ago - Pushed at: 11 days ago - Stars: 134 - Forks: 10

Younis-Ahmed/qwen-ai-provider

Community-built Qwen AI Provider for Vercel AI SDK - Integrate Alibaba Cloud's Qwen models with Vercel's AI application framework

Language: TypeScript - Size: 372 KB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 8 - Forks: 2

Dishu-Bansal/Documatic

An AI-powered document organizer tool. It displays a small, cute robot on the screen. Give it any file and an optional short description, and it will analyse the contents and description and save the file to the cloud. When needed, just double-click the robot, enter a description or keywords for the file you are looking for, and it will open the best matching file/

Language: C - Size: 242 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

drive-bench/toolkit

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives

Language: Python - Size: 14.4 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 66 - Forks: 1

polymathbenchmark/polymathbenchmark.github.io

A Challenging Multi-Modal Mathematical Reasoning Benchmark

Language: JavaScript - Size: 2.01 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

aws-samples/sample-for-multi-modal-document-to-json-with-sagemaker-ai

This open-source project delivers a complete pipeline for converting multi-page documents (PDFs/images) into structured JSON using Vision LLMs on Amazon SageMaker. The solution leverages the SWIFT Framework to fine-tune models specifically for document understanding tasks.

Language: Jupyter Notebook - Size: 3.18 MB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 3 - Forks: 0

soulteary/dify-with-qwen-vl

Video understanding: Qwen video multimodal model & Dify

Language: Python - Size: 9.77 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 47 - Forks: 9

PRITHIVSAKTHIUR/Aya-Vision-Ocr-vs-Qwen2VL-Ocr

Messy Handwriting OCR Comparison Between Aya-Vision-8B and Qwen2VL-OCR-2B

Language: Python - Size: 18.6 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

fireicewolf/wd-llm-caption-cli

A Python-based CLI tool for captioning images with WD series, Joy-caption-pre-alpha, Meta Llama 3.2 Vision Instruct, and Qwen2 VL Instruct models.

Language: Python - Size: 1.92 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 34 - Forks: 8

Pavansomisetty21/Qwen2-Vision-Finetuning-Unsloth---Maths-OCR-Formulae-Extraction-

We fine-tune an Unsloth Llama model to extract mathematical formulas from images using optical character recognition (OCR)

Language: Jupyter Notebook - Size: 43 KB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 4 - Forks: 0

SimonGino/repoicon

Use AI to generate elegant, minimalist icons for your GitHub repositories.

Language: TypeScript - Size: 77.1 KB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

PRITHIVSAKTHIUR/Multimodal-OCR

OCR Vision Language Model

Language: Python - Size: 4.41 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

BUAADreamer/Qwen2-VL-History

A LLaMA-Factory fine-tuning case for Qwen2-VL in the culture and tourism domain (historical literature and museums)

Size: 73.8 MB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 6 - Forks: 2

aws-samples/multi-modal-examples-for-amazon-sagemaker

A workshop for collections of multi-modal LLM examples, samples, reference architecture and demos on Amazon SageMaker.

Language: Jupyter Notebook - Size: 33.7 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 4 - Forks: 2

HemantM29/Multimodal-Document-Analysis-and-Query-Retrieval

This project performs multimodal document analysis and query retrieval by downloading PDFs, converting pages to images, indexing them for semantic search, and analyzing retrieved images using visual-language models like Qwen2VL and Blip2.

Language: Jupyter Notebook - Size: 1.9 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Valdanitooooo/chat_with_qwen2_vl_test

Language: Python - Size: 16.6 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 5 - Forks: 0

Yatish54321/Flipkart_Grid_6.0_Robotics_level2_model

"Smart Vision Technology for Quality Control" uses computer vision to automate product inspections, extracting details like product name, quantity, expiry date, and freshness from images. Built for Flipkart Grid 6.0, it enhances accuracy and efficiency in quality control, minimizing manual checks.

Language: Jupyter Notebook - Size: 123 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

Related Keywords

qwen2-vl (25), llm (8), multimodal (6), qwen2 (4), vlm (4), ocr (4), llama (3), internvl (3), huggingface-transformers (3), python (2), ai (2), qwen (2), qwen3 (2), openai (2), deepseek-r1 (2), vision-language-model (2), llama3 (2), qwen2-5 (2), multimodal-large-language-models (2), sft (2), fine-tuning (2), florence-2 (2), sagemaker (2), vision-transformer (2), transformers (2), artificial-intelligence (2), image-to-text (2), minicpm-v (2), internvl2 (2), dify (1), ai-engineering (1), aya-vision (1), image-caption (1), joy-caption (1), vllm (1), deep-learning (1), machine-learning (1), neural-network (1), autonomous-driving (1), chatgpt (1), driving-with-language (1), phi-3 (1), vision-language-models (1), benchmark (1), claude-3-5-sonnet (1), gemini-vision-pro (1), gpt-4 (1), openai-o1 (1), vision (1), aws (1), document-processing (1), huggingface (1), idp (1), swift (1), blip2 (1), image-indexing (1), multimodal-analysis (1), natural-language-queries (1), pdf-processing (1), retrieval-augmented-generation (1), semantic-search (1), visual-language-models (1), genai (1), qwen2-vl-2b (1), colpali (1), easyocr (1), got (1), gradio (1), huggingface-spaces (1), colaboratory (1), llama3-vision (1), wd14 (1), maths (1), ocr-recognition (1), optical-character-recognition (1), unsloth (1), icon-generator (1), beauty (1), history (1), llama-factory (1), mllm (1), museum (1), supervised-finetuning (1), multi-modality (1), sagemaker-example (1), sagemaker-studio (1), video-llava (1), aigc (1), deploy (1), embedding (1), grpo (1), liger (1), llama4 (1), lora (1), megatron (1), omni (1), open-r1 (1), peft (1), qwen3-moe (1), rft (1)
