GitHub topics: multimodal
jaisidhsingh/CoN-CLIP
Implementation of the "Learn No to Say Yes Better" paper.
Language: Python - Size: 4.36 MB - Last synced at: about 6 hours ago - Pushed at: about 7 hours ago - Stars: 31 - Forks: 2

isLinXu/paper-list
Auto-updating paper list
Language: Python - Size: 139 MB - Last synced at: about 7 hours ago - Pushed at: about 8 hours ago - Stars: 79 - Forks: 9

omegalabsinc/omegalabs-bittensor-subnet
The World's Largest Decentralized AGI Multimodal Dataset
Language: Python - Size: 46.4 MB - Last synced at: about 7 hours ago - Pushed at: about 8 hours ago - Stars: 48 - Forks: 25

Swap98-Coder/mlx-audio
A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.
Size: 1.95 KB - Last synced at: about 9 hours ago - Pushed at: about 9 hours ago - Stars: 0 - Forks: 0

morphik-org/morphik-core
Open source multi-modal RAG for building AI apps over private knowledge.
Language: Python - Size: 116 MB - Last synced at: about 13 hours ago - Pushed at: about 15 hours ago - Stars: 2,097 - Forks: 142

yaotingwangofficial/Awesome-MCoT
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Size: 4.59 MB - Last synced at: about 14 hours ago - Pushed at: about 15 hours ago - Stars: 553 - Forks: 13

iterative/datachain
ETL, Analytics, Versioning for Unstructured Data
Language: Python - Size: 10.5 MB - Last synced at: about 6 hours ago - Pushed at: about 6 hours ago - Stars: 2,548 - Forks: 112

Eurus-Holmes/Awesome-Multimodal-Research
A curated list of Multimodal Related Research.
Language: Python - Size: 903 MB - Last synced at: about 14 hours ago - Pushed at: almost 2 years ago - Stars: 1,348 - Forks: 150

OS-Copilot/OS-Genesis
Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Language: Jupyter Notebook - Size: 4.69 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 131 - Forks: 10

bentoml/BentoML
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
Language: Python - Size: 95.5 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 7,684 - Forks: 837

Blaizzy/mlx-audio
A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.
Language: Python - Size: 4.07 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1,063 - Forks: 80

RoffyS/MarkEverythingDown
Convert files (PDF, image, Word, PPT, Excel, notebooks, code snippets) to markdown using powerful multimodal LLM
Language: Python - Size: 4.82 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 232 - Forks: 22

SkalskiP/courses
This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)
Language: Python - Size: 71.3 KB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 5,988 - Forks: 550

jina-ai/serve
☁️ Build multimodal AI applications with cloud-native stack
Language: Python - Size: 1.57 GB - Last synced at: 1 day ago - Pushed at: about 2 months ago - Stars: 21,552 - Forks: 2,225

Flagro/OmniModKit
Multimodal LLM toolkit
Language: Python - Size: 428 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

jacobphillips99/ares
ARES - Automatic Robot Evaluation System. A simple, scalable solution for robotics research
Language: Python - Size: 8.92 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 33 - Forks: 0

motis-project/motis
multimodal routing, geocoding, and map tiles
Language: C++ - Size: 24.7 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 288 - Forks: 75

modelscope/ms-swift
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, GLM4, Mistral, Yi1.5, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, DeepSeek-VL2, Phi4, GOT-OCR2, ...).
Language: Python - Size: 61.3 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 7,424 - Forks: 630

ranaroussi/muxi-llm
A unified interface for interacting with LLMs from various providers – with support for hundreds of models, built-in fallback mechanisms, and enhanced reliability features.
Language: Python - Size: 549 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

RainBowLuoCS/OpenOmni
OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-Time Self-Aware Emotional Speech Synthesis
Language: Python - Size: 8.4 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 47 - Forks: 5

hci-lab-um/cactus
Constraint-free multi-modal Access to Communication Technology for Users with Severe motor impairments. This proposal pushes the state of the art through novel eye-tracking interaction patterns along with the introduction of secondary input modalities to improve throughput and usability.
Language: JavaScript - Size: 9.25 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0

EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
Language: Python - Size: 7.34 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 2,434 - Forks: 273

mindspore-lab/mindway
the way -> '道' ; focus on multimodal large language model mllm
Language: Python - Size: 714 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 6 - Forks: 16

Mrkomiljon/awesome-generative-ai
Multimodal generative AI resources : talking heads, STT, TTS, image & video generation, and more.
Size: 2.31 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 1

firebase/genkit
An open source framework for building AI-powered apps with familiar code-centric patterns. Genkit makes it easy to develop, integrate, and test AI features with observability and evaluations. Genkit works with various models and platforms.
Language: TypeScript - Size: 136 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1,799 - Forks: 245

AdityaLab/MM4TSA
A professional list of papers and resources on Multi-Modalities for Time Series Analysis (MM4TSA).
Size: 457 KB - Last synced at: 2 days ago - Pushed at: about 1 month ago - Stars: 35 - Forks: 1

PKU-Alignment/align-anything
Align Anything: Training All-modality Model with Feedback
Language: Jupyter Notebook - Size: 108 MB - Last synced at: 3 days ago - Pushed at: 9 days ago - Stars: 3,601 - Forks: 418

codecat0/Deep-Learning-With-Code
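This project contains tutorials on paper reading, computer vision, natural language processing, large models, multimodality, and more.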
Language: Python - Size: 29.3 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Language: Python - Size: 17.4 MB - Last synced at: 2 days ago - Pushed at: 16 days ago - Stars: 5,563 - Forks: 939

shreydan/simpleVLM
Building a simple VLM: implements LLaMA-SmolLM2 from scratch plus the SigLIP2 vision model. KV caching is supported and implemented from scratch as well.
Language: Jupyter Notebook - Size: 7.33 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 3 - Forks: 0

microsoft/rag-time
RAG Time: A 5-week Learning Journey to Mastering RAG
Language: Jupyter Notebook - Size: 71.4 MB - Last synced at: 3 days ago - Pushed at: 19 days ago - Stars: 413 - Forks: 187

InternLM/InternLM-XComposer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Language: Python - Size: 199 MB - Last synced at: 3 days ago - Pushed at: 15 days ago - Stars: 2,821 - Forks: 172

haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Language: Python - Size: 13.4 MB - Last synced at: 3 days ago - Pushed at: 9 months ago - Stars: 22,387 - Forks: 2,467

microsoft/torchscale
Foundation Architecture for (M)LLMs
Language: Python - Size: 361 KB - Last synced at: 2 days ago - Pushed at: about 1 year ago - Stars: 3,074 - Forks: 217

luban-agi/Awesome-AIGC-Tutorials
Curated tutorials and resources for Large Language Models, AI Painting, and more.
Size: 168 KB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 4,171 - Forks: 277

tongye98/Awesome-Code-Benchmark
A comprehensive review of code-domain benchmarks in LLM research.
Size: 1.37 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 16 - Forks: 3

deepseek-ai/Janus
Janus-Series: Unified Multimodal Understanding and Generation Models
Language: Python - Size: 6.98 MB - Last synced at: 3 days ago - Pushed at: 3 months ago - Stars: 17,204 - Forks: 2,232

hustvl/EVF-SAM
Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model"
Language: Python - Size: 5.94 MB - Last synced at: 3 days ago - Pushed at: about 2 months ago - Stars: 403 - Forks: 18

kyegomez/swarms
The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Website: https://swarms.ai
Language: Python - Size: 104 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 4,844 - Forks: 557

aviaryhq/cloudglue-js
Official JavaScript / TypeScript SDK for CloudGlue API
Language: TypeScript - Size: 45.9 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 4 - Forks: 0

docarray/docarray
Represent, send, store and search multimodal data
Language: Python - Size: 242 MB - Last synced at: 2 days ago - Pushed at: 16 days ago - Stars: 3,055 - Forks: 232

Capsize-Games/airunner
Privacy focused, local-first, multi-modal inference engine and agent platform for running LLMs, image generation, speech processing, and tool-based automation
Language: Python - Size: 21.7 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 567 - Forks: 40

rerun-io/gradio-rerun-viewer
Rerun viewer with Gradio
Language: Python - Size: 41.4 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 23 - Forks: 4

pixeltable/pixelbot
Multimodal Infinite Memory AI Agent
Language: JavaScript - Size: 4.66 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 11 - Forks: 2

mediar-ai/screenpipe
AI app store powered by 24/7 desktop history. open source | 100% local | dev friendly | 24/7 screen, mic recording
Language: TypeScript - Size: 312 MB - Last synced at: 4 days ago - Pushed at: 8 days ago - Stars: 14,350 - Forks: 1,036

open-mmlab/mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
Language: Python - Size: 13.5 MB - Last synced at: 3 days ago - Pushed at: 6 months ago - Stars: 3,645 - Forks: 1,084

outspeed-ai/voice-devtools
Developer tools to debug and build realtime voice agents. Supports multiple models.
Language: TypeScript - Size: 2.27 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 30 - Forks: 3

BAAI-Agents/GPA-LM
This repo is a live list of papers on game playing and large multimodality model - "A Survey on Game Playing Agents and Large Models: Methods, Applications, and Challenges".
Size: 3.81 MB - Last synced at: 3 days ago - Pushed at: 8 months ago - Stars: 143 - Forks: 7

aimagelab/ReflectiVA
[CVPR 2025] Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
Language: Python - Size: 7.17 MB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 28 - Forks: 0

LSeu-Open/AIEnhancedWork
A collection of AI-driven tools designed to enhance productivity, streamline task automation, and make everyday work more manageable.
Language: Python - Size: 29 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 34 - Forks: 7

xmed-lab/MultiEYE
[IEEE TMI 2024] MultiEYE: Dataset and Benchmark for OCT-Enhanced Retinal Disease Recognition from Fundus Images
Language: Python - Size: 692 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 18 - Forks: 2

atfortes/Awesome-LLM-Reasoning
Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 🍓
Size: 460 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 3,038 - Forks: 173

TEN-framework/ten-framework
The world’s first real-time, distributed, cloud-edge collaborative multimodal AI Agent Framework that simultaneously supports C/C++/Go/Python/JS/TS
Language: C - Size: 94.4 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 5,788 - Forks: 676

HCPLab-SYSU/Book-of-MLM
Book: "Multimodal Large Models: A New-Generation Artificial Intelligence Technology Paradigm", by Liu Yang and Lin Liang
Language: HTML - Size: 33.7 MB - Last synced at: 4 days ago - Pushed at: 5 months ago - Stars: 205 - Forks: 21

PaddlePaddle/PaddleMIX
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.
Language: Python - Size: 177 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 628 - Forks: 210

rustic-ai/ui-components
React component library for crafting user-friendly and engaging conversational experiences
Language: JavaScript - Size: 20.5 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 59 - Forks: 12

1set-t/ai-model
Industrial-grade weather visualization system that transforms AI model predictions into professional meteorological plots, emphasizing operational forecasting capabilities.
Size: 1.95 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

akusayudodograu/Agentic-RAG-Story-Generation-with-Multimodal-GenAI
Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling
Size: 1000 Bytes - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 8 - Forks: 1

JunyiYe/TextFlow
[NAACL 2025] Beyond End-to-End VLMs: Leveraging Intermediate Text Representations for Superior Flowchart Understanding
Language: Python - Size: 284 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 6 - Forks: 2

Yangyi-Chen/Multimodal-AND-Large-Language-Models
Paper list about multimodal and large language models, only used to record papers I read in the daily arxiv for personal needs.
Size: 3.86 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 621 - Forks: 41

microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Language: Python - Size: 66.4 MB - Last synced at: 4 days ago - Pushed at: 2 months ago - Stars: 21,188 - Forks: 2,620

akshaysinhaaa/sentiment-analysis
Language: Python - Size: 26.4 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

rerun-io/rerun
Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.
Language: Rust - Size: 644 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 8,337 - Forks: 445

swyxio/ai-notes
notes for software engineers getting up to speed on new AI developments. Serves as datastore for https://latent.space writing, and product brainstorming, but has cleaned up canonical references under the /Resources folder.
Language: HTML - Size: 2.14 MB - Last synced at: 4 days ago - Pushed at: 20 days ago - Stars: 5,643 - Forks: 470

NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Language: Python - Size: 435 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 13,794 - Forks: 2,806

ritzz-ai/GUI-R1
Official implementation of GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents
Language: Python - Size: 974 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 70 - Forks: 5

Mintplex-Labs/anything-llm
The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more.
Language: JavaScript - Size: 44.1 MB - Last synced at: 5 days ago - Pushed at: 7 days ago - Stars: 43,644 - Forks: 4,273

kyegomez/NaViT
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
Language: Python - Size: 210 KB - Last synced at: about 3 hours ago - Pushed at: about 1 month ago - Stars: 230 - Forks: 11

Wangbiao2/R1-Track
R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning.
Language: Python - Size: 1.71 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 28 - Forks: 1

tattle-made/feluda
A configurable engine for analysing multi-lingual and multi-modal content.
Language: Python - Size: 28.1 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 60 - Forks: 51

enoche/MultimodalRecSys
A curated list of awesome resources about multimodal recommender systems.
Size: 335 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 361 - Forks: 24

roboflow/maestro
streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
Language: Python - Size: 10.6 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 2,555 - Forks: 203

oidlabs-com/Lexoid
Multimodal document parser for high quality data understanding and extraction
Language: Python - Size: 46.7 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 44 - Forks: 6

aj-archipelago/cortex
Simplify and accelerate AI-powered application development with structured interfaces to models and powerful prompt execution environments.
Language: JavaScript - Size: 4.16 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 55 - Forks: 5

kdeps/kdeps
Kdeps is an all-in-one AI framework for building Dockerized full-stack AI applications (FE and BE) that includes open-source LLM models out-of-the-box.
Language: Go - Size: 4.26 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 21 - Forks: 1

ALEEEHU/World-Simulator
Simulating the Real World: Survey & Resources, which contains our survey "Simulating the Real World: A Unified Survey of Multimodal Generative Models" and Awesome-Text2X-Resources. Watch this repository for the latest updates! 🔥
Size: 18.1 MB - Last synced at: 5 days ago - Pushed at: 9 days ago - Stars: 246 - Forks: 14

The-Martyr/CausalMM
[ICLR 2025] Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality
Language: Python - Size: 7.1 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 25 - Forks: 2

rom1504/clip-retrieval
Easily compute clip embeddings and build a clip retrieval system with them
Language: Jupyter Notebook - Size: 3.75 MB - Last synced at: 2 days ago - Pushed at: about 1 year ago - Stars: 2,546 - Forks: 223

alishhde/ArtBuddy
ArtBuddy is an AI-powered creative companion that enhances your graphic design workflow. It combines multiple intelligent agents to help you brainstorm ideas, find design inspiration, and refine your creative concepts.
Language: Python - Size: 5.11 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

shure-dev/Awesome-LLM-Papers-Comprehensive-Topics
Awesome LLM Papers and repos on very comprehensive topics.
Size: 450 KB - Last synced at: 5 days ago - Pushed at: 9 months ago - Stars: 217 - Forks: 22

SuyogKamble/simpleVLM
Building a simple VLM: implements LLaMA-SmolLM2 from scratch plus the SigLIP2 vision model. KV caching is supported and implemented from scratch as well.
Language: Jupyter Notebook - Size: 7.33 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

tyler-romero/tyler-romero.github.io
Technical Blog + Personal Website
Language: Nunjucks - Size: 56.3 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 2 - Forks: 0

gokayfem/awesome-vlm-architectures
Famous Vision Language Models and Their Architectures
Language: Markdown - Size: 2.26 MB - Last synced at: 7 days ago - Pushed at: 2 months ago - Stars: 804 - Forks: 42

reasoning-survey/Awesome-Reasoning-Foundation-Models
✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models
Size: 7.37 MB - Last synced at: 7 days ago - Pushed at: 15 days ago - Stars: 571 - Forks: 56

GaochangWu/FMF-Benchmark
This is a cross-modal benchmark for industrial anomaly detection.
Language: Python - Size: 6.82 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 9 - Forks: 1

RobotecAI/rai
RAI is an agentic framework for robotics, utilizing Langchain and ROS 2 tools to perform complex actions, defined scenarios, free interface execution, log summaries, voice interaction and more.
Language: Python - Size: 51.9 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 307 - Forks: 39

mahmoodlab/MCAT
Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images - ICCV 2021
Language: Jupyter Notebook - Size: 540 MB - Last synced at: 4 days ago - Pushed at: about 3 years ago - Stars: 200 - Forks: 40

pixeltable/pixeltable
Pixeltable — AI Data infrastructure providing a declarative, incremental approach for multimodal workloads.
Language: Python - Size: 207 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 185 - Forks: 29

bumbelbee777/SillyAI
Complex-valued neuro-symbolic transformer using PyTorch.
Language: Python - Size: 102 KB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

glami/glami-1m
The largest multilingual image-text classification dataset. It contains fashion products.
Language: Jupyter Notebook - Size: 5.43 MB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 72 - Forks: 7

pdaicode/awesome-LLMs-finetuning
Collection of resources for finetuning Large Language Models (LLMs).
Size: 103 KB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 77 - Forks: 8

FennelFetish/qapyq
An image viewer and AI-assisted editing/captioning/masking tool that helps with curating datasets for generative AI models, finetunes and LoRA.
Language: Python - Size: 1.44 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 121 - Forks: 5

kyegomez/RT-X
Pytorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment: Robotic Learning Datasets and RT-X Models"
Language: Python - Size: 940 KB - Last synced at: 3 days ago - Pushed at: 15 days ago - Stars: 205 - Forks: 22

jwu114/CAP
[NAACL Findings 2025] Code and data of "Mitigating Hallucinations in Multimodal Spatial Relations through Constraint-Aware Prompting"
Language: Python - Size: 88.9 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 3 - Forks: 0

willxxy/ECG-Bench
A Unified Framework for Benchmarking Generative Electrocardiogram-Language Models (ELMs)
Language: Python - Size: 6.67 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 9 - Forks: 2

Moha111-h/Qwen3
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
Language: Shell - Size: 3.07 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

rom1504/cc2dataset
Easily convert Common Crawl to a dataset of captions and documents: image/text, audio/text, video/text, ...
Language: Python - Size: 50.8 KB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 318 - Forks: 27

willxxy/awesome-mmps
Corpus of resources for multimodal machine learning with physiological signals (mmps).
Size: 1.1 MB - Last synced at: 8 days ago - Pushed at: 9 days ago - Stars: 75 - Forks: 2

cogmhear/avse_challenge (fork of claritychallenge/clarity)
COG-MHEAR Audio-Visual Speech Enhancement Challenge
Language: Python - Size: 774 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 40 - Forks: 11

Yutong-Zhou-cv/Awesome-Text-to-Image
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
Size: 69.2 MB - Last synced at: 9 days ago - Pushed at: 15 days ago - Stars: 2,330 - Forks: 200
