GitHub topics: multimodal
xmed-lab/MultiEYE
[IEEE TMI 2024] MultiEYE: Dataset and Benchmark for OCT-Enhanced Retinal Disease Recognition from Fundus Images
Language: Python - Size: 692 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 18 - Forks: 2

FuxiaoLiu/LRV-Instruction
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Language: Python - Size: 23.9 MB - Last synced at: 2 days ago - Pushed at: about 1 year ago - Stars: 277 - Forks: 13

atfortes/Awesome-LLM-Reasoning
Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 🍓
Size: 460 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 3,038 - Forks: 173

TEN-framework/ten-framework
The world’s first real-time, distributed, cloud-edge collaborative multimodal AI Agent Framework that simultaneously supports C/C++/Go/Python/JS/TS
Language: C - Size: 94.4 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 5,788 - Forks: 676

HCPLab-SYSU/Book-of-MLM
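Book "Multimodal Large Models: A New-Generation Artificial Intelligence Technology Paradigm" (《多模态大模型:新一代人工智能技术范式》), by Liu Yang and Lin Liang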
Language: HTML - Size: 33.7 MB - Last synced at: 6 days ago - Pushed at: 5 months ago - Stars: 205 - Forks: 21

PaddlePaddle/PaddleMIX
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.
Language: Python - Size: 177 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 628 - Forks: 210

rustic-ai/ui-components
React component library for crafting user-friendly and engaging conversational experiences
Language: JavaScript - Size: 20.5 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 59 - Forks: 12

1set-t/ai-model
Industrial-grade weather visualization system that transforms AI model predictions into professional meteorological plots, emphasizing operational forecasting capabilities.
Size: 1.95 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

JunyiYe/TextFlow
[NAACL 2025] Beyond End-to-End VLMs: Leveraging Intermediate Text Representations for Superior Flowchart Understanding
Language: Python - Size: 284 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 6 - Forks: 2

Yangyi-Chen/Multimodal-AND-Large-Language-Models
Paper list about multimodal and large language models, only used to record papers I read in the daily arxiv for personal needs.
Size: 3.86 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 621 - Forks: 41

microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Language: Python - Size: 66.4 MB - Last synced at: 7 days ago - Pushed at: 2 months ago - Stars: 21,188 - Forks: 2,620

akshaysinhaaa/sentiment-analysis
Language: Python - Size: 26.4 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

rerun-io/rerun
Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.
Language: Rust - Size: 644 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 8,337 - Forks: 445

swyxio/ai-notes
Notes for software engineers getting up to speed on new AI developments. Serves as a datastore for https://latent.space writing and product brainstorming, with cleaned-up canonical references under the /Resources folder.
Language: HTML - Size: 2.14 MB - Last synced at: 7 days ago - Pushed at: 23 days ago - Stars: 5,643 - Forks: 470

ritzz-ai/GUI-R1
Official implementation of GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents
Language: Python - Size: 974 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 70 - Forks: 5

kyegomez/NaViT
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
Language: Python - Size: 210 KB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 230 - Forks: 11

Wangbiao2/R1-Track
R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning.
Language: Python - Size: 1.71 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 28 - Forks: 1

lxe/llavavision
A simple "Be My Eyes" web app with a llama.cpp/llava backend
Language: JavaScript - Size: 27.2 MB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 489 - Forks: 32

tattle-made/feluda
A configurable engine for analysing multi-lingual and multi-modal content.
Language: Python - Size: 28.1 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 60 - Forks: 51

enoche/MultimodalRecSys
A curated list of awesome resources about multimodal recommender systems.
Size: 335 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 361 - Forks: 24

roboflow/maestro
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
Language: Python - Size: 10.6 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 2,555 - Forks: 203

oidlabs-com/Lexoid
Multimodal document parser for high quality data understanding and extraction
Language: Python - Size: 46.7 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 44 - Forks: 6

kdeps/kdeps
Kdeps is an all-in-one AI framework for building Dockerized full-stack AI applications (FE and BE) that includes open-source LLM models out-of-the-box.
Language: Go - Size: 4.26 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 21 - Forks: 1

ALEEEHU/World-Simulator
Simulating the Real World: Survey & Resources, which contains our survey "Simulating the Real World: A Unified Survey of Multimodal Generative Models" and Awesome-Text2X-Resources. Watch this repository for the latest updates! 🔥
Size: 18.1 MB - Last synced at: 8 days ago - Pushed at: 12 days ago - Stars: 246 - Forks: 14

The-Martyr/CausalMM
[ICLR 2025] Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality
Language: Python - Size: 7.1 MB - Last synced at: 8 days ago - Pushed at: 9 days ago - Stars: 25 - Forks: 2

rom1504/clip-retrieval
Easily compute clip embeddings and build a clip retrieval system with them
Language: Jupyter Notebook - Size: 3.75 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 2,546 - Forks: 223

alishhde/ArtBuddy
ArtBuddy is an AI-powered creative companion that enhances your graphic design workflow. It combines multiple intelligent agents to help you brainstorm ideas, find design inspiration, and refine your creative concepts.
Language: Python - Size: 5.11 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

mbodiai/embodied-agents
Seamlessly integrate state-of-the-art transformer models into robotics stacks
Language: Python - Size: 75.2 MB - Last synced at: 2 days ago - Pushed at: 19 days ago - Stars: 207 - Forks: 22

shure-dev/Awesome-LLM-Papers-Comprehensive-Topics
Awesome LLM Papers and repos on very comprehensive topics.
Size: 450 KB - Last synced at: 8 days ago - Pushed at: 9 months ago - Stars: 217 - Forks: 22

tyler-romero/tyler-romero.github.io
Technical Blog + Personal Website
Language: Nunjucks - Size: 56.3 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 2 - Forks: 0

gokayfem/awesome-vlm-architectures
Famous Vision Language Models and Their Architectures
Language: Markdown - Size: 2.26 MB - Last synced at: 9 days ago - Pushed at: 3 months ago - Stars: 804 - Forks: 42

reasoning-survey/Awesome-Reasoning-Foundation-Models
✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models
Size: 7.37 MB - Last synced at: 9 days ago - Pushed at: 18 days ago - Stars: 571 - Forks: 56

GaochangWu/FMF-Benchmark
This is a cross-modal benchmark for industrial anomaly detection.
Language: Python - Size: 6.82 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 9 - Forks: 1

mbzuai-oryx/VideoGPT-plus
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
Language: Python - Size: 16.5 MB - Last synced at: 2 days ago - Pushed at: about 1 month ago - Stars: 271 - Forks: 17

mahmoodlab/MCAT
Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images - ICCV 2021
Language: Jupyter Notebook - Size: 540 MB - Last synced at: 6 days ago - Pushed at: about 3 years ago - Stars: 200 - Forks: 40

bumbelbee777/SillyAI
Complex-valued neuro-symbolic transformer using PyTorch.
Language: Python - Size: 102 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

glami/glami-1m
The largest multilingual image-text classification dataset. It contains fashion products.
Language: Jupyter Notebook - Size: 5.43 MB - Last synced at: 5 days ago - Pushed at: almost 2 years ago - Stars: 72 - Forks: 7

pdaicode/awesome-LLMs-finetuning
Collection of resources for finetuning Large Language Models (LLMs).
Size: 103 KB - Last synced at: 8 days ago - Pushed at: 4 months ago - Stars: 77 - Forks: 8

kyegomez/RT-X
Pytorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment: Robotic Learning Datasets and RT-X Models"
Language: Python - Size: 940 KB - Last synced at: 6 days ago - Pushed at: 18 days ago - Stars: 205 - Forks: 22

jwu114/CAP
[NAACL Findings 2025] Code and data of "Mitigating Hallucinations in Multimodal Spatial Relations through Constraint-Aware Prompting"
Language: Python - Size: 88.9 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 3 - Forks: 0

willxxy/ECG-Bench
A Unified Framework for Benchmarking Generative Electrocardiogram-Language Models (ELMs)
Language: Python - Size: 6.67 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 9 - Forks: 2

Moha111-h/Qwen3
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
Language: Shell - Size: 3.07 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

rom1504/cc2dataset
Easily convert common crawl to a dataset of caption and document. Image/text Audio/text Video/text, ...
Language: Python - Size: 50.8 KB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 318 - Forks: 27

C-W-D-Harshit/lume-ai
AI-powered multimodal chat app with real-time responses, file support, token tracking, and dark mode. Built with Next.js. Open source under MIT.
Language: TypeScript - Size: 1.3 MB - Last synced at: about 6 hours ago - Pushed at: 4 months ago - Stars: 9 - Forks: 2

cogmhear/avse_challenge Fork of claritychallenge/clarity
COG-MHEAR Audio-Visual Speech Enhancement Challenge
Language: Python - Size: 774 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 40 - Forks: 11

Yutong-Zhou-cv/Awesome-Text-to-Image
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
Size: 69.2 MB - Last synced at: 11 days ago - Pushed at: 18 days ago - Stars: 2,330 - Forks: 200

wgcyeo/UniversalRAG
UniversalRAG: Retrieval-Augmented Generation over Multiple Corpora with Diverse Modalities and Granularities
Size: 623 KB - Last synced at: 11 days ago - Pushed at: 13 days ago - Stars: 34 - Forks: 2

GerrySant/multimodalhugs
MultimodalHugs is an extension of Hugging Face that offers a generalized framework for training, evaluating, and using multimodal AI models with minimal code differences, ensuring seamless compatibility with Hugging Face pipelines.
Language: Python - Size: 4.24 MB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 3 - Forks: 2

abhiverse01/hatespeech-multimodal-detection
Multi-Modal Hate Speech Detection using Deep Learning.
Language: Jupyter Notebook - Size: 8.32 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 0

vlm-run/vlmrun-hub
A hub for various industry-specific schemas to be used with VLMs.
Language: Python - Size: 352 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 501 - Forks: 23

Aisuko/notebooks
Implementation for the different ML tasks on Kaggle platform with GPUs.
Language: Jupyter Notebook - Size: 160 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 20 - Forks: 3

SiddhantBikram/MemeCLIP
MemeCLIP framework and PrideMM Dataset @ EMNLP 2024
Language: Python - Size: 249 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 11 - Forks: 0

Sinapsis-AI/sinapsis
Modular and Universal AI
Language: Python - Size: 374 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 35 - Forks: 10

Stability-AI/stability-sdk
SDK for interacting with stability.ai APIs (e.g. stable diffusion inference)
Language: Jupyter Notebook - Size: 447 MB - Last synced at: about 23 hours ago - Pushed at: 25 days ago - Stars: 2,440 - Forks: 344

AI4HealthUOL/MDS-ED
Repository for the paper 'MDS-ED: Multimodal Decision Support in the Emergency Department – a benchmark dataset based on MIMIC-IV'.
Language: Python - Size: 4 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 18 - Forks: 2

sofiamironbarroso/Multimodal-Cancer
An exploratory repository into different modelling approaches for Multimodal cancer type prediction.
Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

huggingface/OBELICS
Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M documents, 115B text tokens and 353M images.
Language: Python - Size: 512 KB - Last synced at: 5 days ago - Pushed at: 9 months ago - Stars: 202 - Forks: 10

bin123apple/InfantAgent
A multimodal agent that can interact with its own PC in a multimodal manner.
Language: Python - Size: 5.24 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 6 - Forks: 0

eliranwong/letmedoit
An advanced AI assistant that leverages the capabilities of ChatGPT API, Gemini Pro, AutoGen, and open-source LLMs, enabling it both to engage in conversations and to execute computing tasks on local devices.
Language: Python - Size: 126 MB - Last synced at: 2 days ago - Pushed at: 2 months ago - Stars: 127 - Forks: 25

monatis/clip.cpp
CLIP inference in plain C/C++ with no extra dependencies
Language: C++ - Size: 420 KB - Last synced at: 10 days ago - Pushed at: 9 months ago - Stars: 496 - Forks: 46

NetManAIOps/ChatTS
[VLDB'25] ChatTS: Understanding, Chat, Reasoning about Time Series with TS-MLLM
Language: Python - Size: 3.52 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 137 - Forks: 16

rom1504/img2dataset
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
Language: Python - Size: 3.11 MB - Last synced at: 13 days ago - Pushed at: 9 months ago - Stars: 4,016 - Forks: 353

X-PLUG/MobileAgent
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
Language: Python - Size: 383 MB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 4,149 - Forks: 412

jermmy19998/MMM
Repository for Multi-modal Mutual Mixer
Language: Python - Size: 39.5 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 1 - Forks: 0

IrohXu/Awesome-Multimodal-LLM-Autonomous-Driving
[WACV 2024 Survey Paper] Multimodal Large Language Models for Autonomous Driving
Size: 15 MB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 286 - Forks: 11

HySonLab/Design2Code
Large Language Model combined with a Large Vision Model for generating code from a design sketch.
Language: Python - Size: 270 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 3 - Forks: 0

TIGER-AI-Lab/VL-Rethinker
The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning"
Language: Python - Size: 4.92 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 75 - Forks: 1

umi-AIGC-saas/umi_ai_cms
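A dual-driven intelligent AI system that connects to the mainstream large AI models currently on the market and classifies them algorithmically according to their strengths and weaknesses. By combining the strengths of these models, 无忧AI智脑 can provide more accurate and more reliable information and answers.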
Language: Python - Size: 4.16 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 2 - Forks: 0

showlab/Show-o
[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
Language: Python - Size: 169 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 1,362 - Forks: 58

xieyuquanxx/awesome-Large-MultiModal-Hallucination 📦
😎 curated list of awesome LMM hallucinations papers, methods & resources.
Size: 66.4 KB - Last synced at: about 5 hours ago - Pushed at: about 1 year ago - Stars: 149 - Forks: 14

open-mmlab/Multimodal-GPT
Multimodal-GPT
Language: Python - Size: 109 KB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 1,498 - Forks: 131

patrick-tssn/Awesome-Colorful-LLM
Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics, Fundamental Sciences such as Mathematics, and Ominous.
Size: 935 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 121 - Forks: 8

KarthikaRajagopal44/Text-to-voice-chatbot
Text-to-Speech (TTS) web application built with Gradio and powered by Microsoft Edge TTS voices
Language: Python - Size: 7.81 KB - Last synced at: 8 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

HICAI-ZJU/Scientific-LLM-Survey
Scientific Large Language Models: A Survey on Biological & Chemical Domains
Size: 523 KB - Last synced at: 12 days ago - Pushed at: 3 months ago - Stars: 304 - Forks: 30

YeonwooSung/MLOps
Miscellaneous codes and writings for MLOps
Language: Jupyter Notebook - Size: 542 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 12 - Forks: 1

visionxiang/awesome-salient-object-detection
A curated list of awesome resources for salient object detection (SOD), focusing more on multi-modal SOD, such as RGB-D SOD.
Size: 82 KB - Last synced at: 7 days ago - Pushed at: 8 months ago - Stars: 118 - Forks: 6

ekonwang/VisuoThink
[Arxiv Paper 2504.09130]: VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search
Language: Python - Size: 15.7 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 8 - Forks: 1

video-db/videodb-chat
Frontend interface for building chat based system and connecting with agent driven workflows.
Language: Vue - Size: 1.02 MB - Last synced at: 10 days ago - Pushed at: 2 months ago - Stars: 13 - Forks: 7

krishnaura45/astro-pulse
Extracting Faint Exoplanetary Signals from Ariel Observations
Size: 4.88 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 1 - Forks: 0

nv78/Autonomous-Intelligence
Autonomous Intelligence is a framework for building collaborative, intelligent multi agent AI systems. The framework provides a robust infrastructure for creating and managing multiple AI agents, and enables developers and organizations to build, deploy, and optimize AI agents that work well in dynamic, complex environments.
Language: HTML - Size: 123 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 18 - Forks: 6

overcrash66/OpenTranslator
Open Translator: Speech To Speech and Speech to text Translator with voice cloning and other cool features
Language: Python - Size: 7.48 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 8 - Forks: 2

Open-Social-World/EgoNormia
EgoNormia | Benchmarking Physical Social Norm Understanding in VLMs
Language: Jupyter Notebook - Size: 11.3 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 8 - Forks: 0

2U1/Qwen2-VL-Finetune
An open-source implementation for fine-tuning Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.
Language: Python - Size: 157 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 668 - Forks: 79

pykale/pykale
Knowledge-Aware machine LEarning (KALE): accessible machine learning from multiple sources for interdisciplinary research, part of the 🔥PyTorch ecosystem. ⭐ Star to support our work!
Language: Python - Size: 46.3 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 457 - Forks: 66

tychenjiajun/art
AI-PP3 is a command-line tool that uses artificial intelligence to analyze RAW photos and generate optimized processing profiles (PP3 files) for RawTherapee.
Language: TypeScript - Size: 265 MB - Last synced at: 12 days ago - Pushed at: about 1 month ago - Stars: 23 - Forks: 4

alanqrwang/keymorph
Robust multimodal image registration via keypoints
Language: Python - Size: 690 MB - Last synced at: 14 days ago - Pushed at: 9 months ago - Stars: 78 - Forks: 17

OpenGVLab/InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Language: Python - Size: 53.2 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 1,830 - Forks: 111

pu7yan9/AFENet_MCD
Adversarial Feature Equilibrium Network for Multimodal Change Detection in Heterogeneous Remote Sensing Images
Language: Python - Size: 347 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 12 - Forks: 0

zjunlp/EasyInstruct
[ACL 2024] An Easy-to-use Instruction Processing Framework for LLMs.
Language: Python - Size: 18.6 MB - Last synced at: 2 days ago - Pushed at: 5 months ago - Stars: 401 - Forks: 36

baryhuang/voice-mcp-client
An iOS/macOS Swift MCP client using voice to interact with Python MCP servers, both natively
Language: Swift - Size: 285 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

nanowell/AdEMAMix-Optimizer-Pytorch
The AdEMAMix Optimizer: Better, Faster, Older.
Language: Python - Size: 13.7 KB - Last synced at: 12 days ago - Pushed at: 8 months ago - Stars: 183 - Forks: 10

vaila-multimodaltoolbox/vaila
https://vaila.readthedocs.io/
Language: Python - Size: 509 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 8 - Forks: 2

oele-isis-vanderbilt/SyncFlow
Harmonize Your Data Streams
Language: TypeScript - Size: 5.18 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 3 - Forks: 0

video-db/videodb-cookbook
Build use cases with VideoDB
Language: Jupyter Notebook - Size: 15.3 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 21 - Forks: 3

westlake-repl/NineRec
Multimodal Dataset and Benchmark for Multi-domain and Cross-domain Recommendation System
Language: Python - Size: 13.4 MB - Last synced at: 5 days ago - Pushed at: 7 months ago - Stars: 92 - Forks: 7

autodistill/autodistill
Images to inference with no labeling (use foundation models to train supervised models).
Language: Python - Size: 1.14 MB - Last synced at: 17 days ago - Pushed at: about 2 months ago - Stars: 2,230 - Forks: 183

mahmoodlab/MMP
Multimodal prototyping for cancer survival prediction - ICML 2024
Language: Jupyter Notebook - Size: 117 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 82 - Forks: 9

luckercs/multimodal-search
Multimodal search, supports searching for images through text and images
Language: Vue - Size: 412 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

MICA-MNI/micaopen
Open Scripts and pipelines from the Multimodal Imaging and Connectome Analysis Lab at the Montreal Neurological Institute
Language: Jupyter Notebook - Size: 1.7 GB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 75 - Forks: 40

AMD-AIG-AIMA/gpt-fast
The GPT-Fast for Multimodal Models on AMD GPUs
Language: Python - Size: 6.03 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 2 - Forks: 0
