An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: multimodal

xmed-lab/MultiEYE

[IEEE TMI 2024] MultiEYE: Dataset and Benchmark for OCT-Enhanced Retinal Disease Recognition from Fundus Images

Language: Python - Size: 692 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 18 - Forks: 2

FuxiaoLiu/LRV-Instruction

[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

Language: Python - Size: 23.9 MB - Last synced at: 2 days ago - Pushed at: about 1 year ago - Stars: 277 - Forks: 13

atfortes/Awesome-LLM-Reasoning

Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 🍓

Size: 460 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 3,038 - Forks: 173

TEN-framework/ten-framework

The world’s first real-time, distributed, cloud-edge collaborative multimodal AI Agent Framework that simultaneously supports C/C++/Go/Python/JS/TS

Language: C - Size: 94.4 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 5,788 - Forks: 676

HCPLab-SYSU/Book-of-MLM

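"Multimodal Large Models: A New-Generation Paradigm for Artificial Intelligence Technology" (《多模态大模型:新一代人工智能技术范式》), by Yang Liu and Liang Lin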

Language: HTML - Size: 33.7 MB - Last synced at: 6 days ago - Pushed at: 5 months ago - Stars: 205 - Forks: 21

PaddlePaddle/PaddleMIX

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.

Language: Python - Size: 177 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 628 - Forks: 210

rustic-ai/ui-components

React component library for crafting user-friendly and engaging conversational experiences

Language: JavaScript - Size: 20.5 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 59 - Forks: 12

1set-t/ai-model

Industrial-grade weather visualization system that transforms AI model predictions into professional meteorological plots, emphasizing operational forecasting capabilities.

Size: 1.95 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

JunyiYe/TextFlow

[NAACL 2025] Beyond End-to-End VLMs: Leveraging Intermediate Text Representations for Superior Flowchart Understanding

Language: Python - Size: 284 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 6 - Forks: 2

Yangyi-Chen/Multimodal-AND-Large-Language-Models

Paper list about multimodal and large language models, used only to record the papers I read on arXiv each day, for personal use.

Size: 3.86 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 621 - Forks: 41

microsoft/unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Language: Python - Size: 66.4 MB - Last synced at: 7 days ago - Pushed at: 2 months ago - Stars: 21,188 - Forks: 2,620

akshaysinhaaa/sentiment-analysis

Language: Python - Size: 26.4 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

rerun-io/rerun

Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.

Language: Rust - Size: 644 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 8,337 - Forks: 445

swyxio/ai-notes

Notes for software engineers getting up to speed on new AI developments. Serves as a datastore for https://latent.space writing and product brainstorming, with cleaned-up canonical references under the /Resources folder.

Language: HTML - Size: 2.14 MB - Last synced at: 7 days ago - Pushed at: 23 days ago - Stars: 5,643 - Forks: 470

ritzz-ai/GUI-R1

Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents

Language: Python - Size: 974 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 70 - Forks: 5

kyegomez/NaViT

My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"

Language: Python - Size: 210 KB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 230 - Forks: 11

Wangbiao2/R1-Track

R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning.

Language: Python - Size: 1.71 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 28 - Forks: 1

lxe/llavavision

A simple "Be My Eyes" web app with a llama.cpp/llava backend

Language: JavaScript - Size: 27.2 MB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 489 - Forks: 32

tattle-made/feluda

A configurable engine for analysing multi-lingual and multi-modal content.

Language: Python - Size: 28.1 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 60 - Forks: 51

enoche/MultimodalRecSys

A curated list of awesome resources about multimodal recommender systems.

Size: 335 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 361 - Forks: 24

roboflow/maestro

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

Language: Python - Size: 10.6 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 2,555 - Forks: 203

oidlabs-com/Lexoid

Multimodal document parser for high quality data understanding and extraction

Language: Python - Size: 46.7 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 44 - Forks: 6

kdeps/kdeps

Kdeps is an all-in-one AI framework for building Dockerized full-stack AI applications (FE and BE) that includes open-source LLM models out-of-the-box.

Language: Go - Size: 4.26 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 21 - Forks: 1

ALEEEHU/World-Simulator

Simulating the Real World: Survey & Resources, which contains our survey "Simulating the Real World: A Unified Survey of Multimodal Generative Models" and Awesome-Text2X-Resources. Watch this repository for the latest updates! 🔥

Size: 18.1 MB - Last synced at: 8 days ago - Pushed at: 12 days ago - Stars: 246 - Forks: 14

The-Martyr/CausalMM

[ICLR 2025] Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality

Language: Python - Size: 7.1 MB - Last synced at: 8 days ago - Pushed at: 9 days ago - Stars: 25 - Forks: 2

rom1504/clip-retrieval

Easily compute CLIP embeddings and build a CLIP retrieval system with them

Language: Jupyter Notebook - Size: 3.75 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 2,546 - Forks: 223

alishhde/ArtBuddy

ArtBuddy is an AI-powered creative companion that enhances your graphic design workflow. It combines multiple intelligent agents to help you brainstorm ideas, find design inspiration, and refine your creative concepts.

Language: Python - Size: 5.11 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

mbodiai/embodied-agents

Seamlessly integrate state-of-the-art transformer models into robotics stacks

Language: Python - Size: 75.2 MB - Last synced at: 2 days ago - Pushed at: 19 days ago - Stars: 207 - Forks: 22

shure-dev/Awesome-LLM-Papers-Comprehensive-Topics

Awesome LLM Papers and repos on very comprehensive topics.

Size: 450 KB - Last synced at: 8 days ago - Pushed at: 9 months ago - Stars: 217 - Forks: 22

tyler-romero/tyler-romero.github.io

Technical Blog + Personal Website

Language: Nunjucks - Size: 56.3 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 2 - Forks: 0

gokayfem/awesome-vlm-architectures

Famous Vision Language Models and Their Architectures

Language: Markdown - Size: 2.26 MB - Last synced at: 9 days ago - Pushed at: 3 months ago - Stars: 804 - Forks: 42

reasoning-survey/Awesome-Reasoning-Foundation-Models

✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models

Size: 7.37 MB - Last synced at: 9 days ago - Pushed at: 18 days ago - Stars: 571 - Forks: 56

GaochangWu/FMF-Benchmark

This is a cross-modal benchmark for industrial anomaly detection.

Language: Python - Size: 6.82 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 9 - Forks: 1

mbzuai-oryx/VideoGPT-plus

Official repository of the paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding

Language: Python - Size: 16.5 MB - Last synced at: 2 days ago - Pushed at: about 1 month ago - Stars: 271 - Forks: 17

mahmoodlab/MCAT

Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images - ICCV 2021

Language: Jupyter Notebook - Size: 540 MB - Last synced at: 6 days ago - Pushed at: about 3 years ago - Stars: 200 - Forks: 40

bumbelbee777/SillyAI

Complex-valued neuro-symbolic transformer using PyTorch.

Language: Python - Size: 102 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

glami/glami-1m

The largest multilingual image-text classification dataset. It contains fashion products.

Language: Jupyter Notebook - Size: 5.43 MB - Last synced at: 5 days ago - Pushed at: almost 2 years ago - Stars: 72 - Forks: 7

pdaicode/awesome-LLMs-finetuning

Collection of resources for finetuning Large Language Models (LLMs).

Size: 103 KB - Last synced at: 8 days ago - Pushed at: 4 months ago - Stars: 77 - Forks: 8

kyegomez/RT-X

PyTorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment: Robotic Learning Datasets and RT-X Models"

Language: Python - Size: 940 KB - Last synced at: 6 days ago - Pushed at: 18 days ago - Stars: 205 - Forks: 22

jwu114/CAP

[NAACL Findings 2025] Code and data of "Mitigating Hallucinations in Multimodal Spatial Relations through Constraint-Aware Prompting"

Language: Python - Size: 88.9 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 3 - Forks: 0

willxxy/ECG-Bench

A Unified Framework for Benchmarking Generative Electrocardiogram-Language Models (ELMs)

Language: Python - Size: 6.67 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 9 - Forks: 2

Moha111-h/Qwen3

Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

Language: Shell - Size: 3.07 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

rom1504/cc2dataset

Easily convert Common Crawl into a dataset of captions and documents: image/text, audio/text, video/text, ...

Language: Python - Size: 50.8 KB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 318 - Forks: 27

C-W-D-Harshit/lume-ai

AI-powered multimodal chat app with real-time responses, file support, token tracking, and dark mode. Built with Next.js. Open source under MIT.

Language: TypeScript - Size: 1.3 MB - Last synced at: about 6 hours ago - Pushed at: 4 months ago - Stars: 9 - Forks: 2

cogmhear/avse_challenge Fork of claritychallenge/clarity

COG-MHEAR Audio-Visual Speech Enhancement Challenge

Language: Python - Size: 774 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 40 - Forks: 11

Yutong-Zhou-cv/Awesome-Text-to-Image

(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

Size: 69.2 MB - Last synced at: 11 days ago - Pushed at: 18 days ago - Stars: 2,330 - Forks: 200

wgcyeo/UniversalRAG

UniversalRAG: Retrieval-Augmented Generation over Multiple Corpora with Diverse Modalities and Granularities

Size: 623 KB - Last synced at: 11 days ago - Pushed at: 13 days ago - Stars: 34 - Forks: 2

GerrySant/multimodalhugs

MultimodalHugs is an extension of Hugging Face that offers a generalized framework for training, evaluating, and using multimodal AI models with minimal code differences, ensuring seamless compatibility with Hugging Face pipelines.

Language: Python - Size: 4.24 MB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 3 - Forks: 2

abhiverse01/hatespeech-multimodal-detection

Multi-Modal Hate Speech Detection using Deep Learning.

Language: Jupyter Notebook - Size: 8.32 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 0

vlm-run/vlmrun-hub

A hub for various industry-specific schemas to be used with VLMs.

Language: Python - Size: 352 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 501 - Forks: 23

Aisuko/notebooks

Implementations of various ML tasks on the Kaggle platform with GPUs.

Language: Jupyter Notebook - Size: 160 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 20 - Forks: 3

SiddhantBikram/MemeCLIP

MemeCLIP framework and PrideMM Dataset @ EMNLP 2024

Language: Python - Size: 249 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 11 - Forks: 0

Sinapsis-AI/sinapsis

Modular and Universal AI

Language: Python - Size: 374 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 35 - Forks: 10

Stability-AI/stability-sdk

SDK for interacting with stability.ai APIs (e.g. stable diffusion inference)

Language: Jupyter Notebook - Size: 447 MB - Last synced at: about 23 hours ago - Pushed at: 25 days ago - Stars: 2,440 - Forks: 344

AI4HealthUOL/MDS-ED

Repository for the paper 'MDS-ED: Multimodal Decision Support in the Emergency Department – a benchmark dataset based on MIMIC-IV'.

Language: Python - Size: 4 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 18 - Forks: 2

sofiamironbarroso/Multimodal-Cancer

An exploratory repository into different modelling approaches for multimodal cancer type prediction.

Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

huggingface/OBELICS

Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M documents, 115B text tokens and 353M images.

Language: Python - Size: 512 KB - Last synced at: 5 days ago - Pushed at: 9 months ago - Stars: 202 - Forks: 10

bin123apple/InfantAgent

A multimodal agent that can interact with its own PC in a multimodal manner.

Language: Python - Size: 5.24 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 6 - Forks: 0

eliranwong/letmedoit

An advanced AI assistant that leverages the capabilities of ChatGPT API, Gemini Pro, AutoGen, and open-source LLMs, enabling it both to engage in conversations and to execute computing tasks on local devices.

Language: Python - Size: 126 MB - Last synced at: 2 days ago - Pushed at: 2 months ago - Stars: 127 - Forks: 25

monatis/clip.cpp

CLIP inference in plain C/C++ with no extra dependencies

Language: C++ - Size: 420 KB - Last synced at: 10 days ago - Pushed at: 9 months ago - Stars: 496 - Forks: 46

NetManAIOps/ChatTS

[VLDB' 25] ChatTS: Understanding, Chat, Reasoning about Time Series with TS-MLLM

Language: Python - Size: 3.52 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 137 - Forks: 16

rom1504/img2dataset

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.

Language: Python - Size: 3.11 MB - Last synced at: 13 days ago - Pushed at: 9 months ago - Stars: 4,016 - Forks: 353

X-PLUG/MobileAgent

Mobile-Agent: The Powerful Mobile Device Operation Assistant Family

Language: Python - Size: 383 MB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 4,149 - Forks: 412

jermmy19998/MMM

Repository for Multi-modal Mutual Mixer

Language: Python - Size: 39.5 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 1 - Forks: 0

IrohXu/Awesome-Multimodal-LLM-Autonomous-Driving

[WACV 2024 Survey Paper] Multimodal Large Language Models for Autonomous Driving

Size: 15 MB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 286 - Forks: 11

HySonLab/Design2Code

A large language model combined with a large vision model for generating code from a design sketch.

Language: Python - Size: 270 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 3 - Forks: 0

TIGER-AI-Lab/VL-Rethinker

The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning"

Language: Python - Size: 4.92 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 75 - Forks: 1

umi-AIGC-saas/umi_ai_cms

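A dual-driven intelligent AI system that connects to the mainstream large AI models currently on the market and classifies algorithms according to each model's strengths and weaknesses. By combining the strengths of these large AI models, Wuyou AI Brain (无忧AI智脑) aims to provide more accurate and reliable information and answers.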

Language: Python - Size: 4.16 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 2 - Forks: 0

showlab/Show-o

[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Language: Python - Size: 169 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 1,362 - Forks: 58

xieyuquanxx/awesome-Large-MultiModal-Hallucination 📦

😎 A curated list of awesome LMM hallucination papers, methods & resources.

Size: 66.4 KB - Last synced at: about 5 hours ago - Pushed at: about 1 year ago - Stars: 149 - Forks: 14

open-mmlab/Multimodal-GPT

Multimodal-GPT

Language: Python - Size: 109 KB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 1,498 - Forks: 131

patrick-tssn/Awesome-Colorful-LLM

Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics, Fundamental Sciences such as Mathematics, and Ominous.

Size: 935 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 121 - Forks: 8

KarthikaRajagopal44/Text-to-voice-chatbot

Text-to-Speech (TTS) web application built with Gradio and powered by Microsoft Edge TTS voices

Language: Python - Size: 7.81 KB - Last synced at: 8 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

HICAI-ZJU/Scientific-LLM-Survey

Scientific Large Language Models: A Survey on Biological & Chemical Domains

Size: 523 KB - Last synced at: 12 days ago - Pushed at: 3 months ago - Stars: 304 - Forks: 30

YeonwooSung/MLOps

Miscellaneous code and writings for MLOps

Language: Jupyter Notebook - Size: 542 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 12 - Forks: 1

visionxiang/awesome-salient-object-detection

A curated list of awesome resources for salient object detection (SOD), focusing more on multi-modal SOD, such as RGB-D SOD.

Size: 82 KB - Last synced at: 7 days ago - Pushed at: 8 months ago - Stars: 118 - Forks: 6

ekonwang/VisuoThink

[arXiv 2504.09130]: VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search

Language: Python - Size: 15.7 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 8 - Forks: 1

video-db/videodb-chat

Frontend interface for building chat-based systems and connecting them with agent-driven workflows.

Language: Vue - Size: 1.02 MB - Last synced at: 10 days ago - Pushed at: 2 months ago - Stars: 13 - Forks: 7

krishnaura45/astro-pulse

Extracting Faint Exoplanetary Signals from Ariel Observations

Size: 4.88 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 1 - Forks: 0

nv78/Autonomous-Intelligence

Autonomous Intelligence is a framework for building collaborative, intelligent multi agent AI systems. The framework provides a robust infrastructure for creating and managing multiple AI agents, and enables developers and organizations to build, deploy, and optimize AI agents that work well in dynamic, complex environments.

Language: HTML - Size: 123 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 18 - Forks: 6

overcrash66/OpenTranslator

Open Translator: Speech-to-Speech and Speech-to-Text Translator with voice cloning and other cool features

Language: Python - Size: 7.48 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 8 - Forks: 2

Open-Social-World/EgoNormia

EgoNormia | Benchmarking Physical Social Norm Understanding in VLMs

Language: Jupyter Notebook - Size: 11.3 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 8 - Forks: 0

2U1/Qwen2-VL-Finetune

An open-source implementation for fine-tuning the Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.

Language: Python - Size: 157 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 668 - Forks: 79

pykale/pykale

Knowledge-Aware machine LEarning (KALE): accessible machine learning from multiple sources for interdisciplinary research, part of the 🔥PyTorch ecosystem. ⭐ Star to support our work!

Language: Python - Size: 46.3 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 457 - Forks: 66

tychenjiajun/art

AI-PP3 is a command-line tool that uses artificial intelligence to analyze RAW photos and generate optimized processing profiles (PP3 files) for RawTherapee.

Language: TypeScript - Size: 265 MB - Last synced at: 12 days ago - Pushed at: about 1 month ago - Stars: 23 - Forks: 4

alanqrwang/keymorph

Robust multimodal image registration via keypoints

Language: Python - Size: 690 MB - Last synced at: 14 days ago - Pushed at: 9 months ago - Stars: 78 - Forks: 17

OpenGVLab/InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Language: Python - Size: 53.2 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 1,830 - Forks: 111

pu7yan9/AFENet_MCD

Adversarial Feature Equilibrium Network for Multimodal Change Detection in Heterogeneous Remote Sensing Images

Language: Python - Size: 347 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 12 - Forks: 0

zjunlp/EasyInstruct

[ACL 2024] An Easy-to-use Instruction Processing Framework for LLMs.

Language: Python - Size: 18.6 MB - Last synced at: 2 days ago - Pushed at: 5 months ago - Stars: 401 - Forks: 36

baryhuang/voice-mcp-client

An iOS/macOS Swift MCP client, native on both platforms, that uses voice to interact with Python MCP servers

Language: Swift - Size: 285 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

nanowell/AdEMAMix-Optimizer-Pytorch

The AdEMAMix Optimizer: Better, Faster, Older.

Language: Python - Size: 13.7 KB - Last synced at: 12 days ago - Pushed at: 8 months ago - Stars: 183 - Forks: 10

vaila-multimodaltoolbox/vaila

https://vaila.readthedocs.io/

Language: Python - Size: 509 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 8 - Forks: 2

oele-isis-vanderbilt/SyncFlow

Harmonize Your Data Streams

Language: TypeScript - Size: 5.18 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 3 - Forks: 0

video-db/videodb-cookbook

Build use cases with VideoDB

Language: Jupyter Notebook - Size: 15.3 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 21 - Forks: 3

westlake-repl/NineRec

Multimodal Dataset and Benchmark for Multi-domain and Cross-domain Recommendation System

Language: Python - Size: 13.4 MB - Last synced at: 5 days ago - Pushed at: 7 months ago - Stars: 92 - Forks: 7

autodistill/autodistill

Images to inference with no labeling (use foundation models to train supervised models).

Language: Python - Size: 1.14 MB - Last synced at: 17 days ago - Pushed at: about 2 months ago - Stars: 2,230 - Forks: 183

mahmoodlab/MMP

Multimodal prototyping for cancer survival prediction - ICML 2024

Language: Jupyter Notebook - Size: 117 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 82 - Forks: 9

luckercs/multimodal-search

Multimodal search: supports searching for images by text or by image

Language: Vue - Size: 412 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

MICA-MNI/micaopen

Open Scripts and pipelines from the Multimodal Imaging and Connectome Analysis Lab at the Montreal Neurological Institute

Language: Jupyter Notebook - Size: 1.7 GB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 75 - Forks: 40

AMD-AIG-AIMA/gpt-fast

GPT-Fast for multimodal models on AMD GPUs

Language: Python - Size: 6.03 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 2 - Forks: 0

Related Keywords
multimodal 1,032 llm 164 deep-learning 153 machine-learning 105 ai 97 pytorch 91 computer-vision 87 multimodal-deep-learning 86 large-language-models 77 nlp 70 artificial-intelligence 60 python 58 chatbot 58 clip 55 generative-ai 54 dataset 49 rag 47 multimodal-learning 45 transformers 44 multimodal-large-language-models 40 vision-language-model 39 llama 39 natural-language-processing 38 llava 38 chatgpt 37 gpt4 36 openai 35 llms 33 foundation-models 33 transformer 32 multimodality 31 gpt-4 28 gpt 28 stable-diffusion 27 vlm 25 vision-and-language 25 huggingface 25 mllm 24 benchmark 24 agent 24 vision-transformer 23 vision 22 embeddings 22 video 22 vqa 22 text-to-image 21 vision-language 21 agents 21 gemini 20 neural-network 19 bert 19 instruction-tuning 19 large-multimodal-models 18 image-captioning 18 langchain 18 attention 18 attention-mechanism 17 robotics 16 streamlit 16 knowledge-graph 16 language-model 15 speech-recognition 15 text-to-speech 15 image-generation 15 attention-is-all-you-need 15 reasoning 15 reinforcement-learning 14 contrastive-learning 14 awesome 14 image 14 diffusion-models 14 conversational-ai 14 vector-database 13 voice-assistant 13 tts 13 awesome-list 13 classification 13 prompt-engineering 13 object-detection 13 image-classification 13 tensorflow 13 diffusion 12 multi-modality 12 docker 12 multilingual 12 deeplearning 12 image-processing 11 multi-modal 11 gradio 11 evaluation 11 deep-neural-networks 11 ollama 11 mlops 11 aws 10 recommender-system 10 retrieval-augmented-generation 10 data-science 10 llama3 10 huggingface-transformers 10 prompt 10