An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: multimodal-deep-learning

maastrichtlawtech/MATCHED

Multimodal Authorship-Attribution To Combat Human Trafficking in Escort-Advertisement Data

Language: Jupyter Notebook - Size: 4.5 MB - Last synced at: about 16 hours ago - Pushed at: about 17 hours ago - Stars: 1 - Forks: 0

GerrySant/multimodalhugs

MultimodalHugs is an extension of Hugging Face that offers a generalized framework for training, evaluating, and using multimodal AI models with minimal code differences, ensuring seamless compatibility with Hugging Face pipelines.

Language: Python - Size: 4.38 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 5 - Forks: 2

AI4Finance-Foundation/FinRobot

FinRobot: An Open-Source AI Agent Platform for Financial Analysis using LLMs 🚀 🚀 🚀

Language: Jupyter Notebook - Size: 7.4 MB - Last synced at: 1 day ago - Pushed at: 7 months ago - Stars: 3,626 - Forks: 634

khiemducdoan/MyBachelorThesis

This is my project for my Bachelor Thesis.

Language: Jupyter Notebook - Size: 30.8 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

omeregev/click2mask

[AAAI 2025] Official Implementation for "Click2Mask: Local Editing with Dynamic Mask Generation" Paper.

Language: Python - Size: 62.8 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 17 - Forks: 2

willxxy/ECG-Bench

A Unified Framework for Benchmarking Generative Electrocardiogram-Language Models (ELMs)

Language: Python - Size: 6.81 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 15 - Forks: 2

Adam-maz/MultiModal-fCNN-Classifier

Here we provide fCNN, a multimodal small drug screening toolkit based on Morgan fingerprints and images of simulated (Docking, DFT) molecules.

Language: Jupyter Notebook - Size: 2.37 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

akusayudodograu/Agentic-RAG-Story-Generation-with-Multimodal-GenAI

Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling

Size: 1000 Bytes - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 9 - Forks: 1

automatika-robotics/roboml

RoboML is an aggregator package written for prototyping and deploying open source ML models for robotics use cases

Language: Python - Size: 906 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 7 - Forks: 0

arielsiman-tov/Weather-Image-Classification-for-Extreme-Weather-Events Fork of TalKleinBgu/Weather_Image_Classification_for_Extreme_Weather_Events

Build a reliable and interpretable model that classifies extreme weather from images – enhancing early detection, situational awareness, and decision-making.

Language: Jupyter Notebook - Size: 5.19 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

willxxy/awesome-mmps

Corpus of resources for multimodal machine learning with physiological signals (mmps).

Size: 340 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 88 - Forks: 2

friedrichor/Awesome-Multimodal-Papers

A curated list of awesome Multimodal studies.

Size: 63.4 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 207 - Forks: 18

floriankulig/neural-navi

Driver CoPilot as a student research project, using multimodal data input streams from a car's telemetry and camera data to try to predict the best maneuver for the driver.

Language: Python - Size: 122 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

bezirganyan/DBF_uncertainty

Original PyTorch implementation of AIStats 2025 paper: Multimodal Learning with Uncertainty Quantification based on Discounted Belief Fusion

Language: Python - Size: 20.1 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 2 - Forks: 0

marslanm/Multimodality-Representation-Learning

This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have been cited and discussed in the survey just accepted https://dl.acm.org/doi/abs/10.1145/3617833 .

Size: 63.3 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 75 - Forks: 7

MILVLG/prophet

Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".

Language: Python - Size: 2.43 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 275 - Forks: 28

sylvanding/MCD-UNet

MCD-UNet: A Multi-modal Conditional Diffusion UNet for 3D Medical Image Segmentation

Language: Python - Size: 396 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

jrzaurin/pytorch-widedeep

A flexible package for multimodal-deep-learning to combine tabular data with text and images using Wide and Deep models in Pytorch

Language: Python - Size: 99.6 MB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 1,356 - Forks: 193

saky-semicolon/Multimodal-Brain-Tumor-Segmentation

This project presents a deep learning-based solution for brain tumor segmentation using multimodal MRI scans and U-Net architecture.

Language: Jupyter Notebook - Size: 28.2 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

remyxai/VQASynth

Compose multimodal datasets 🎹

Language: Python - Size: 17.5 MB - Last synced at: 8 days ago - Pushed at: 10 days ago - Stars: 403 - Forks: 17

reascr/Multimodal_Painter_Attribution

Integrating visual and textual features for improved classification performance in painter attribution of fine art paintings.

Language: Jupyter Notebook - Size: 409 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

DWCTOD/CVPR2024-Papers-with-Code-Demo

A collection of the latest CVPR (Conference on Computer Vision and Pattern Recognition) results, including papers, code, and demo videos. Recommendations from everyone are welcome!

Size: 137 KB - Last synced at: 9 days ago - Pushed at: about 1 year ago - Stars: 1,367 - Forks: 154

multimindlab/multimind-sdk

Your SDK solves all of this. One interface. Unified logic. Local + hosted models. Fine-tuning. Agent tools. Enterprise-ready. Hybrid RAG. Star 🌟 if you like it!

Language: Python - Size: 46.4 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 15 - Forks: 1

zhu-xlab/DOFA

Code for Neural Plasticity-Inspired Foundation Model for Observing the Earth Crossing Modalities

Language: Jupyter Notebook - Size: 993 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 135 - Forks: 12

AlibabaResearch/AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

Language: C++ - Size: 104 MB - Last synced at: 12 days ago - Pushed at: 2 months ago - Stars: 1,727 - Forks: 194

KIETOU1/sentimental-analysis

Sentiment Analysis Web Application: This web application analyzes text sentiment using TextBlob and features a sleek UI built with Next.js and Flask. Explore real-time insights and visualizations to understand sentiment trends easily.

Language: TypeScript - Size: 794 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

HanesSue/Multimodal_emotional_analysis

Language: Python - Size: 1.8 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 1 - Forks: 1

willxxy/ECG-Byte

[arxiv 2024] ECG-Byte: A Tokenizer for End-to-End Generative Electrocardiogram Language Modeling

Language: Python - Size: 28.5 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 18 - Forks: 0

ashutosh1919/data2vec-pytorch

Ready to run PyTorch implementation of Data2Vec 2.0: Highly efficient self-supervised representation learning for vision, speech and text.

Language: Python - Size: 116 KB - Last synced at: 8 days ago - Pushed at: about 2 years ago - Stars: 15 - Forks: 3

IDEA-Research/ChatRex

Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding

Language: Python - Size: 8.82 MB - Last synced at: 15 days ago - Pushed at: 5 months ago - Stars: 185 - Forks: 7

Pradeep9167/Spatial-MLLM

Spatial-MLLM enhances multi-language learning models by integrating visual-based spatial intelligence. This project aims to improve understanding and processing of spatial data, making it a valuable resource for researchers and developers. 🚀

Language: Python - Size: 18.4 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

richard-peng-xia/awesome-multimodal-in-medical-imaging

A collection of resources on applications of multi-modal learning in medical imaging.

Size: 309 KB - Last synced at: 17 days ago - Pushed at: 18 days ago - Stars: 753 - Forks: 68

kyegomez/MultiModalCrossAttn

The open source implementation of the cross attention mechanism from the paper: "JOINTLY TRAINING LARGE AUTOREGRESSIVE MULTIMODAL MODELS"

Language: Python - Size: 223 KB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 28 - Forks: 1

kyegomez/MMCA-MGQA

Experiments around using Multi-Modal Causal Attention with Multi-Grouped Query Attention

Language: Python - Size: 210 KB - Last synced at: 14 days ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 0

kyegomez/Odin

SOTA Classification at scale for UAVs, Drones, and much more

Language: Python - Size: 211 KB - Last synced at: 12 days ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 0

kyegomez/PALI

Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"

Language: Python - Size: 624 KB - Last synced at: 12 days ago - Pushed at: over 1 year ago - Stars: 90 - Forks: 8

thuiar/MMSA-FET

A Tool for extracting multimodal features from videos.

Language: Python - Size: 24.4 MB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 171 - Forks: 23

Yutong-Zhou-cv/Awesome-Multimodality

A Survey on multimodal learning research.

Size: 1.76 MB - Last synced at: 8 days ago - Pushed at: almost 2 years ago - Stars: 327 - Forks: 22

kyegomez/swarms-pytorch

Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊

Language: Python - Size: 58.2 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 121 - Forks: 10

JerryX1110/awesome-rvos

Referring Video Object Segmentation / Multi-Object Tracking Repo

Language: Python - Size: 79.1 KB - Last synced at: 4 days ago - Pushed at: almost 2 years ago - Stars: 87 - Forks: 4

KimMeen/Time-LLM

[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"

Language: Python - Size: 1.06 MB - Last synced at: 23 days ago - Pushed at: 8 months ago - Stars: 2,033 - Forks: 353

abhishekjoshi007/A-Multi-Modal-Transformer-Architecture-Combining-Sentiment-Dynamics-Temporal-Market-Data

Our approach uniquely fuses sentiment dynamics from social media and news sources with temporal market data and macroeconomic indicators to construct dynamic graph representations of interfirm relationships. Further, we employ state-of-the-art GNNs, such as temporal graph convolutions, that adapt to the changing market and significantly enhance it.

Language: Python - Size: 10.7 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 0 - Forks: 0

geoaigroup/awesome-vision-language-models-for-earth-observation

A curated list of awesome vision and language resources for earth observation.

Size: 470 KB - Last synced at: 8 days ago - Pushed at: 3 months ago - Stars: 228 - Forks: 17

phellonchen/awesome-Vision-and-Language-Pre-training

Recent Advances in Vision and Language Pre-training (VLP)

Size: 81.1 KB - Last synced at: 10 days ago - Pushed at: about 2 years ago - Stars: 293 - Forks: 16

declare-lab/multimodal-deep-learning

This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.

Language: OpenEdge ABL - Size: 181 MB - Last synced at: 22 days ago - Pushed at: over 2 years ago - Stars: 840 - Forks: 155

kyegomez/Med-PaLM

Towards Generalist Biomedical AI

Language: Python - Size: 850 KB - Last synced at: 26 days ago - Pushed at: over 1 year ago - Stars: 383 - Forks: 53

multimindlab/.github

Size: 0 Bytes - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

fraunhoferhhi/spvloc

[ECCV 2024 Oral] SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments

Language: Python - Size: 3.58 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 31 - Forks: 2

stevejpapad/miscaptioned-image-reconstruction

Repository for the "Latent Multimodal Reconstruction for Misinformation Detection" paper

Language: Python - Size: 402 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 1 - Forks: 0

theislab/scarches

Reference mapping for single-cell genomics

Language: Jupyter Notebook - Size: 825 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 365 - Forks: 59

Rajeevveera24/LatentAlignmentProcedural

This repository is cloned from https://github.com/HLR/LatentAlignmentProcedural. This is a potential baseline explored for the textual_cloze task on the RecipeQA Dataset - https://hucvl.github.io/recipeqa/

Language: Jupyter Notebook - Size: 47 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

ForestsKing/Awesome-Multimodal-Time-Series

A curated list of papers, code, data, and other resources focused on multimodal time series analysis.

Size: 10.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 83 - Forks: 5

GangGreenTemperTatum/toronto-visual-ai-hackathon-2025

Visual AI Hackathon Project

Language: Jupyter Notebook - Size: 14.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 1

drprojects/DeepViewAgg

[CVPR'22 Best Paper Finalist] Official PyTorch implementation of the method presented in "Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation"

Language: Python - Size: 302 MB - Last synced at: 26 days ago - Pushed at: 11 months ago - Stars: 230 - Forks: 25

declare-lab/LLM-PuzzleTest

This repository is maintained to release dataset and models for multimodal puzzle reasoning.

Language: Python - Size: 131 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 87 - Forks: 7

MMMU-Benchmark/MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

Language: Python - Size: 186 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 431 - Forks: 35

kyegomez/NaViT

My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"

Language: Python - Size: 210 KB - Last synced at: 29 days ago - Pushed at: 3 months ago - Stars: 232 - Forks: 11

salesforce/LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Language: Jupyter Notebook - Size: 79.3 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 10,558 - Forks: 1,031

kyegomez/BitNet

Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch

Language: Python - Size: 36.5 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 1,822 - Forks: 162

kyegomez/PALI3

Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"

Language: Python - Size: 2.61 MB - Last synced at: 28 days ago - Pushed at: 3 months ago - Stars: 145 - Forks: 4

kyegomez/Kosmos2.5

My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"

Language: Python - Size: 231 KB - Last synced at: 19 days ago - Pushed at: 3 months ago - Stars: 72 - Forks: 6

kyegomez/the-compiler

Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!

Language: Python - Size: 10.4 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 144 - Forks: 16

kyegomez/Pegasus

PegasusX: The Future of Multimodal Embeddings 🦄 🦄

Language: Python - Size: 37.5 MB - Last synced at: 28 days ago - Pushed at: 8 months ago - Stars: 14 - Forks: 5

burhanahmed1/CryptoSynth

Bitcoin Sentiment Forecast is a Multimodal approach to Bitcoin price forecasting using NLP and Time Series Analysis

Language: Jupyter Notebook - Size: 3.54 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Yutong-Zhou-cv/Awesome-Text-to-Image

(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

Size: 69.2 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 2,339 - Forks: 200

TheShadow29/awesome-grounding

awesome grounding: A curated list of research papers in visual grounding

Size: 172 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 1,073 - Forks: 101

WinfredGe/T2S

[IJCAI 2025] Official implementation of "T2S: High-resolution Time Series Generation with Text-to-Series Diffusion Models"

Language: Python - Size: 37.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 10 - Forks: 1

FIVEYOUNGWOO/WiFiMobNet

WiFi-Camera multimodal learning-based object detection and pose estimation.

Language: Python - Size: 560 MB - Last synced at: 6 days ago - Pushed at: 6 months ago - Stars: 3 - Forks: 1

rahul-jaiswar-git/Toxic-Content-Analyzer-with-Perspective-API

A modern, multi-modal hate speech detection web app using the Perspective API. Analyze text, images, audio, and video for toxic or harmful content in a user-friendly interface.

Language: Python - Size: 18.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

yuanze-lin/Learnable_Regions

[CVPR 2024] Official code for "Text-Driven Image Editing via Learnable Regions"

Language: Python - Size: 11.5 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 224 - Forks: 21

Mrkomiljon/awesome-generative-ai

Multimodal generative AI resources : talking heads, STT, TTS, image & video generation, and more.

Size: 2.31 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 1

Devanshpandey/preCog-Multimodal-AI-for-Precision-Cardiology

Code used for training preCog

Language: Jupyter Notebook - Size: 12.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

nicolay-r/nicolay-r

This is my personal list of news updates in the Information Retrieval domain

Size: 244 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Duplums/CoMM

[ICLR 2025] Multi-modal representation learning of shared, unique and synergistic features between modalities

Language: Python - Size: 2.94 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 11 - Forks: 4

aimotive/mm_training

Multimodal model training on aiMotive Dataset

Language: Python - Size: 2.87 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 16 - Forks: 4

mahmoodlab/MCAT

Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images - ICCV 2021

Language: Jupyter Notebook - Size: 540 MB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 200 - Forks: 40

thuiar/MIntRec

MIntRec: A New Dataset for Multimodal Intent Recognition (ACM MM 2022)

Language: Python - Size: 1.5 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 88 - Forks: 15

MIDA-group/CoMIR_INSPIRE

Framework for Multimodal Deformable Image Registration. Coordinated equivariant representation learning (CoMIR) combined with robust deformable registration by INSPIRE.

Language: Python - Size: 9.9 MB - Last synced at: about 9 hours ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 1

samsad35/VQ-MAE-AudioVisual-code

A vector quantized masked autoencoder for audiovisual speech emotion recognition

Language: Python - Size: 21.7 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

boemer00/cooper-mvp

We’re building the emotional intelligence layer for all marketing decisions in a multimodal world.

Language: Python - Size: 36.1 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

pxxpassi/Allurelle-Skincare-Recommender-App

Recommending products to users based on image processing and external factors to build an inclusive self-care community

Language: Dart - Size: 34.8 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

ParitoshParmar/Piano-Skills-Assessment

Piano Skills Assessment [IEEE MMSP 2021]

Language: Python - Size: 854 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 17 - Forks: 2

sail-sg/CLoT

CVPR'24, Official Codebase of our Paper: "Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation".

Language: Python - Size: 6.46 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 311 - Forks: 16

frankaging/Multimodal-Transformer

Attention Based Multi-modal Emotion Recognition; Stanford Emotional Narratives Dataset

Language: Python - Size: 458 MB - Last synced at: about 2 months ago - Pushed at: almost 6 years ago - Stars: 18 - Forks: 1

canary-for-cognition/multimodal-dl-framework

An extensible PyTorch framework to experiment with neural-networks-based deep learning algorithms on multiple data modalities for binary classification.

Language: Python - Size: 2.22 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 9 - Forks: 3

ilaria-manco/multimodal-ml-music

List of academic resources on Multimodal ML for Music

Language: TeX - Size: 268 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 295 - Forks: 11

xmarva/transformer-based-architectures

Breakdown of SoTA transformer-based architectures

Language: Jupyter Notebook - Size: 741 KB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

declare-lab/awesome-emotion-recognition-in-conversations

A comprehensive reading list for Emotion Recognition in Conversations

Size: 273 KB - Last synced at: 29 days ago - Pushed at: over 1 year ago - Stars: 269 - Forks: 45

RaptorMai/MLLM-CompBench

[NeurIPS'25] MLLM-CompBench evaluates the comparative reasoning of MLLMs with 40K image pairs and questions across 8 dimensions of relative comparison: visual attribute, existence, state, emotion, temporality, spatiality, quantity, and quality. CompBench covers diverse visual domains, including animals, fashion, sports, and scenes

Language: Jupyter Notebook - Size: 10.7 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 38 - Forks: 2

westlake-repl/Recommendation-Systems-without-Explicit-ID-Features-A-Literature-Review

Paper List of Pre-trained Foundation Recommender Models

Size: 444 KB - Last synced at: 2 months ago - Pushed at: 10 months ago - Stars: 349 - Forks: 27

yuanze-lin/REVIVE

[NeurIPS 2022] Official code for REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering

Language: Python - Size: 3.39 MB - Last synced at: 26 days ago - Pushed at: 3 months ago - Stars: 101 - Forks: 9

steve-zeyu-zhang/MotionAnything

🔥 Motion Anything: Any to Motion Generation

Size: 183 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 156 - Forks: 2

ninibymilk/PMF-MMEA

[ACL2024] Progressively Modality Freezing for Multi-Modal Entity Alignment

Language: Python - Size: 551 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 16 - Forks: 0

RunyuFan/UisNet-TGRS-2022

Code for TGRS 2022 paper "Fine-scale Urban Informal Settlements Mapping by Fusing Remote Sensing Images and Building Data via a Transformer-based Multimodal Fusion Network"

Language: Python - Size: 142 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 10 - Forks: 1

fcakyon/content-moderation-deep-learning

Deep learning based content moderation from text, audio, video & image input modalities.

Size: 188 KB - Last synced at: 2 months ago - Pushed at: 6 months ago - Stars: 343 - Forks: 20

stevejpapad/relevant-evidence-detection

Official repository for the "RED-DOT: Multimodal Fact-checking via Relevant Evidence Detection" paper.

Language: Python - Size: 40.7 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 3 - Forks: 2

PAIR-Systems-Inc/little-dorrit-editor

Multimodal benchmark for evaluating handwritten editorial correction in printed text.

Language: Python - Size: 13.9 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

rizavelioglu/hateful_memes-hate_detectron

Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge. https://arxiv.org/abs/2012.12975

Language: Jupyter Notebook - Size: 11.2 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 59 - Forks: 18

shamanez/Self-Supervised-Embedding-Fusion-Transformer

The code for our IEEE ACCESS (2020) paper Multimodal Emotion Recognition with Transformer-Based Self Supervised Feature Fusion.

Language: Python - Size: 4.65 MB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 119 - Forks: 22

om-ai-lab/VL-CheckList

Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations. [EMNLP 2022]

Language: Python - Size: 26.6 MB - Last synced at: 2 months ago - Pushed at: 9 months ago - Stars: 129 - Forks: 5

Related Keywords
multimodal-deep-learning (429), deep-learning (110), multimodal (88), pytorch (67), computer-vision (58), machine-learning (50), multimodal-learning (39), natural-language-processing (26), nlp (25), multimodality (23), vision-and-language (22), python (21), tensorflow (20), large-language-models (20), llm (18), transformers (16), attention-mechanism (16), transformer (16), deep-neural-networks (15), artificial-intelligence (15), multimodal-large-language-models (14), multimodal-sentiment-analysis (14), generative-ai (14), classification (13), gpt4 (13), self-supervised-learning (13), emotion-recognition (12), dataset (12), neural-network (11), ai (11), convolutional-neural-networks (11), attention (11), image-classification (11), visual-question-answering (11), clip (10), multimodal-datasets (10), image-processing (10), language-model (9), diffusion-models (9), image (9), time-series (9), cnn (9), object-detection (8), bert (8), vision-transformer (8), vision-language-model (8), vision-language-transformer (8), awesome-list (8), sentiment-analysis (8), multimodal-representation (7), vision-language (7), representation-learning (7), keras (7), multimodal-fusion (7), image-captioning (7), multimodal-data (7), pytorch-lightning (7), deeplearning (7), foundation-models (6), text-classification (6), embeddings (6), vision-language-pretraining (6), remote-sensing (6), reinforcement-learning (6), graph-neural-networks (6), huggingface-transformers (6), transfer-learning (6), 3d (6), neural-networks (6), lstm (6), text-to-image (6), transformer-models (5), gan (5), text (5), paper (5), generative-adversarial-network (5), memes (5), speech-recognition (5), data-fusion (5), python3 (5), question-answering (5), contrastive-learning (5), semantic-segmentation (5), point-cloud (5), recommender-system (5), attention-is-all-you-need (5), vqa (5), visual-grounding (5), nlp-machine-learning (5), multimodal-interactions (5), anomaly-detection (5), generative-model (5), variational-autoencoder (5), audio (5), large-multimodal-models (5), image-generation (5), feature-engineering (5), audio-processing (5), hateful-memes-challenge (4), multimodal-emotion-recognition (4)