GitHub topics: multimodal-deep-learning
maastrichtlawtech/MATCHED
Multimodal Authorship-Attribution To Combat Human Trafficking in Escort-Advertisement Data
Language: Jupyter Notebook - Size: 4.5 MB - Last synced at: about 16 hours ago - Pushed at: about 17 hours ago - Stars: 1 - Forks: 0

GerrySant/multimodalhugs
MultimodalHugs is an extension of Hugging Face that offers a generalized framework for training, evaluating, and using multimodal AI models with minimal code differences, ensuring seamless compatibility with Hugging Face pipelines.
Language: Python - Size: 4.38 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 5 - Forks: 2

AI4Finance-Foundation/FinRobot
FinRobot: An Open-Source AI Agent Platform for Financial Analysis using LLMs
Language: Jupyter Notebook - Size: 7.4 MB - Last synced at: 1 day ago - Pushed at: 7 months ago - Stars: 3,626 - Forks: 634

khiemducdoan/MyBachelorThesis
This is my project for my Bachelor Thesis.
Language: Jupyter Notebook - Size: 30.8 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

omeregev/click2mask
[AAAI 2025] Official Implementation for "Click2Mask: Local Editing with Dynamic Mask Generation" Paper.
Language: Python - Size: 62.8 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 17 - Forks: 2

willxxy/ECG-Bench
A Unified Framework for Benchmarking Generative Electrocardiogram-Language Models (ELMs)
Language: Python - Size: 6.81 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 15 - Forks: 2

Adam-maz/MultiModal-fCNN-Classifier
Here we provide fCNN, a multimodal small drug screening toolkit based on Morgan fingerprints and images of simulated (Docking, DFT) molecules.
Language: Jupyter Notebook - Size: 2.37 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

akusayudodograu/Agentic-RAG-Story-Generation-with-Multimodal-GenAI
Multimodal Agentic GenAI Workflow: seamlessly blends retrieval and generation for intelligent storytelling
Size: 1000 Bytes - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 9 - Forks: 1

automatika-robotics/roboml
RoboML is an aggregator package written for prototyping and deploying open source ML models for robotics use cases
Language: Python - Size: 906 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 7 - Forks: 0

arielsiman-tov/Weather-Image-Classification-for-Extreme-Weather-Events Fork of TalKleinBgu/Weather_Image_Classification_for_Extreme_Weather_Events
Build a reliable and interpretable model that classifies extreme weather from images, enhancing early detection, situational awareness, and decision-making.
Language: Jupyter Notebook - Size: 5.19 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

willxxy/awesome-mmps
Corpus of resources for multimodal machine learning with physiological signals (mmps).
Size: 340 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 88 - Forks: 2

friedrichor/Awesome-Multimodal-Papers
A curated list of awesome Multimodal studies.
Size: 63.4 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 207 - Forks: 18

floriankulig/neural-navi
Driver CoPilot as a student research project. Uses multimodal input streams from a car's telemetry and camera data to try to predict the best maneuver for the driver.
Language: Python - Size: 122 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

bezirganyan/DBF_uncertainty
Original PyTorch implementation of AIStats 2025 paper: Multimodal Learning with Uncertainty Quantification based on Discounted Belief Fusion
Language: Python - Size: 20.1 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 2 - Forks: 0

marslanm/Multimodality-Representation-Learning
This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have been cited and discussed in the accepted survey: https://dl.acm.org/doi/abs/10.1145/3617833
Size: 63.3 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 75 - Forks: 7

MILVLG/prophet
Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
Language: Python - Size: 2.43 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 275 - Forks: 28

sylvanding/MCD-UNet
MCD-UNet: A Multi-modal Conditional Diffusion UNet for 3D Medical Image Segmentation
Language: Python - Size: 396 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

jrzaurin/pytorch-widedeep
A flexible package for multimodal-deep-learning to combine tabular data with text and images using Wide and Deep models in Pytorch
Language: Python - Size: 99.6 MB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 1,356 - Forks: 193

saky-semicolon/Multimodal-Brain-Tumor-Segmentation
This project presents a deep learning-based solution for brain tumor segmentation using multimodal MRI scans and U-Net architecture.
Language: Jupyter Notebook - Size: 28.2 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

remyxai/VQASynth
Compose multimodal datasets
Language: Python - Size: 17.5 MB - Last synced at: 8 days ago - Pushed at: 10 days ago - Stars: 403 - Forks: 17

reascr/Multimodal_Painter_Attribution
Integrating visual and textual features for improved classification performance in painter attribution of fine art paintings.
Language: Jupyter Notebook - Size: 409 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

DWCTOD/CVPR2024-Papers-with-Code-Demo
Collects the latest CVPR (Conference on Computer Vision and Pattern Recognition) results, including papers, code, and demo videos. Recommendations are welcome!
Size: 137 KB - Last synced at: 9 days ago - Pushed at: about 1 year ago - Stars: 1,367 - Forks: 154

multimindlab/multimind-sdk
Your SDK solves all of this. One interface. Unified logic. Local + hosted models. Fine-tuning. Agent tools. Enterprise-ready. Hybrid RAG. Star if you like it!
Language: Python - Size: 46.4 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 15 - Forks: 1

zhu-xlab/DOFA
Code for Neural Plasticity-Inspired Foundation Model for Observing the Earth Crossing Modalities
Language: Jupyter Notebook - Size: 993 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 135 - Forks: 12

AlibabaResearch/AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Language: C++ - Size: 104 MB - Last synced at: 12 days ago - Pushed at: 2 months ago - Stars: 1,727 - Forks: 194

KIETOU1/sentimental-analysis
Sentiment Analysis Web Application. This web application analyzes text sentiment using TextBlob and features a sleek UI built with Next.js and Flask. Explore real-time insights and visualizations to understand sentiment trends easily.
Language: TypeScript - Size: 794 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

HanesSue/Multimodal_emotional_analysis
Language: Python - Size: 1.8 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 1 - Forks: 1

willxxy/ECG-Byte
[arxiv 2024] ECG-Byte: A Tokenizer for End-to-End Generative Electrocardiogram Language Modeling
Language: Python - Size: 28.5 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 18 - Forks: 0

ashutosh1919/data2vec-pytorch
Ready to run PyTorch implementation of Data2Vec 2.0: Highly efficient self-supervised representation learning for vision, speech and text.
Language: Python - Size: 116 KB - Last synced at: 8 days ago - Pushed at: about 2 years ago - Stars: 15 - Forks: 3

IDEA-Research/ChatRex
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
Language: Python - Size: 8.82 MB - Last synced at: 15 days ago - Pushed at: 5 months ago - Stars: 185 - Forks: 7

Pradeep9167/Spatial-MLLM
Spatial-MLLM enhances multi-language learning models by integrating visual-based spatial intelligence. This project aims to improve understanding and processing of spatial data, making it a valuable resource for researchers and developers.
Language: Python - Size: 18.4 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

richard-peng-xia/awesome-multimodal-in-medical-imaging
A collection of resources on applications of multi-modal learning in medical imaging.
Size: 309 KB - Last synced at: 17 days ago - Pushed at: 18 days ago - Stars: 753 - Forks: 68

kyegomez/MultiModalCrossAttn
The open source implementation of the cross attention mechanism from the paper: "JOINTLY TRAINING LARGE AUTOREGRESSIVE MULTIMODAL MODELS"
Language: Python - Size: 223 KB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 28 - Forks: 1

kyegomez/MMCA-MGQA
Experiments around using Multi-Modal Causal Attention with Multi-Grouped Query Attention
Language: Python - Size: 210 KB - Last synced at: 14 days ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 0

kyegomez/Odin
SOTA Classification at scale for UAVs, Drones, and much more
Language: Python - Size: 211 KB - Last synced at: 12 days ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 0

kyegomez/PALI
Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"
Language: Python - Size: 624 KB - Last synced at: 12 days ago - Pushed at: over 1 year ago - Stars: 90 - Forks: 8

thuiar/MMSA-FET
A Tool for extracting multimodal features from videos.
Language: Python - Size: 24.4 MB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 171 - Forks: 23

Yutong-Zhou-cv/Awesome-Multimodality
A Survey on multimodal learning research.
Size: 1.76 MB - Last synced at: 8 days ago - Pushed at: almost 2 years ago - Stars: 327 - Forks: 22

kyegomez/swarms-pytorch
Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch
Language: Python - Size: 58.2 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 121 - Forks: 10

JerryX1110/awesome-rvos
Referring Video Object Segmentation / Multi-Object Tracking Repo
Language: Python - Size: 79.1 KB - Last synced at: 4 days ago - Pushed at: almost 2 years ago - Stars: 87 - Forks: 4

KimMeen/Time-LLM
[ICLR 2024] Official implementation of "Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
Language: Python - Size: 1.06 MB - Last synced at: 23 days ago - Pushed at: 8 months ago - Stars: 2,033 - Forks: 353

abhishekjoshi007/A-Multi-Modal-Transformer-Architecture-Combining-Sentiment-Dynamics-Temporal-Market-Data
Our approach uniquely fuses sentiment dynamics from social media and news sources with temporal market data and macroeconomic indicators to construct dynamic graph representations of interfirm relationships. Further, we employ state-of-the-art GNNs, such as temporal graph convolutions, that adapt to the changing market and significantly enhance it.
Language: Python - Size: 10.7 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 0 - Forks: 0

geoaigroup/awesome-vision-language-models-for-earth-observation
A curated list of awesome vision and language resources for earth observation.
Size: 470 KB - Last synced at: 8 days ago - Pushed at: 3 months ago - Stars: 228 - Forks: 17

phellonchen/awesome-Vision-and-Language-Pre-training
Recent Advances in Vision and Language Pre-training (VLP)
Size: 81.1 KB - Last synced at: 10 days ago - Pushed at: about 2 years ago - Stars: 293 - Forks: 16

declare-lab/multimodal-deep-learning
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
Language: OpenEdge ABL - Size: 181 MB - Last synced at: 22 days ago - Pushed at: over 2 years ago - Stars: 840 - Forks: 155

kyegomez/Med-PaLM
Towards Generalist Biomedical AI
Language: Python - Size: 850 KB - Last synced at: 26 days ago - Pushed at: over 1 year ago - Stars: 383 - Forks: 53

multimindlab/.github
Size: 0 Bytes - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

fraunhoferhhi/spvloc
[ECCV 2024 Oral] SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments
Language: Python - Size: 3.58 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 31 - Forks: 2

stevejpapad/miscaptioned-image-reconstruction
Repository for the "Latent Multimodal Reconstruction for Misinformation Detection" paper
Language: Python - Size: 402 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 1 - Forks: 0

theislab/scarches
Reference mapping for single-cell genomics
Language: Jupyter Notebook - Size: 825 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 365 - Forks: 59

Rajeevveera24/LatentAlignmentProcedural
This repository is cloned from https://github.com/HLR/LatentAlignmentProcedural. This is a potential baseline explored for the textual_cloze task on the RecipeQA Dataset - https://hucvl.github.io/recipeqa/
Language: Jupyter Notebook - Size: 47 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

ForestsKing/Awesome-Multimodal-Time-Series
A curated list of papers, code, data, and other resources focused on multimodal time series analysis.
Size: 10.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 83 - Forks: 5

GangGreenTemperTatum/toronto-visual-ai-hackathon-2025
Visual AI Hackathon Project
Language: Jupyter Notebook - Size: 14.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 1

drprojects/DeepViewAgg
[CVPR'22 Best Paper Finalist] Official PyTorch implementation of the method presented in "Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation"
Language: Python - Size: 302 MB - Last synced at: 26 days ago - Pushed at: 11 months ago - Stars: 230 - Forks: 25

declare-lab/LLM-PuzzleTest
This repository is maintained to release dataset and models for multimodal puzzle reasoning.
Language: Python - Size: 131 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 87 - Forks: 7

MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Language: Python - Size: 186 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 431 - Forks: 35

kyegomez/NaViT
My implementation of "Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
Language: Python - Size: 210 KB - Last synced at: 29 days ago - Pushed at: 3 months ago - Stars: 232 - Forks: 11

salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
Language: Jupyter Notebook - Size: 79.3 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 10,558 - Forks: 1,031

kyegomez/BitNet
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch
Language: Python - Size: 36.5 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 1,822 - Forks: 162

kyegomez/PALI3
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
Language: Python - Size: 2.61 MB - Last synced at: 28 days ago - Pushed at: 3 months ago - Stars: 145 - Forks: 4

kyegomez/Kosmos2.5
My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"
Language: Python - Size: 231 KB - Last synced at: 19 days ago - Pushed at: 3 months ago - Stars: 72 - Forks: 6

kyegomez/the-compiler
Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!
Language: Python - Size: 10.4 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 144 - Forks: 16

kyegomez/Pegasus
PegasusX: The Future of Multimodal Embeddings
Language: Python - Size: 37.5 MB - Last synced at: 28 days ago - Pushed at: 8 months ago - Stars: 14 - Forks: 5

burhanahmed1/CryptoSynth
Bitcoin Sentiment Forecast is a Multimodal approach to Bitcoin price forecasting using NLP and Time Series Analysis
Language: Jupyter Notebook - Size: 3.54 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Yutong-Zhou-cv/Awesome-Text-to-Image
A Survey on Text-to-Image Generation/Synthesis.
Size: 69.2 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 2,339 - Forks: 200

TheShadow29/awesome-grounding
awesome grounding: A curated list of research papers in visual grounding
Size: 172 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 1,073 - Forks: 101

WinfredGe/T2S
[IJCAI 2025] Official implementation of "T2S: High-resolution Time Series Generation with Text-to-Series Diffusion Models"
Language: Python - Size: 37.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 10 - Forks: 1

FIVEYOUNGWOO/WiFiMobNet
WiFi-Camera multimodal learning-based object detection and pose estimation.
Language: Python - Size: 560 MB - Last synced at: 6 days ago - Pushed at: 6 months ago - Stars: 3 - Forks: 1

rahul-jaiswar-git/Toxic-Content-Analyzer-with-Perspective-API
A modern, multi-modal hate speech detection web app using the Perspective API. Analyze text, images, audio, and video for toxic or harmful content in a user-friendly interface.
Language: Python - Size: 18.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

yuanze-lin/Learnable_Regions
[CVPR 2024] Official code for "Text-Driven Image Editing via Learnable Regions"
Language: Python - Size: 11.5 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 224 - Forks: 21

Mrkomiljon/awesome-generative-ai
Multimodal generative AI resources : talking heads, STT, TTS, image & video generation, and more.
Size: 2.31 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 1

Devanshpandey/preCog-Multimodal-AI-for-Precision-Cardiology
Code used for training preCog
Language: Jupyter Notebook - Size: 12.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

nicolay-r/nicolay-r
This is my personal list of news updates in the Information Retrieval domain
Size: 244 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Duplums/CoMM
[ICLR 2025] Multi-modal representation learning of shared, unique and synergistic features between modalities
Language: Python - Size: 2.94 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 11 - Forks: 4

aimotive/mm_training
Multimodal model training on aiMotive Dataset
Language: Python - Size: 2.87 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 16 - Forks: 4

mahmoodlab/MCAT
Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images - ICCV 2021
Language: Jupyter Notebook - Size: 540 MB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 200 - Forks: 40

thuiar/MIntRec
MIntRec: A New Dataset for Multimodal Intent Recognition (ACM MM 2022)
Language: Python - Size: 1.5 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 88 - Forks: 15

MIDA-group/CoMIR_INSPIRE
Framework for Multimodal Deformable Image Registration. Coordinated equivariant representation learning (CoMIR) combined with robust deformable registration by INSPIRE.
Language: Python - Size: 9.9 MB - Last synced at: about 9 hours ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 1

samsad35/VQ-MAE-AudioVisual-code
A vector quantized masked autoencoder for audiovisual speech emotion recognition
Language: Python - Size: 21.7 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

boemer00/cooper-mvp
We're building the emotional intelligence layer for all marketing decisions in a multimodal world.
Language: Python - Size: 36.1 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

pxxpassi/Allurelle-Skincare-Recommender-App
Recommends products to users based on image processing and external factors, to build an inclusive self-care community
Language: Dart - Size: 34.8 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

ParitoshParmar/Piano-Skills-Assessment
Piano Skills Assessment [IEEE MMSP 2021]
Language: Python - Size: 854 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 17 - Forks: 2

sail-sg/CLoT
CVPR'24, Official Codebase of our Paper: "Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation".
Language: Python - Size: 6.46 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 311 - Forks: 16

frankaging/Multimodal-Transformer
Attention Based Multi-modal Emotion Recognition; Stanford Emotional Narratives Dataset
Language: Python - Size: 458 MB - Last synced at: about 2 months ago - Pushed at: almost 6 years ago - Stars: 18 - Forks: 1

canary-for-cognition/multimodal-dl-framework
An extensible PyTorch framework to experiment with neural-networks-based deep learning algorithms on multiple data modalities for binary classification.
Language: Python - Size: 2.22 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 9 - Forks: 3

ilaria-manco/multimodal-ml-music
List of academic resources on Multimodal ML for Music
Language: TeX - Size: 268 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 295 - Forks: 11

xmarva/transformer-based-architectures
Breakdown of SoTA transformer-based architectures
Language: Jupyter Notebook - Size: 741 KB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

declare-lab/awesome-emotion-recognition-in-conversations
A comprehensive reading list for Emotion Recognition in Conversations
Size: 273 KB - Last synced at: 29 days ago - Pushed at: over 1 year ago - Stars: 269 - Forks: 45

RaptorMai/MLLM-CompBench
[NeurIPS'25] MLLM-CompBench evaluates the comparative reasoning of MLLMs with 40K image pairs and questions across 8 dimensions of relative comparison: visual attribute, existence, state, emotion, temporality, spatiality, quantity, and quality. CompBench covers diverse visual domains, including animals, fashion, sports, and scenes
Language: Jupyter Notebook - Size: 10.7 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 38 - Forks: 2

westlake-repl/Recommendation-Systems-without-Explicit-ID-Features-A-Literature-Review
Paper List of Pre-trained Foundation Recommender Models
Size: 444 KB - Last synced at: 2 months ago - Pushed at: 10 months ago - Stars: 349 - Forks: 27

yuanze-lin/REVIVE
[NeurIPS 2022] Official code for REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering
Language: Python - Size: 3.39 MB - Last synced at: 26 days ago - Pushed at: 3 months ago - Stars: 101 - Forks: 9

steve-zeyu-zhang/MotionAnything
Motion Anything: Any to Motion Generation
Size: 183 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 156 - Forks: 2

ninibymilk/PMF-MMEA
[ACL2024] Progressively Modality Freezing for Multi-Modal Entity Alignment
Language: Python - Size: 551 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 16 - Forks: 0

RunyuFan/UisNet-TGRS-2022
Code for TGRS 2022 paper "Fine-scale Urban Informal Settlements Mapping by Fusing Remote Sensing Images and Building Data via a Transformer-based Multimodal Fusion Network"
Language: Python - Size: 142 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 10 - Forks: 1

fcakyon/content-moderation-deep-learning
Deep learning based content moderation from text, audio, video & image input modalities.
Size: 188 KB - Last synced at: 2 months ago - Pushed at: 6 months ago - Stars: 343 - Forks: 20

stevejpapad/relevant-evidence-detection
Official repository for the "RED-DOT: Multimodal Fact-checking via Relevant Evidence Detection" paper.
Language: Python - Size: 40.7 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 3 - Forks: 2

PAIR-Systems-Inc/little-dorrit-editor
Multimodal benchmark for evaluating handwritten editorial correction in printed text.
Language: Python - Size: 13.9 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

rizavelioglu/hateful_memes-hate_detectron
Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge. https://arxiv.org/abs/2012.12975
Language: Jupyter Notebook - Size: 11.2 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 59 - Forks: 18

shamanez/Self-Supervised-Embedding-Fusion-Transformer
The code for our IEEE ACCESS (2020) paper Multimodal Emotion Recognition with Transformer-Based Self Supervised Feature Fusion.
Language: Python - Size: 4.65 MB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 119 - Forks: 22

om-ai-lab/VL-CheckList
Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations. [EMNLP 2022]
Language: Python - Size: 26.6 MB - Last synced at: 2 months ago - Pushed at: 9 months ago - Stars: 129 - Forks: 5
