GitHub topics: multimodal-deep-learning
rahul-jaiswar-git/Toxic-Content-Analyzer-with-Perspective-API
A modern, multi-modal hate speech detection web app using the Perspective API. Analyze text, images, audio, and video for toxic or harmful content in a user-friendly interface.
Language: Python - Size: 18.2 MB - Last synced at: about 5 hours ago - Pushed at: about 6 hours ago - Stars: 0 - Forks: 0

salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
Language: Jupyter Notebook - Size: 79.3 MB - Last synced at: about 18 hours ago - Pushed at: 6 months ago - Stars: 10,528 - Forks: 1,025

Mrkomiljon/awesome-generative-ai
Multimodal generative AI resources: talking heads, STT, TTS, image & video generation, and more.
Size: 2.31 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 1

Devanshpandey/preCog-Multimodal-AI-for-Precision-Cardiology
Code used for training preCog
Language: Jupyter Notebook - Size: 12.7 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

remyxai/VQASynth
Compose multimodal datasets 🎹
Language: Python - Size: 17.5 MB - Last synced at: 2 days ago - Pushed at: 19 days ago - Stars: 365 - Forks: 14

ForestsKing/Awesome-Multimodal-Time-Series
A curated list of papers, code, data, and other resources focused on multimodal time series analysis.
Size: 9.77 KB - Last synced at: 3 days ago - Pushed at: 9 days ago - Stars: 71 - Forks: 4

nicolay-r/nicolay-r
My personal list of news updates in the Information Retrieval domain
Size: 244 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

Duplums/CoMM
[ICLR 2025] Multi-modal representation learning of shared, unique and synergistic features between modalities
Language: Python - Size: 2.94 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 11 - Forks: 4

akusayudodograu/Agentic-RAG-Story-Generation-with-Multimodal-GenAI
Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling
Size: 1000 Bytes - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 8 - Forks: 1

aimotive/mm_training
Multimodal model training on aiMotive Dataset
Language: Python - Size: 2.87 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 16 - Forks: 4

richard-peng-xia/awesome-multimodal-in-medical-imaging
A collection of resources on applications of multi-modal learning in medical imaging.
Size: 234 KB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 730 - Forks: 65

floriankulig/neural-navi
Driver CoPilot, a student research project. Uses multimodal input streams from a car's telemetry and camera data to try to predict the driver's best maneuver.
Language: Python - Size: 56.8 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

mahmoodlab/MCAT
Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images - ICCV 2021
Language: Jupyter Notebook - Size: 540 MB - Last synced at: 3 days ago - Pushed at: about 3 years ago - Stars: 200 - Forks: 40

jrzaurin/pytorch-widedeep
A flexible package for multimodal-deep-learning to combine tabular data with text and images using Wide and Deep models in Pytorch
Language: Python - Size: 99.6 MB - Last synced at: 7 days ago - Pushed at: 2 months ago - Stars: 1,344 - Forks: 194

thuiar/MIntRec
MIntRec: A New Dataset for Multimodal Intent Recognition (ACM MM 2022)
Language: Python - Size: 1.5 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 88 - Forks: 15

MIDA-group/CoMIR_INSPIRE
Framework for Multimodal Deformable Image Registration. Coordinated equivariant representation learning (CoMIR) combined with robust deformable registration by INSPIRE.
Language: Python - Size: 9.9 MB - Last synced at: 3 days ago - Pushed at: about 2 years ago - Stars: 10 - Forks: 1

willxxy/ECG-Bench
A Unified Framework for Benchmarking Generative Electrocardiogram-Language Models (ELMs)
Language: Python - Size: 6.67 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 9 - Forks: 2

willxxy/awesome-mmps
Corpus of resources for multimodal machine learning with physiological signals (mmps).
Size: 1.1 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 75 - Forks: 2

Yutong-Zhou-cv/Awesome-Text-to-Image
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
Size: 69.2 MB - Last synced at: 8 days ago - Pushed at: 14 days ago - Stars: 2,330 - Forks: 200

GerrySant/multimodalhugs
MultimodalHugs is an extension of Hugging Face that offers a generalized framework for training, evaluating, and using multimodal AI models with minimal code differences, ensuring seamless compatibility with Hugging Face pipelines.
Language: Python - Size: 4.24 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 3 - Forks: 2

samsad35/VQ-MAE-AudioVisual-code
A vector quantized masked autoencoder for audiovisual speech emotion recognition
Language: Python - Size: 21.7 MB - Last synced at: 10 days ago - Pushed at: 11 days ago - Stars: 1 - Forks: 0

TheShadow29/awesome-grounding
awesome grounding: A curated list of research papers in visual grounding
Size: 172 KB - Last synced at: 8 days ago - Pushed at: about 2 years ago - Stars: 1,070 - Forks: 100

boemer00/cooper-mvp
We’re building the emotional intelligence layer for all marketing decisions in a multimodal world.
Language: Python - Size: 36.1 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

pxxpassi/Allurelle-Skincare-Recommender-App
Recommends products to users based on image processing and external factors to build an inclusive self-care community
Language: Dart - Size: 34.8 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

friedrichor/Awesome-Multimodal-Papers
A curated list of awesome Multimodal studies.
Language: HTML - Size: 63.3 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 186 - Forks: 18

ParitoshParmar/Piano-Skills-Assessment
Piano Skills Assessment [IEEE MMSP 2021]
Language: Python - Size: 854 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 17 - Forks: 2

AI4Finance-Foundation/FinRobot
FinRobot: An Open-Source AI Agent Platform for Financial Analysis using LLMs 🚀 🚀 🚀
Language: Jupyter Notebook - Size: 7.4 MB - Last synced at: 15 days ago - Pushed at: 6 months ago - Stars: 3,216 - Forks: 544

frankaging/Multimodal-Transformer
Attention Based Multi-modal Emotion Recognition; Stanford Emotional Narratives Dataset
Language: Python - Size: 458 MB - Last synced at: 11 days ago - Pushed at: over 5 years ago - Stars: 18 - Forks: 1

canary-for-cognition/multimodal-dl-framework
An extensible PyTorch framework to experiment with neural-networks-based deep learning algorithms on multiple data modalities for binary classification.
Language: Python - Size: 2.22 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 9 - Forks: 3

ilaria-manco/multimodal-ml-music
List of academic resources on Multimodal ML for Music
Language: TeX - Size: 268 KB - Last synced at: 13 days ago - Pushed at: about 2 years ago - Stars: 295 - Forks: 11

xmarva/transformer-based-architectures
Breakdown of SoTA transformer-based architectures
Language: Jupyter Notebook - Size: 741 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

declare-lab/awesome-emotion-recognition-in-conversations
A comprehensive reading list for Emotion Recognition in Conversations
Size: 273 KB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 269 - Forks: 45

RaptorMai/MLLM-CompBench
[NeurIPS'25] MLLM-CompBench evaluates the comparative reasoning of MLLMs with 40K image pairs and questions across 8 dimensions of relative comparison: visual attribute, existence, state, emotion, temporality, spatiality, quantity, and quality. CompBench covers diverse visual domains, including animals, fashion, sports, and scenes
Language: Jupyter Notebook - Size: 10.7 MB - Last synced at: 21 days ago - Pushed at: 22 days ago - Stars: 38 - Forks: 2

kyegomez/PALI
Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"
Language: Python - Size: 624 KB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 89 - Forks: 8

zhu-xlab/DOFA
Code for Neural Plasticity-Inspired Foundation Model for Observing the Earth Crossing Modalities
Language: Jupyter Notebook - Size: 993 MB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 126 - Forks: 11

westlake-repl/Recommendation-Systems-without-Explicit-ID-Features-A-Literature-Review
Paper List of Pre-trained Foundation Recommender Models
Size: 444 KB - Last synced at: 18 days ago - Pushed at: 9 months ago - Stars: 349 - Forks: 27

MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Language: Python - Size: 186 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 413 - Forks: 34

KimMeen/Time-LLM
[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
Language: Python - Size: 1.06 MB - Last synced at: 28 days ago - Pushed at: 6 months ago - Stars: 1,928 - Forks: 333

kyegomez/BitNet
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch
Language: Python - Size: 36.5 MB - Last synced at: 25 days ago - Pushed at: about 1 month ago - Stars: 1,772 - Forks: 161

steve-zeyu-zhang/MotionAnything
🔥 Motion Anything: Any to Motion Generation
Size: 183 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 156 - Forks: 2

geoaigroup/awesome-vision-language-models-for-earth-observation
A curated list of awesome vision and language resources for earth observation.
Size: 470 KB - Last synced at: 22 days ago - Pushed at: about 2 months ago - Stars: 219 - Forks: 17

sail-sg/CLoT
CVPR'24, Official Codebase of our Paper: "Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation".
Language: Python - Size: 6.46 MB - Last synced at: 19 days ago - Pushed at: about 1 year ago - Stars: 310 - Forks: 16

AlibabaResearch/AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Language: C++ - Size: 104 MB - Last synced at: 29 days ago - Pushed at: about 1 month ago - Stars: 1,685 - Forks: 191

ninibymilk/PMF-MMEA
[ACL2024] Progressively Modality Freezing for Multi-Modal Entity Alignment
Language: Python - Size: 551 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 16 - Forks: 0

RunyuFan/UisNet-TGRS-2022
Code for TGRS 2022 paper "Fine-scale Urban Informal Settlements Mapping by Fusing Remote Sensing Images and Building Data via a Transformer-based Multimodal Fusion Network"
Language: Python - Size: 142 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 10 - Forks: 1

thuiar/MMSA-FET
A Tool for extracting multimodal features from videos.
Language: Python - Size: 24.4 MB - Last synced at: 28 days ago - Pushed at: about 2 years ago - Stars: 165 - Forks: 22

fraunhoferhhi/spvloc
[ECCV 2024 Oral] SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments
Language: Python - Size: 2.99 MB - Last synced at: 29 days ago - Pushed at: 30 days ago - Stars: 31 - Forks: 2

fcakyon/content-moderation-deep-learning
Deep learning based content moderation from text, audio, video & image input modalities.
Size: 188 KB - Last synced at: 28 days ago - Pushed at: 5 months ago - Stars: 343 - Forks: 20

stevejpapad/relevant-evidence-detection
Official repository for the "RED-DOT: Multimodal Fact-checking via Relevant Evidence Detection" paper.
Language: Python - Size: 40.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 2

stevejpapad/miscaptioned-image-reconstruction
Repository for the "Latent Multimodal Reconstruction for Misinformation Detection" paper
Language: Python - Size: 395 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

PAIR-Systems-Inc/little-dorrit-editor
Multimodal benchmark for evaluating handwritten editorial correction in printed text.
Language: Python - Size: 13.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

rizavelioglu/hateful_memes-hate_detectron
Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge. https://arxiv.org/abs/2012.12975
Language: Jupyter Notebook - Size: 11.2 MB - Last synced at: 27 days ago - Pushed at: about 1 year ago - Stars: 59 - Forks: 18

shamanez/Self-Supervised-Embedding-Fusion-Transformer
The code for our IEEE ACCESS (2020) paper Multimodal Emotion Recognition with Transformer-Based Self Supervised Feature Fusion.
Language: Python - Size: 4.65 MB - Last synced at: 20 days ago - Pushed at: over 3 years ago - Stars: 119 - Forks: 22

om-ai-lab/VL-CheckList
Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations. [EMNLP 2022]
Language: Python - Size: 26.6 MB - Last synced at: 20 days ago - Pushed at: 7 months ago - Stars: 129 - Forks: 5

kyegomez/NaViT
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
Language: Python - Size: 210 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 226 - Forks: 10

thuiar/UMC
Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances (ACL 2024)
Language: Python - Size: 1.89 MB - Last synced at: 25 days ago - Pushed at: 5 months ago - Stars: 25 - Forks: 3

aehrc/cvt2distilgpt2
Improving Chest X-Ray Report Generation by Leveraging Warm-Starting
Language: Python - Size: 93.5 MB - Last synced at: 28 days ago - Pushed at: 12 months ago - Stars: 67 - Forks: 7

declare-lab/Multimodal-Infomax
This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis, accepted at EMNLP 2021.
Language: Python - Size: 145 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 179 - Forks: 34

willxxy/ECG-Byte
[arxiv 2024] ECG-Byte: A Tokenizer for End-to-End Generative Electrocardiogram Language Modeling
Language: Python - Size: 27.5 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 14 - Forks: 0

yuanze-lin/REVIVE
[NeurIPS 2022] Official code for REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering
Language: Python - Size: 3.39 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 100 - Forks: 10

AdrianBZG/HyperBERT
Code for "HyperBERT: Mixing Hypergraph-Aware Layers with Language Models for Node Classification on Text-Attributed Hypergraphs" (EMNLP 2024)
Language: Python - Size: 26.4 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 20 - Forks: 0

phellonchen/awesome-Vision-and-Language-Pre-training
Recent Advances in Vision and Language Pre-training (VLP)
Size: 81.1 KB - Last synced at: 5 days ago - Pushed at: almost 2 years ago - Stars: 292 - Forks: 16

mbaqer/V2X-mmWave-Beamforming
PyTorch implementation of multi-modality sensing in 60 GHz mmWave beamforming for connected vehicles.
Language: Jupyter Notebook - Size: 5.09 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

brian-cy-chang/Multimodal_VB-Fracture-Detector
An easy-to-use framework for multimodal models to detect vertebral body fractures in PyTorch
Language: Python - Size: 1.82 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

declare-lab/LLM-PuzzleTest
This repository is maintained to release datasets and models for multimodal puzzle reasoning.
Language: Python - Size: 131 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 78 - Forks: 7

Yutong-Zhou-cv/Awesome-Multimodality
A Survey on multimodal learning research.
Size: 1.76 MB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 324 - Forks: 22

jiayuww/SpatialEval
[NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs
Language: Python - Size: 3.95 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 23 - Forks: 0

cambridgeltl/visual-spatial-reasoning
[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.
Language: Python - Size: 10.7 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 118 - Forks: 9

yuewang-cuhk/awesome-vision-language-pretraining-papers
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
Size: 104 KB - Last synced at: 3 days ago - Pushed at: over 2 years ago - Stars: 1,152 - Forks: 104

LeapLabTHU/Pseudo-Q
[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
Language: Python - Size: 22.9 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 148 - Forks: 10

yuanze-lin/Learnable_Regions
[CVPR 2024] Official code for "Text-Driven Image Editing via Learnable Regions"
Language: Python - Size: 11.5 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 219 - Forks: 21

declare-lab/BBFN
This repository contains the implementation of the paper -- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis
Language: Python - Size: 1.25 MB - Last synced at: 26 days ago - Pushed at: about 2 years ago - Stars: 71 - Forks: 14

jaisidhsingh/LoRA-CLIP
Easy wrapper for inserting LoRA layers in CLIP.
Language: Python - Size: 60.5 KB - Last synced at: 22 days ago - Pushed at: 11 months ago - Stars: 31 - Forks: 3

icedpanda/COMPASS-official
Official Implementation of Unveiling User Preferences: A Knowledge Graph and LLM-Driven Approach for Conversational Recommendation
Language: Python - Size: 82.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

omriav/blended-latent-diffusion
Official implementation for "Blended Latent Diffusion" [SIGGRAPH 2023]
Language: Jupyter Notebook - Size: 9.84 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 594 - Forks: 37

LamineTourelab/MOGONET
MOGONET (Multi-Omics Graph cOnvolutional NETworks) is a multi-omics integrative data analysis framework for classification tasks in biomedical applications.
Language: Jupyter Notebook - Size: 56.6 MB - Last synced at: 20 days ago - Pushed at: about 1 month ago - Stars: 14 - Forks: 1

ksm26/Open-Source-Models-with-Hugging-Face
"Open Source Models with Hugging Face" course empowers you with the skills to leverage open-source models from the Hugging Face Hub for various tasks in NLP, audio, image, and multimodal domains.
Language: Jupyter Notebook - Size: 21 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 24 - Forks: 19

Davidlequnchen/Awesome-AM-process-monitoring-control
A curated collection of research papers with open-source implementations/datasets focused on in-situ process monitoring and adaptive control in laser-based additive manufacturing.
Size: 57.6 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

fork123aniket/Multi-Round-VLM-powered-Multimodal-Conversational-AI-Navigation-Bot
Streamlit App Combining Vision, Language, and Audio AI Models
Language: Python - Size: 18.6 KB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

Davidlequnchen/LDED-FusionNet
LDED-FusionNet: Machine Learning-Based Audio-Visual Defect Detection for LDED AM Process
Language: Jupyter Notebook - Size: 1.18 GB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 1

rohit901/VANE-Bench
[NAACL'25] Contains code and documentation for our VANE-Bench paper.
Language: Python - Size: 38.3 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 11 - Forks: 1

GangGreenTemperTatum/toronto-visual-ai-hackathon-2025
Visual AI Hackathon Project
Size: 125 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 1

Vision-CAIR/3DCoMPaT-v2
3DCoMPaT++: An improved large-scale 3D vision dataset for compositional recognition
Language: Python - Size: 133 MB - Last synced at: 24 days ago - Pushed at: 10 months ago - Stars: 82 - Forks: 6

Afrid1045/Brain-Tumor-Severity-Prediction-using-Multi-Modal-Squeeze-and-Excitation-Network
The project focuses on classifying brain tumors using the Multi-Modal Squeeze and Excitation Network.
Language: Jupyter Notebook - Size: 4.16 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

kyegomez/MultiModalCrossAttn
The open source implementation of the cross attention mechanism from the paper: "JOINTLY TRAINING LARGE AUTOREGRESSIVE MULTIMODAL MODELS"
Language: Python - Size: 223 KB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 27 - Forks: 1

association-rosia/crop-forecasting
Predicting rice field yields through the integration of Microsoft Planetary satellite images, meteorological data, and field information in the 2023 EY Open Science Data Challenge - Crop Forecasting.
Language: Jupyter Notebook - Size: 341 MB - Last synced at: 28 days ago - Pushed at: about 1 year ago - Stars: 20 - Forks: 3

icon-lab/MedTrim
Official implementation of "Meta-Entity Driven Triplet Mining for Aligning Medical Vision-Language Models"
Language: Python - Size: 40 KB - Last synced at: 25 days ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 1

DWCTOD/CVPR2024-Papers-with-Code-Demo
Collects the latest CVPR (Conference on Computer Vision and Pattern Recognition) results, including papers, code, and demo videos. Recommendations are welcome!
Size: 137 KB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 1,335 - Forks: 150

Rajeevveera24/LatentAlignmentProcedural
This repository is cloned from https://github.com/HLR/LatentAlignmentProcedural. This is a potential baseline explored for the textual_cloze task on the RecipeQA Dataset - https://hucvl.github.io/recipeqa/
Language: Jupyter Notebook - Size: 47 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Davidlequnchen/MultiSensorFusion-ROS-AM-Monitoring
ROS-based Multisensor Fusion Digital Twin (MFDT) platform for real-time monitoring and defect detection of Laser-Directed Energy Deposition (L-DED) Additive Manufacturing (AM) process.
Language: HTML - Size: 3.8 GB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 1

taco-group/DecAlign
A novel cross-modal decoupling and alignment framework for multimodal representation learning.
Language: JavaScript - Size: 13.8 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

declare-lab/MM-Align
[EMNLP 2022] This repository contains the official implementation of the paper "MM-Align: Learning Optimal Transport-based Alignment Dynamics for Fast and Accurate Inference on Missing Modality Sequences"
Language: Python - Size: 284 KB - Last synced at: 26 days ago - Pushed at: about 1 year ago - Stars: 28 - Forks: 2

eezkni/M2Trans
[IEEE J-BHI-2024] Pytorch implementation of "M2Trans: Multi-Modal Regularized Coarse-to-Fine Transformer for Ultrasound Image Super-Resolution"
Language: Python - Size: 113 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 8 - Forks: 2

42jaylonw/shifu
Lightweight Isaac Gym Environment Builder
Language: Python - Size: 31.6 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 37 - Forks: 2

DirtyHarryLYL/DJ-RN
As a part of HAKE project (HAKE-3D). Code for our CVPR2020 paper "Detailed 2D-3D Joint Representation for Human-Object Interaction".
Language: Python - Size: 5.3 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 100 - Forks: 14

ai4ce/MARS
[CVPR2024] Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset
Language: Python - Size: 370 MB - Last synced at: 28 days ago - Pushed at: 11 months ago - Stars: 50 - Forks: 1

soujanyaporia/MUStARD
Multimodal Sarcasm Detection Dataset
Language: OpenEdge ABL - Size: 75.4 MB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 335 - Forks: 62

abhi227070/Advanced-Dish-Detection-using-AI
DishVision AI is a multimodal food recognition app powered by Google Gemini AI and Streamlit. Upload or capture a dish image, and the AI will detect its name, ingredients, and recipe instantly! 🚀🔥
Language: Python - Size: 1.34 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

justinbt1/Multimodal-Document-Classification
MSc project investigating multi-modal fusion approaches to combining textual and visual features for multi-page classification of documents within the OGA National Data Repository (NDR).
Language: Jupyter Notebook - Size: 11.9 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 0

kyegomez/the-compiler
Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!
Language: Python - Size: 10.4 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 145 - Forks: 17
