An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: multimodal-deep-learning

rahul-jaiswar-git/Toxic-Content-Analyzer-with-Perspective-API

A modern, multi-modal hate speech detection web app using the Perspective API. Analyze text, images, audio, and video for toxic or harmful content in a user-friendly interface.

Language: Python - Size: 18.2 MB - Last synced at: about 5 hours ago - Pushed at: about 6 hours ago - Stars: 0 - Forks: 0

salesforce/LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Language: Jupyter Notebook - Size: 79.3 MB - Last synced at: about 18 hours ago - Pushed at: 6 months ago - Stars: 10,528 - Forks: 1,025

Mrkomiljon/awesome-generative-ai

Multimodal generative AI resources: talking heads, STT, TTS, image & video generation, and more.

Size: 2.31 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 1

Devanshpandey/preCog-Multimodal-AI-for-Precision-Cardiology

Code used for training preCog

Language: Jupyter Notebook - Size: 12.7 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

remyxai/VQASynth

Compose multimodal datasets 🎹

Language: Python - Size: 17.5 MB - Last synced at: 2 days ago - Pushed at: 19 days ago - Stars: 365 - Forks: 14

ForestsKing/Awesome-Multimodal-Time-Series

A curated list of papers, code, data, and other resources focused on multimodal time series analysis.

Size: 9.77 KB - Last synced at: 3 days ago - Pushed at: 9 days ago - Stars: 71 - Forks: 4

nicolay-r/nicolay-r

My personal list of news updates in the Information Retrieval domain

Size: 244 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

Duplums/CoMM

[ICLR 2025] Multi-modal representation learning of shared, unique and synergistic features between modalities

Language: Python - Size: 2.94 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 11 - Forks: 4

akusayudodograu/Agentic-RAG-Story-Generation-with-Multimodal-GenAI

Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling

Size: 1000 Bytes - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 8 - Forks: 1

aimotive/mm_training

Multimodal model training on aiMotive Dataset

Language: Python - Size: 2.87 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 16 - Forks: 4

richard-peng-xia/awesome-multimodal-in-medical-imaging

A collection of resources on applications of multi-modal learning in medical imaging.

Size: 234 KB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 730 - Forks: 65

floriankulig/neural-navi

Driver CoPilot, a student research project. Uses multimodal input streams from a car's telemetry and camera data to try to predict the best maneuver for the driver.

Language: Python - Size: 56.8 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

mahmoodlab/MCAT

Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images - ICCV 2021

Language: Jupyter Notebook - Size: 540 MB - Last synced at: 3 days ago - Pushed at: about 3 years ago - Stars: 200 - Forks: 40

jrzaurin/pytorch-widedeep

A flexible package for multimodal-deep-learning to combine tabular data with text and images using Wide and Deep models in PyTorch

Language: Python - Size: 99.6 MB - Last synced at: 7 days ago - Pushed at: 2 months ago - Stars: 1,344 - Forks: 194

thuiar/MIntRec

MIntRec: A New Dataset for Multimodal Intent Recognition (ACM MM 2022)

Language: Python - Size: 1.5 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 88 - Forks: 15

MIDA-group/CoMIR_INSPIRE

Framework for Multimodal Deformable Image Registration. Coordinated equivariant representation learning (CoMIR) combined with robust deformable registration by INSPIRE.

Language: Python - Size: 9.9 MB - Last synced at: 3 days ago - Pushed at: about 2 years ago - Stars: 10 - Forks: 1

willxxy/ECG-Bench

A Unified Framework for Benchmarking Generative Electrocardiogram-Language Models (ELMs)

Language: Python - Size: 6.67 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 9 - Forks: 2

willxxy/awesome-mmps

Corpus of resources for multimodal machine learning with physiological signals (mmps).

Size: 1.1 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 75 - Forks: 2

Yutong-Zhou-cv/Awesome-Text-to-Image

(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

Size: 69.2 MB - Last synced at: 8 days ago - Pushed at: 14 days ago - Stars: 2,330 - Forks: 200

GerrySant/multimodalhugs

MultimodalHugs is an extension of Hugging Face that offers a generalized framework for training, evaluating, and using multimodal AI models with minimal code differences, ensuring seamless compatibility with Hugging Face pipelines.

Language: Python - Size: 4.24 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 3 - Forks: 2

samsad35/VQ-MAE-AudioVisual-code

A vector quantized masked autoencoder for audiovisual speech emotion recognition

Language: Python - Size: 21.7 MB - Last synced at: 10 days ago - Pushed at: 11 days ago - Stars: 1 - Forks: 0

TheShadow29/awesome-grounding

awesome grounding: A curated list of research papers in visual grounding

Size: 172 KB - Last synced at: 8 days ago - Pushed at: about 2 years ago - Stars: 1,070 - Forks: 100

boemer00/cooper-mvp

We’re building the emotional intelligence layer for all marketing decisions in a multimodal world.

Language: Python - Size: 36.1 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

pxxpassi/Allurelle-Skincare-Recommender-App

Recommends products to users based on image processing and external factors to build an inclusive self-care community

Language: Dart - Size: 34.8 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

friedrichor/Awesome-Multimodal-Papers

A curated list of awesome Multimodal studies.

Language: HTML - Size: 63.3 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 186 - Forks: 18

ParitoshParmar/Piano-Skills-Assessment

Piano Skills Assessment [IEEE MMSP 2021]

Language: Python - Size: 854 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 17 - Forks: 2

AI4Finance-Foundation/FinRobot

FinRobot: An Open-Source AI Agent Platform for Financial Analysis using LLMs 🚀 🚀 🚀

Language: Jupyter Notebook - Size: 7.4 MB - Last synced at: 15 days ago - Pushed at: 6 months ago - Stars: 3,216 - Forks: 544

frankaging/Multimodal-Transformer

Attention Based Multi-modal Emotion Recognition; Stanford Emotional Narratives Dataset

Language: Python - Size: 458 MB - Last synced at: 11 days ago - Pushed at: over 5 years ago - Stars: 18 - Forks: 1

canary-for-cognition/multimodal-dl-framework

An extensible PyTorch framework to experiment with neural-networks-based deep learning algorithms on multiple data modalities for binary classification.

Language: Python - Size: 2.22 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 9 - Forks: 3

ilaria-manco/multimodal-ml-music

List of academic resources on Multimodal ML for Music

Language: TeX - Size: 268 KB - Last synced at: 13 days ago - Pushed at: about 2 years ago - Stars: 295 - Forks: 11

xmarva/transformer-based-architectures

Breakdown of SoTA transformer-based architectures

Language: Jupyter Notebook - Size: 741 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

declare-lab/awesome-emotion-recognition-in-conversations

A comprehensive reading list for Emotion Recognition in Conversations

Size: 273 KB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 269 - Forks: 45

RaptorMai/MLLM-CompBench

[NeurIPS'25] MLLM-CompBench evaluates the comparative reasoning of MLLMs with 40K image pairs and questions across 8 dimensions of relative comparison: visual attribute, existence, state, emotion, temporality, spatiality, quantity, and quality. CompBench covers diverse visual domains, including animals, fashion, sports, and scenes

Language: Jupyter Notebook - Size: 10.7 MB - Last synced at: 21 days ago - Pushed at: 22 days ago - Stars: 38 - Forks: 2

kyegomez/PALI

Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"

Language: Python - Size: 624 KB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 89 - Forks: 8

zhu-xlab/DOFA

Code for Neural Plasticity-Inspired Foundation Model for Observing the Earth Crossing Modalities

Language: Jupyter Notebook - Size: 993 MB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 126 - Forks: 11

westlake-repl/Recommendation-Systems-without-Explicit-ID-Features-A-Literature-Review

Paper List of Pre-trained Foundation Recommender Models

Size: 444 KB - Last synced at: 18 days ago - Pushed at: 9 months ago - Stars: 349 - Forks: 27

MMMU-Benchmark/MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

Language: Python - Size: 186 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 413 - Forks: 34

KimMeen/Time-LLM

[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"

Language: Python - Size: 1.06 MB - Last synced at: 28 days ago - Pushed at: 6 months ago - Stars: 1,928 - Forks: 333

kyegomez/BitNet

Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch

Language: Python - Size: 36.5 MB - Last synced at: 25 days ago - Pushed at: about 1 month ago - Stars: 1,772 - Forks: 161

steve-zeyu-zhang/MotionAnything

🔥 Motion Anything: Any to Motion Generation

Size: 183 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 156 - Forks: 2

geoaigroup/awesome-vision-language-models-for-earth-observation

A curated list of awesome vision and language resources for earth observation.

Size: 470 KB - Last synced at: 22 days ago - Pushed at: about 2 months ago - Stars: 219 - Forks: 17

sail-sg/CLoT

CVPR'24, Official Codebase of our Paper: "Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation".

Language: Python - Size: 6.46 MB - Last synced at: 19 days ago - Pushed at: about 1 year ago - Stars: 310 - Forks: 16

AlibabaResearch/AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

Language: C++ - Size: 104 MB - Last synced at: 29 days ago - Pushed at: about 1 month ago - Stars: 1,685 - Forks: 191

ninibymilk/PMF-MMEA

[ACL2024] Progressively Modality Freezing for Multi-Modal Entity Alignment

Language: Python - Size: 551 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 16 - Forks: 0

RunyuFan/UisNet-TGRS-2022

Code for TGRS 2022 paper "Fine-scale Urban Informal Settlements Mapping by Fusing Remote Sensing Images and Building Data via a Transformer-based Multimodal Fusion Network"

Language: Python - Size: 142 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 10 - Forks: 1

thuiar/MMSA-FET

A Tool for extracting multimodal features from videos.

Language: Python - Size: 24.4 MB - Last synced at: 28 days ago - Pushed at: about 2 years ago - Stars: 165 - Forks: 22

fraunhoferhhi/spvloc

[ECCV 2024 Oral] SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments

Language: Python - Size: 2.99 MB - Last synced at: 29 days ago - Pushed at: 30 days ago - Stars: 31 - Forks: 2

fcakyon/content-moderation-deep-learning

Deep learning based content moderation from text, audio, video & image input modalities.

Size: 188 KB - Last synced at: 28 days ago - Pushed at: 5 months ago - Stars: 343 - Forks: 20

stevejpapad/relevant-evidence-detection

Official repository for the "RED-DOT: Multimodal Fact-checking via Relevant Evidence Detection" paper.

Language: Python - Size: 40.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 2

stevejpapad/miscaptioned-image-reconstruction

Repository for the "Latent Multimodal Reconstruction for Misinformation Detection" paper

Language: Python - Size: 395 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

PAIR-Systems-Inc/little-dorrit-editor

Multimodal benchmark for evaluating handwritten editorial correction in printed text.

Language: Python - Size: 13.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

rizavelioglu/hateful_memes-hate_detectron

Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge. https://arxiv.org/abs/2012.12975

Language: Jupyter Notebook - Size: 11.2 MB - Last synced at: 27 days ago - Pushed at: about 1 year ago - Stars: 59 - Forks: 18

shamanez/Self-Supervised-Embedding-Fusion-Transformer

The code for our IEEE ACCESS (2020) paper Multimodal Emotion Recognition with Transformer-Based Self Supervised Feature Fusion.

Language: Python - Size: 4.65 MB - Last synced at: 20 days ago - Pushed at: over 3 years ago - Stars: 119 - Forks: 22

om-ai-lab/VL-CheckList

Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations. [EMNLP 2022]

Language: Python - Size: 26.6 MB - Last synced at: 20 days ago - Pushed at: 7 months ago - Stars: 129 - Forks: 5

kyegomez/NaViT

My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"

Language: Python - Size: 210 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 226 - Forks: 10

thuiar/UMC

Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances (ACL 2024)

Language: Python - Size: 1.89 MB - Last synced at: 25 days ago - Pushed at: 5 months ago - Stars: 25 - Forks: 3

aehrc/cvt2distilgpt2

Improving Chest X-Ray Report Generation by Leveraging Warm-Starting

Language: Python - Size: 93.5 MB - Last synced at: 28 days ago - Pushed at: 12 months ago - Stars: 67 - Forks: 7

declare-lab/Multimodal-Infomax

This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis, accepted at EMNLP 2021.

Language: Python - Size: 145 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 179 - Forks: 34

willxxy/ECG-Byte

[arxiv 2024] ECG-Byte: A Tokenizer for End-to-End Generative Electrocardiogram Language Modeling

Language: Python - Size: 27.5 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 14 - Forks: 0

yuanze-lin/REVIVE

[NeurIPS 2022] Official code for REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering

Language: Python - Size: 3.39 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 100 - Forks: 10

AdrianBZG/HyperBERT

Code for "HyperBERT: Mixing Hypergraph-Aware Layers with Language Models for Node Classification on Text-Attributed Hypergraphs" (EMNLP 2024)

Language: Python - Size: 26.4 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 20 - Forks: 0

phellonchen/awesome-Vision-and-Language-Pre-training

Recent Advances in Vision and Language Pre-training (VLP)

Size: 81.1 KB - Last synced at: 5 days ago - Pushed at: almost 2 years ago - Stars: 292 - Forks: 16

mbaqer/V2X-mmWave-Beamforming

PyTorch implementation of multi-modality sensing in 60 GHz mmWave beamforming for connected vehicles.

Language: Jupyter Notebook - Size: 5.09 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

brian-cy-chang/Multimodal_VB-Fracture-Detector

An easy-to-use framework for multimodal models to detect vertebral body fractures in PyTorch

Language: Python - Size: 1.82 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

declare-lab/LLM-PuzzleTest

This repository is maintained to release the dataset and models for multimodal puzzle reasoning.

Language: Python - Size: 131 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 78 - Forks: 7

Yutong-Zhou-cv/Awesome-Multimodality

A Survey on multimodal learning research.

Size: 1.76 MB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 324 - Forks: 22

jiayuww/SpatialEval

[NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs

Language: Python - Size: 3.95 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 23 - Forks: 0

cambridgeltl/visual-spatial-reasoning

[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.

Language: Python - Size: 10.7 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 118 - Forks: 9

yuewang-cuhk/awesome-vision-language-pretraining-papers

Recent Advances in Vision and Language PreTrained Models (VL-PTMs)

Size: 104 KB - Last synced at: 3 days ago - Pushed at: over 2 years ago - Stars: 1,152 - Forks: 104

LeapLabTHU/Pseudo-Q

[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding

Language: Python - Size: 22.9 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 148 - Forks: 10

yuanze-lin/Learnable_Regions

[CVPR 2024] Official code for "Text-Driven Image Editing via Learnable Regions"

Language: Python - Size: 11.5 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 219 - Forks: 21

declare-lab/BBFN

This repository contains the implementation of the paper -- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis

Language: Python - Size: 1.25 MB - Last synced at: 26 days ago - Pushed at: about 2 years ago - Stars: 71 - Forks: 14

jaisidhsingh/LoRA-CLIP

Easy wrapper for inserting LoRA layers in CLIP.

Language: Python - Size: 60.5 KB - Last synced at: 22 days ago - Pushed at: 11 months ago - Stars: 31 - Forks: 3

icedpanda/COMPASS-official

Official Implementation of Unveiling User Preferences: A Knowledge Graph and LLM-Driven Approach for Conversational Recommendation

Language: Python - Size: 82.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

omriav/blended-latent-diffusion

Official implementation for "Blended Latent Diffusion" [SIGGRAPH 2023]

Language: Jupyter Notebook - Size: 9.84 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 594 - Forks: 37

LamineTourelab/MOGONET

MOGONET (Multi-Omics Graph cOnvolutional NETworks) is a multi-omics data integrative analysis framework for classification tasks in biomedical applications.

Language: Jupyter Notebook - Size: 56.6 MB - Last synced at: 20 days ago - Pushed at: about 1 month ago - Stars: 14 - Forks: 1

ksm26/Open-Source-Models-with-Hugging-Face

"Open Source Models with Hugging Face" course empowers you with the skills to leverage open-source models from the Hugging Face Hub for various tasks in NLP, audio, image, and multimodal domains.

Language: Jupyter Notebook - Size: 21 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 24 - Forks: 19

Davidlequnchen/Awesome-AM-process-monitoring-control

A curated collection of research papers with open-source implementations/datasets focused on in-situ process monitoring and adaptive control in laser-based additive manufacturing.

Size: 57.6 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

fork123aniket/Multi-Round-VLM-powered-Multimodal-Conversational-AI-Navigation-Bot

Streamlit App Combining Vision, Language, and Audio AI Models

Language: Python - Size: 18.6 KB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

Davidlequnchen/LDED-FusionNet

LDED-FusionNet: Machine Learning-Based Audio-Visual Defect Detection for LDED AM Process

Language: Jupyter Notebook - Size: 1.18 GB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 1

rohit901/VANE-Bench

[NAACL'25] Contains code and documentation for our VANE-Bench paper.

Language: Python - Size: 38.3 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 11 - Forks: 1

GangGreenTemperTatum/toronto-visual-ai-hackathon-2025

Visual AI Hackathon Project

Size: 125 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 1

Vision-CAIR/3DCoMPaT-v2

3DCoMPaT++: An improved large-scale 3D vision dataset for compositional recognition

Language: Python - Size: 133 MB - Last synced at: 24 days ago - Pushed at: 10 months ago - Stars: 82 - Forks: 6

Afrid1045/Brain-Tumor-Severity-Prediction-using-Multi-Modal-Squeeze-and-Excitation-Network

The project focuses on classifying brain tumors using the Multi-Modal Squeeze and Excitation Network.

Language: Jupyter Notebook - Size: 4.16 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

kyegomez/MultiModalCrossAttn

The open source implementation of the cross attention mechanism from the paper: "JOINTLY TRAINING LARGE AUTOREGRESSIVE MULTIMODAL MODELS"

Language: Python - Size: 223 KB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 27 - Forks: 1

association-rosia/crop-forecasting

Predicting rice field yields through the integration of Microsoft Planetary satellite images, meteorological data, and field information in the 2023 EY Open Science Data Challenge - Crop Forecasting.

Language: Jupyter Notebook - Size: 341 MB - Last synced at: 28 days ago - Pushed at: about 1 year ago - Stars: 20 - Forks: 3

icon-lab/MedTrim

Official implementation of "Meta-Entity Driven Triplet Mining for Aligning Medical Vision-Language Models"

Language: Python - Size: 40 KB - Last synced at: 25 days ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 1

DWCTOD/CVPR2024-Papers-with-Code-Demo

Collects the latest CVPR (Conference on Computer Vision and Pattern Recognition) results, including papers, code, and demo videos. Recommendations are welcome!

Size: 137 KB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 1,335 - Forks: 150

Rajeevveera24/LatentAlignmentProcedural

This repository is cloned from https://github.com/HLR/LatentAlignmentProcedural. This is a potential baseline explored for the textual_cloze task on the RecipeQA Dataset - https://hucvl.github.io/recipeqa/

Language: Jupyter Notebook - Size: 47 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Davidlequnchen/MultiSensorFusion-ROS-AM-Monitoring

ROS-based Multisensor Fusion Digital Twin (MFDT) platform for real-time monitoring and defect detection of Laser-Directed Energy Deposition (L-DED) Additive Manufacturing (AM) process.

Language: HTML - Size: 3.8 GB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 1

taco-group/DecAlign

A novel cross-modal decoupling and alignment framework for multimodal representation learning.

Language: JavaScript - Size: 13.8 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

declare-lab/MM-Align

[EMNLP 2022] This repository contains the official implementation of the paper "MM-Align: Learning Optimal Transport-based Alignment Dynamics for Fast and Accurate Inference on Missing Modality Sequences"

Language: Python - Size: 284 KB - Last synced at: 26 days ago - Pushed at: about 1 year ago - Stars: 28 - Forks: 2

eezkni/M2Trans

[IEEE J-BHI-2024] PyTorch implementation of "M2Trans: Multi-Modal Regularized Coarse-to-Fine Transformer for Ultrasound Image Super-Resolution"

Language: Python - Size: 113 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 8 - Forks: 2

42jaylonw/shifu

Lightweight Isaac Gym Environment Builder

Language: Python - Size: 31.6 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 37 - Forks: 2

DirtyHarryLYL/DJ-RN

Part of the HAKE project (HAKE-3D). Code for our CVPR2020 paper "Detailed 2D-3D Joint Representation for Human-Object Interaction".

Language: Python - Size: 5.3 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 100 - Forks: 14

ai4ce/MARS

[CVPR2024] Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset

Language: Python - Size: 370 MB - Last synced at: 28 days ago - Pushed at: 11 months ago - Stars: 50 - Forks: 1

soujanyaporia/MUStARD

Multimodal Sarcasm Detection Dataset

Language: OpenEdge ABL - Size: 75.4 MB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 335 - Forks: 62

abhi227070/Advanced-Dish-Detection-using-AI

DishVision AI is a multimodal food recognition app powered by Google Gemini AI and Streamlit. Upload or capture a dish image, and the AI will detect its name, ingredients, and recipe instantly! 🚀🔥

Language: Python - Size: 1.34 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

justinbt1/Multimodal-Document-Classification

MSc project investigating multi-modal fusion approaches to combining textual and visual features for multi-page classification of documents within the OGA National Data Repository (NDR).

Language: Jupyter Notebook - Size: 11.9 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 0

kyegomez/the-compiler

Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!

Language: Python - Size: 10.4 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 145 - Forks: 17

Related Keywords
multimodal-deep-learning (412), deep-learning (105), multimodal (86), pytorch (64), computer-vision (56), machine-learning (47), multimodal-learning (38), natural-language-processing (26), nlp (24), multimodality (22), vision-and-language (21), tensorflow (20), python (20), large-language-models (19), attention-mechanism (16), transformers (16), transformer (16), llm (14), artificial-intelligence (14), generative-ai (14), multimodal-sentiment-analysis (14), multimodal-large-language-models (13), deep-neural-networks (13), gpt4 (13), self-supervised-learning (13), classification (13), emotion-recognition (12), visual-question-answering (11), dataset (11), convolutional-neural-networks (11), neural-network (10), attention (10), multimodal-datasets (10), clip (9), ai (9), image-processing (9), vision-language-transformer (8), bert (8), vision-transformer (8), object-detection (8), language-model (8), sentiment-analysis (8), awesome-list (8), image (8), time-series (8), image-classification (8), vision-language-model (7), vision-language (7), pytorch-lightning (7), diffusion-models (7), multimodal-fusion (7), multimodal-data (7), representation-learning (7), image-captioning (7), multimodal-representation (7), cnn (7), lstm (6), reinforcement-learning (6), vision-language-pretraining (6), text-to-image (6), foundation-models (6), neural-networks (6), huggingface-transformers (6), remote-sensing (6), graph-neural-networks (6), keras (6), 3d (6), deeplearning (6), variational-autoencoder (5), question-answering (5), embeddings (5), data-fusion (5), python3 (5), generative-adversarial-network (5), image-generation (5), transformer-models (5), gan (5), speech-recognition (5), generative-model (5), audio (5), paper (5), audio-processing (5), visual-grounding (5), semantic-segmentation (5), point-cloud (5), recommender-system (5), transfer-learning (5), large-multimodal-models (5), anomaly-detection (5), contrastive-learning (5), text (5), attention-is-all-you-need (5), nlp-machine-learning (5), vqa (5), memes (5), multimodal-interactions (5), multimodal-retrieval (4), cvpr (4), multi-modal (4), knowledge-graph (4)