Topic: "multimodal-deep-learning"
diegovalsesia/XMFnet
Code for "Cross-modal Learning for Image-Guided Point Cloud Shape Completion" (NeurIPS 2022)
Language: Python - Size: 22.2 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 34 - Forks: 6

yuhui-zh15/drml
Official Code Release for "Diagnosing and Rectifying Vision Models using Language" (ICLR 2023)
Language: Jupyter Notebook - Size: 19.2 MB - Last synced at: 10 days ago - Pushed at: almost 2 years ago - Stars: 33 - Forks: 0

visinf/lnfmm
Latent Normalizing Flows for Many-to-Many Cross Domain Mappings (ICLR 2020)
Language: Python - Size: 1000 KB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 33 - Forks: 12

zch42/BiFusion
Language: Python - Size: 2.08 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 32 - Forks: 9

JunweiLiang/FVTA_MemexQA
Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19
Language: Python - Size: 723 KB - Last synced at: 5 days ago - Pushed at: almost 6 years ago - Stars: 32 - Forks: 15

fraunhoferhhi/spvloc
[ECCV 2024 Oral] SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments
Language: Python - Size: 2.99 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 31 - Forks: 2

jaisidhsingh/LoRA-CLIP
Easy wrapper for inserting LoRA layers in CLIP.
Language: Python - Size: 60.5 KB - Last synced at: 6 days ago - Pushed at: 10 months ago - Stars: 31 - Forks: 3

declare-lab/MSA-Robustness
NAACL 2022 paper on Analyzing Modality Robustness in Multimodal Sentiment Analysis
Language: Python - Size: 3.43 MB - Last synced at: 10 days ago - Pushed at: over 2 years ago - Stars: 31 - Forks: 5

usc-sail/mica-deep-mcca
Deep Multiset Canonical Correlation Analysis - An extension of CCA to multiple datasets
Language: Python - Size: 103 MB - Last synced at: 12 months ago - Pushed at: over 5 years ago - Stars: 31 - Forks: 14

phellonchen/awesome-visual-dialog
Recent Advances in Visual Dialog
Size: 36.1 KB - Last synced at: 10 days ago - Pushed at: over 2 years ago - Stars: 30 - Forks: 1

IsaacRodgz/ConcatBERT
Baseline model for multimodal classification based on images and text. The text representation is obtained from a pretrained BERT base model and the image representation from a pretrained VGG16 model.
Language: Jupyter Notebook - Size: 306 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 29 - Forks: 6

shubhamagarwal92/mmd
This repository contains the Pytorch implementation for our SCAI (EMNLP-2018) submission "A Knowledge-Grounded Multimodal Search-Based Conversational Agent"
Language: Python - Size: 82 KB - Last synced at: 11 months ago - Pushed at: almost 5 years ago - Stars: 29 - Forks: 5

DunnBC22/Vision_Audio_and_Multimodal_Projects
This repository includes all computer vision, audio, document AI, and multimodal projects.
Language: Jupyter Notebook - Size: 108 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 28 - Forks: 5

Nithin-GK/UniteandConquer
[CVPR '23] Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models
Language: Python - Size: 6.55 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 28 - Forks: 3

declare-lab/MM-Align
[EMNLP 2022] This repository contains the official implementation of the paper "MM-Align: Learning Optimal Transport-based Alignment Dynamics for Fast and Accurate Inference on Missing Modality Sequences"
Language: Python - Size: 284 KB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 28 - Forks: 2

kyegomez/MultiModalCrossAttn
The open source implementation of the cross attention mechanism from the paper: "JOINTLY TRAINING LARGE AUTOREGRESSIVE MULTIMODAL MODELS"
Language: Python - Size: 223 KB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 27 - Forks: 1

emerisly/EDIS
Entity-Driven Image Search over Multimodal Web Content (EMNLP 2023)
Language: Python - Size: 1.61 MB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 27 - Forks: 0

david-yoon/attentive-modality-hopping-for-SER
TensorFlow implementation of "Attentive Modality Hopping for Speech Emotion Recognition," ICASSP-20
Language: Python - Size: 53.7 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 27 - Forks: 8

thuiar/UMC
Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances (ACL 2024)
Language: Python - Size: 1.89 MB - Last synced at: 9 days ago - Pushed at: 5 months ago - Stars: 25 - Forks: 3

georgesterpu/Taris
Transformer-based online speech recognition system with TensorFlow 2
Language: Python - Size: 5.4 MB - Last synced at: 12 months ago - Pushed at: over 4 years ago - Stars: 25 - Forks: 6

ksm26/Open-Source-Models-with-Hugging-Face
"Open Source Models with Hugging Face" course empowers you with the skills to leverage open-source models from the Hugging Face Hub for various tasks in NLP, audio, image, and multimodal domains.
Language: Jupyter Notebook - Size: 21 MB - Last synced at: 27 days ago - Pushed at: about 1 year ago - Stars: 24 - Forks: 19

nyukat/greedy_multimodal_learning
Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks
Language: Python - Size: 47.9 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 24 - Forks: 2

jiayuww/SpatialEval
[NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs
Language: Python - Size: 3.95 MB - Last synced at: 19 days ago - Pushed at: 3 months ago - Stars: 23 - Forks: 0

UmarIgan/Machine-Learning
A set of jupyter notebooks
Language: Jupyter Notebook - Size: 16.7 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 23 - Forks: 8

HySonLab/Ligand_Generation
Target-aware Variational Auto-encoders for Ligand Generation with Multimodal Protein Representation Learning
Language: Python - Size: 257 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 23 - Forks: 2

sisinflab/Ducho
Python framework to extract multimodal features for multimodal recommendation in a highly-customizable way.
Language: Python - Size: 3.62 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 22 - Forks: 5

nesl/Robust-Deep-Learning-Pipeline
Deep Convolutional Bidirectional LSTM for Complex Activity Recognition with Missing Data. Human Activity Recognition Challenge. Springer SIST (2020)
Language: Jupyter Notebook - Size: 876 MB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 22 - Forks: 3

georgepar/slp
Utils and modules for Speech Language and Multimodal processing using pytorch and pytorch lightning
Language: Python - Size: 2.02 MB - Last synced at: 5 months ago - Pushed at: about 2 years ago - Stars: 21 - Forks: 7

UofLBioinformatics/circDeep
End-to-End learning framework for circular RNA classification from other long non-coding RNA using multimodal deep learning
Language: Python - Size: 47.2 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 21 - Forks: 14

AdrianBZG/HyperBERT
Code for "HyperBERT: Mixing Hypergraph-Aware Layers with Language Models for Node Classification on Text-Attributed Hypergraphs" (EMNLP 2024)
Language: Python - Size: 26.4 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 20 - Forks: 0

cosmaadrian/multimodal-depression-from-video
Official source code for the paper: "Reading Between the Frames: Multi-Modal Non-Verbal Depression Detection in Videos"
Language: Python - Size: 370 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 20 - Forks: 2

association-rosia/crop-forecasting
Predicting rice field yields through the integration of Microsoft Planetary satellite images, meteorological data, and field information in the 2023 EY Open Science Data Challenge - Crop Forecasting.
Language: Jupyter Notebook - Size: 341 MB - Last synced at: 13 days ago - Pushed at: about 1 year ago - Stars: 20 - Forks: 3

codezakh/DataEnvGym
A testbed for agents and environments that can automatically improve models through data generation.
Language: Python - Size: 9.16 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 19 - Forks: 5

sverma88/DeepCU-IJCAI19
DeepCU: Integrating Both Common and Unique Latent Information for Multimodal Sentiment Analysis, IJCAI-19
Language: Python - Size: 36.7 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 19 - Forks: 8

Yuan-ManX/ai-multimodal-timeline
Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Agent, Audio, Image, Video, Music and 3D content. 🔥
Size: 1.11 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 18 - Forks: 1

declare-lab/M2H2-dataset
This repository contains the dataset and baselines explained in the paper: M2H2: A Multimodal Multiparty Hindi Dataset For Humor Recognition in Conversations
Language: Python - Size: 2.21 GB - Last synced at: 10 days ago - Pushed at: about 2 years ago - Stars: 18 - Forks: 12

asnelt/mmae
Package for Multimodal Autoencoders in TensorFlow / Keras
Language: Python - Size: 28.3 KB - Last synced at: 17 days ago - Pushed at: almost 5 years ago - Stars: 18 - Forks: 12

yiren-jian/BLIText
[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
Language: Python - Size: 34.4 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 17 - Forks: 1

basiralab/MultiGraphGAN
MultiGraphGAN for predicting multiple target graphs from a source graph using geometric deep learning.
Language: Python - Size: 21.8 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 17 - Forks: 4

Nithin-Holla/meme_challenge
Repository containing code from team Kingsterdam for the Hateful Memes Challenge
Language: Python - Size: 1.36 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 17 - Forks: 8

frankaging/Multimodal-Transformer
Attention Based Multi-modal Emotion Recognition; Stanford Emotional Narratives Dataset
Language: Python - Size: 458 MB - Last synced at: 19 days ago - Pushed at: over 5 years ago - Stars: 17 - Forks: 1

ninibymilk/PMF-MMEA
[ACL 2024] Progressively Modality Freezing for Multi-Modal Entity Alignment
Language: Python - Size: 551 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 16 - Forks: 0

kyegomez/Pegasus
PegasusX: The Future of Multimodal Embeddings 🦄 🦄
Language: Python - Size: 37.5 MB - Last synced at: 8 days ago - Pushed at: 6 months ago - Stars: 16 - Forks: 5

FuxiaoLiu/DocumentCLIP
[ICPRAI 2024] DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents
Language: Python - Size: 2.49 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 16 - Forks: 0

AmbiTyga/MemSem
A Multi-modal Framework for Sentimental Analysis of Meme
Language: Python - Size: 4.59 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 16 - Forks: 5

orrzohar/LOVM
[NeurIPS 2023] Official Pytorch code for LOVM: Language-Only Vision Model Selection
Language: Python - Size: 4.44 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 15 - Forks: 0

sarthak268/c3vqg-official
PyTorch Implementation for the paper "C3VQG: Category Consistent Cyclic Visual Question Generation" (ACM MM Asia'20).
Language: Python - Size: 63.9 MB - Last synced at: 12 months ago - Pushed at: about 2 years ago - Stars: 15 - Forks: 6

eslambakr/LAR-Look-Around-and-Refer
This is the official implementation of our paper "LAR: Look Around and Refer".
Language: C++ - Size: 45 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 15 - Forks: 2

gtatiya/Deep-Multi-Sensory-Object-Categorization
Deep Multi-Sensory Object Category Recognition Using Interactive Behavioral Exploration
Language: Jupyter Notebook - Size: 2.65 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 15 - Forks: 8

willxxy/ECG-Byte
[arxiv 2024] ECG-Byte: A Tokenizer for End-to-End Generative Electrocardiogram Language Modeling
Language: Python - Size: 27.5 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 14 - Forks: 0

LamineTourelab/MOGONET
MOGONET (Multi-Omics Graph cOnvolutional NETworks) is a multi-omics data integrative analysis framework for classification tasks in biomedical applications.
Language: Jupyter Notebook - Size: 56.6 MB - Last synced at: 5 days ago - Pushed at: 28 days ago - Stars: 14 - Forks: 1

omeregev/click2mask
[AAAI 2025] Official Implementation for "Click2Mask: Local Editing with Dynamic Mask Generation" Paper.
Language: Python - Size: 61.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 14 - Forks: 2

sisinflab/LoG-2023-GNNs-RecSys
Presented as tutorial at the Second Learning on Graphs Conference (LoG 2023)
Language: Jupyter Notebook - Size: 12.9 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 14 - Forks: 0

ashutosh1919/data2vec-pytorch
Ready to run PyTorch implementation of Data2Vec 2.0: Highly efficient self-supervised representation learning for vision, speech and text.
Language: Python - Size: 116 KB - Last synced at: 5 days ago - Pushed at: about 2 years ago - Stars: 14 - Forks: 2

ParitoshParmar/Piano-Skills-Assessment
Piano Skills Assessment [IEEE MMSP 2021]
Language: Python - Size: 852 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 2

PrithivirajDamodaran/vision-language-modelling-series
Companion Repo for the Vision Language Modelling YouTube series - https://bit.ly/3PsbsC2 - by Prithivi Da. Open to PRs and collaborations
Language: Jupyter Notebook - Size: 6.15 MB - Last synced at: 10 days ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 4

vishaal27/Multimodal-Video-Emotion-Recognition-Pytorch
A Pytorch implementation of emotion recognition from videos
Language: Python - Size: 1.19 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 14 - Forks: 1

gchochla/Deep-Representations-of-Visual-Descriptions
Pytorch implementation of CVPR'16 paper "Learning Deep Representations of Fine-Grained Visual Descriptions", by Reed et al.
Language: Python - Size: 6.83 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 14 - Forks: 1

kelechi-c/ripple_net
image retrieval/tagging with CLIP
Language: Python - Size: 416 KB - Last synced at: 1 day ago - Pushed at: 10 months ago - Stars: 13 - Forks: 1

aimotive/mm_training
Multimodal model training on aiMotive Dataset
Language: Python - Size: 2.86 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 13 - Forks: 4

bryanbocao/open-papernotes
Yet another Ph.D. adventure.
Size: 1010 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 13 - Forks: 4

HackerHyper/CLIPMH
CLIPMH: CLIP Multi-modal Hashing
Language: Python - Size: 1.12 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 13 - Forks: 0

Neerajj9/Stacked-Attention-Networks-for-Visual-Question-Answering
Implementation of the paper "Stacked Attention Networks for Image Question Answering" in Tensorflow
Language: Python - Size: 15.3 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 13 - Forks: 4

ThomasHelfer/multimodal-supernovae
A codebase dedicated to exploring multimodal learning approaches by integrating images of host galaxies of supernovae and their corresponding light-curves and spectra.
Language: Jupyter Notebook - Size: 1.66 GB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 12 - Forks: 2

ZhaoPeiduo/BLIP2-Japanese
Modifying LAVIS' BLIP2 Q-former with models pretrained on Japanese datasets.
Language: Python - Size: 75.9 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 12 - Forks: 1

HySonLab/Protein_Pretrain
Multimodal Pretraining for Unsupervised Protein Representation Learning
Language: Python - Size: 241 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 12 - Forks: 2

Shen-Lab/CPAC
[Bioinformatics 2022] Cross-Modality and Self-Supervised Protein Embedding for Compound-Protein Affinity and Contact Prediction
Language: Python - Size: 134 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 12 - Forks: 1

Agora-X/DailyPaperClub
The repository for the exclusive Daily Paper Club hosted at Agora every 10pm NYC time at this discord: https://discord.gg/Gnzh6dnzyz
Size: 14.6 KB - Last synced at: 11 months ago - Pushed at: about 1 year ago - Stars: 12 - Forks: 0

SmithaUpadhyaya/fashion_image_caption
Automate fashion image captioning using BLIP-2. Automatically generates descriptions of clothes on shopping websites, which can help customers without fashion knowledge better understand the features (attributes, style, functionality, etc.) of the items and increase online sales by enticing more customers.
Language: Jupyter Notebook - Size: 26.6 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 1

clairecyq/whos-waldo
Who's Waldo? Linking People Across Text and Images. ICCV 2021.
Language: Python - Size: 2.86 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 12 - Forks: 4

SAGNIKMJR/move2hear-active-AV-separation
Code and datasets for 'Move2Hear: Active Audio-Visual Source Separation' (ICCV 2021)
Language: Python - Size: 1.31 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 12 - Forks: 0

dh1105/Multi-modal-movie-genre-prediction
A multi-modal deep learning model trained to predict a movie's genre given the movie poster and overview as an input.
Language: Jupyter Notebook - Size: 362 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 12 - Forks: 10

unitaryai/VTC
VTC: Improving Video-Text Retrieval with User Comments
Language: Python - Size: 5.45 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 11 - Forks: 0

rohit901/VANE-Bench
[NAACL'25] Contains code and documentation for our VANE-Bench paper.
Language: Python - Size: 38.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 11 - Forks: 1

yongfanbeta/awesome-multimodal-healthcare
Reading list for multimodal learning in healthcare
Size: 8.79 KB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 11 - Forks: 2

kritiksoman/Multimodal
Listen. Write. Speak. Read. Think.
Language: Jupyter Notebook - Size: 15.2 MB - Last synced at: 17 days ago - Pushed at: about 3 years ago - Stars: 11 - Forks: 0

JianqiangWan/VLPT-STD
Vision-Language Pre-Training for Boosting Scene Text Detectors (CVPR 2022)
Size: 4.88 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 11 - Forks: 0

SAIC-MONTREAL/multimodal-dynamics
Code for AAAI 2021 paper "Learning Intuitive Physics with Multimodal Generative Models"
Language: Python - Size: 192 KB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 11 - Forks: 2

RunyuFan/UisNet-TGRS-2022
Code for TGRS 2022 paper "Fine-scale Urban Informal Settlements Mapping by Fusing Remote Sensing Images and Building Data via a Transformer-based Multimodal Fusion Network"
Language: Python - Size: 142 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 10 - Forks: 1

aimotive/aimotive-dataset-loader
Dataset loader and renderer for aiMotive Multimodal Dataset
Language: Python - Size: 614 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 10 - Forks: 2

zhongshsh/MoExtend
ACL 2024 (SRW), Official Codebase of our Paper: "MoExtend: Tuning New Experts for Modality and Task Extension"
Language: Python - Size: 542 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 10 - Forks: 0

katerynaCh/MMA-DFER
This repository provides the code for MMA-DFER, a multimodal (audiovisual) emotion recognition method. It is the official implementation of the paper "MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild".
Language: Python - Size: 1.77 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 10 - Forks: 1

sisinflab/Formal-MultiMod-Rec
Formalizing Multimedia Recommendation through Multimodal Deep Learning, accepted in ACM Transactions on Recommender Systems.
Language: Python - Size: 903 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 10 - Forks: 1

guxm2021/MM_ALT
[MM 2022 Oral] MM-ALT: A Multimodal Automatic Lyric Transcription System
Language: Python - Size: 3.31 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 10 - Forks: 0

tomoyoshki/focal
Pytorch Implementation of FOCAL: Contrastive Learning for Multimodal Time-Series Sensing Signals in Factorized Orthogonal Latent Space
Language: Python - Size: 59.6 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 0

ahmdtaha/distributed_sigmoid_loss
Unofficial implementation for Sigmoid Loss for Language Image Pre-Training
Language: Python - Size: 62.5 KB - Last synced at: about 1 hour ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 0

parham/lemanchot-analysis
LeManchot-Analysis is a system for anomaly detection in coupled visible-thermal images
Language: Python - Size: 79.9 MB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 2

willxxy/ECG-Bench
A Unified Framework for Benchmarking Generative Electrocardiogram-Language Models (ELMs)
Language: Python - Size: 6.78 MB - Last synced at: about 14 hours ago - Pushed at: about 15 hours ago - Stars: 9 - Forks: 2

canary-for-cognition/multimodal-dl-framework
An extensible PyTorch framework to experiment with neural-networks-based deep learning algorithms on multiple data modalities for binary classification.
Language: Python - Size: 2.22 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 9 - Forks: 3

monajalal/Kenyan-Food
Code and a link to the dataset for the Kenyan Food detection paper, accepted at the MADiMA 2019 Workshop, part of the ACM MM 2019 conference.
Language: Python - Size: 5.08 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 9 - Forks: 6

claws-lab/multimodal-robustness
Code and resources for EMNLP 2022 paper on 'Robustness of Fusion-based Multimodal Classifiers to Cross-Modal Content Dilutions'
Language: Python - Size: 71.6 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 9 - Forks: 1

bairdxiong/SegResearchToolkit
A High-Efficient Research Development Toolkit for Image Segmentation Based on Pytorch.
Language: Python - Size: 3.11 MB - Last synced at: 4 months ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 0

SriramPingali/Multi-Modal-Recommendation-System
Official code for the paper "Towards developing a Multi Modal Video Recommendation system"
Language: Jupyter Notebook - Size: 942 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 9 - Forks: 1

MIDA-group/CoMIR_INSPIRE
Framework for Multimodal Deformable Image Registration. Coordinated equivariant representation learning (CoMIR) combined with robust deformable registration by INSPIRE.
Language: Python - Size: 9.9 MB - Last synced at: 5 days ago - Pushed at: about 2 years ago - Stars: 9 - Forks: 1

theavicaster/featurehallucination-cgan
Uses C-GAN for feature hallucination of missing modalities for hyperspectral data. TensorFlow implementation of ICCV '19 paper
Language: Python - Size: 564 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 9 - Forks: 1

Rehan-Ahmad/MultimodalDiarization
Multimodal speaker diarization using pre-trained audio-visual synchronization model
Language: Python - Size: 38.1 KB - Last synced at: 9 months ago - Pushed at: almost 5 years ago - Stars: 9 - Forks: 6

akusayudodograu/Agentic-RAG-Story-Generation-with-Multimodal-GenAI
Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling
Size: 1000 Bytes - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 8 - Forks: 1

Duplums/CoMM
[ICLR 2025] Multi-modal representation learning of shared, unique and synergistic features between modalities
Language: Python - Size: 2.93 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 8 - Forks: 2

eezkni/M2Trans
[IEEE J-BHI-2024] Pytorch implementation of "M2Trans: Multi-Modal Regularized Coarse-to-Fine Transformer for Ultrasound Image Super-Resolution"
Language: Python - Size: 113 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 8 - Forks: 2

association-rosia/flair-2
Engage in a semantic segmentation challenge for land cover description using multimodal remote sensing earth observation data, delving into real-world scenarios with a dataset comprising 70,000+ aerial imagery patches and 50,000 Sentinel-2 satellite acquisitions.
Language: Jupyter Notebook - Size: 44.7 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 0
