An open API service providing repository metadata for many open source software ecosystems.

Topic: "multimodal-deep-learning"

diegovalsesia/XMFnet

Code for "Cross-modal Learning for Image-Guided Point Cloud Shape Completion" (NeurIPS 2022)

Language: Python - Size: 22.2 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 34 - Forks: 6

yuhui-zh15/drml

Official Code Release for "Diagnosing and Rectifying Vision Models using Language" (ICLR 2023)

Language: Jupyter Notebook - Size: 19.2 MB - Last synced at: 10 days ago - Pushed at: almost 2 years ago - Stars: 33 - Forks: 0

visinf/lnfmm

Latent Normalizing Flows for Many-to-Many Cross Domain Mappings (ICLR 2020)

Language: Python - Size: 1000 KB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 33 - Forks: 12

zch42/BiFusion

Language: Python - Size: 2.08 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 32 - Forks: 9

JunweiLiang/FVTA_MemexQA

Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19

Language: Python - Size: 723 KB - Last synced at: 5 days ago - Pushed at: almost 6 years ago - Stars: 32 - Forks: 15

fraunhoferhhi/spvloc

[ECCV 2024 Oral] SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments

Language: Python - Size: 2.99 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 31 - Forks: 2

jaisidhsingh/LoRA-CLIP

Easy wrapper for inserting LoRA layers in CLIP.

Language: Python - Size: 60.5 KB - Last synced at: 6 days ago - Pushed at: 10 months ago - Stars: 31 - Forks: 3

declare-lab/MSA-Robustness

NAACL 2022 paper on Analyzing Modality Robustness in Multimodal Sentiment Analysis

Language: Python - Size: 3.43 MB - Last synced at: 10 days ago - Pushed at: over 2 years ago - Stars: 31 - Forks: 5

usc-sail/mica-deep-mcca

Deep Multiset Canonical Correlation Analysis - An extension of CCA to multiple datasets

Language: Python - Size: 103 MB - Last synced at: 12 months ago - Pushed at: over 5 years ago - Stars: 31 - Forks: 14

phellonchen/awesome-visual-dialog

Recent Advances in Visual Dialog

Size: 36.1 KB - Last synced at: 10 days ago - Pushed at: over 2 years ago - Stars: 30 - Forks: 1

IsaacRodgz/ConcatBERT

Baseline model for multimodal classification based on images and text. Text representation obtained from pretrained BERT base model and image representation obtained from VGG16 pretrained model.

Language: Jupyter Notebook - Size: 306 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 29 - Forks: 6

shubhamagarwal92/mmd

This repository contains the Pytorch implementation for our SCAI (EMNLP-2018) submission "A Knowledge-Grounded Multimodal Search-Based Conversational Agent"

Language: Python - Size: 82 KB - Last synced at: 11 months ago - Pushed at: almost 5 years ago - Stars: 29 - Forks: 5

DunnBC22/Vision_Audio_and_Multimodal_Projects

This repository includes all computer vision, audio, document AI, and multimodal projects.

Language: Jupyter Notebook - Size: 108 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 28 - Forks: 5

Nithin-GK/UniteandConquer

[CVPR '23] Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models

Language: Python - Size: 6.55 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 28 - Forks: 3

declare-lab/MM-Align

[EMNLP 2022] This repository contains the official implementation of the paper "MM-Align: Learning Optimal Transport-based Alignment Dynamics for Fast and Accurate Inference on Missing Modality Sequences"

Language: Python - Size: 284 KB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 28 - Forks: 2

kyegomez/MultiModalCrossAttn

The open source implementation of the cross attention mechanism from the paper: "JOINTLY TRAINING LARGE AUTOREGRESSIVE MULTIMODAL MODELS"

Language: Python - Size: 223 KB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 27 - Forks: 1

emerisly/EDIS

Entity-Driven Image Search over Multimodal Web Content (EMNLP 2023)

Language: Python - Size: 1.61 MB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 27 - Forks: 0

david-yoon/attentive-modality-hopping-for-SER

TensorFlow implementation of "Attentive Modality Hopping for Speech Emotion Recognition," ICASSP-20

Language: Python - Size: 53.7 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 27 - Forks: 8

thuiar/UMC

Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances (ACL 2024)

Language: Python - Size: 1.89 MB - Last synced at: 9 days ago - Pushed at: 5 months ago - Stars: 25 - Forks: 3

georgesterpu/Taris

Transformer-based online speech recognition system with TensorFlow 2

Language: Python - Size: 5.4 MB - Last synced at: 12 months ago - Pushed at: over 4 years ago - Stars: 25 - Forks: 6

ksm26/Open-Source-Models-with-Hugging-Face

The "Open Source Models with Hugging Face" course empowers you with the skills to leverage open-source models from the Hugging Face Hub for various tasks in NLP, audio, image, and multimodal domains.

Language: Jupyter Notebook - Size: 21 MB - Last synced at: 27 days ago - Pushed at: about 1 year ago - Stars: 24 - Forks: 19

nyukat/greedy_multimodal_learning

Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks

Language: Python - Size: 47.9 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 24 - Forks: 2

jiayuww/SpatialEval

[NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs

Language: Python - Size: 3.95 MB - Last synced at: 19 days ago - Pushed at: 3 months ago - Stars: 23 - Forks: 0

UmarIgan/Machine-Learning

A set of jupyter notebooks

Language: Jupyter Notebook - Size: 16.7 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 23 - Forks: 8

HySonLab/Ligand_Generation

Target-aware Variational Auto-encoders for Ligand Generation with Multimodal Protein Representation Learning

Language: Python - Size: 257 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 23 - Forks: 2

sisinflab/Ducho

Python framework to extract multimodal features for multimodal recommendation in a highly-customizable way.

Language: Python - Size: 3.62 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 22 - Forks: 5

nesl/Robust-Deep-Learning-Pipeline

Deep Convolutional Bidirectional LSTM for Complex Activity Recognition with Missing Data. Human Activity Recognition Challenge. Springer SIST (2020)

Language: Jupyter Notebook - Size: 876 MB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 22 - Forks: 3

georgepar/slp

Utils and modules for Speech Language and Multimodal processing using pytorch and pytorch lightning

Language: Python - Size: 2.02 MB - Last synced at: 5 months ago - Pushed at: about 2 years ago - Stars: 21 - Forks: 7

UofLBioinformatics/circDeep

End-to-End learning framework for circular RNA classification from other long non-coding RNA using multimodal deep learning

Language: Python - Size: 47.2 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 21 - Forks: 14

AdrianBZG/HyperBERT

Code for "HyperBERT: Mixing Hypergraph-Aware Layers with Language Models for Node Classification on Text-Attributed Hypergraphs" (EMNLP 2024)

Language: Python - Size: 26.4 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 20 - Forks: 0

cosmaadrian/multimodal-depression-from-video

Official source code for the paper: "Reading Between the Frames Multi-Modal Non-Verbal Depression Detection in Videos"

Language: Python - Size: 370 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 20 - Forks: 2

association-rosia/crop-forecasting

Predicting rice field yields through the integration of Microsoft Planetary satellite images, meteorological data, and field information in the 2023 EY Open Science Data Challenge - Crop Forecasting.

Language: Jupyter Notebook - Size: 341 MB - Last synced at: 13 days ago - Pushed at: about 1 year ago - Stars: 20 - Forks: 3

codezakh/DataEnvGym

A testbed for agents and environments that can automatically improve models through data generation.

Language: Python - Size: 9.16 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 19 - Forks: 5

sverma88/DeepCU-IJCAI19

DeepCU: Integrating Both Common and Unique Latent Information for Multimodal Sentiment Analysis, IJCAI-19

Language: Python - Size: 36.7 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 19 - Forks: 8

Yuan-ManX/ai-multimodal-timeline

Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Agent, Audio, Image, Video, Music and 3D content. 🔥

Size: 1.11 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 18 - Forks: 1

declare-lab/M2H2-dataset

This repository contains the dataset and baselines explained in the paper: M2H2: A Multimodal Multiparty Hindi Dataset For Humor Recognition in Conversations

Language: Python - Size: 2.21 GB - Last synced at: 10 days ago - Pushed at: about 2 years ago - Stars: 18 - Forks: 12

asnelt/mmae

Package for Multimodal Autoencoders in TensorFlow / Keras

Language: Python - Size: 28.3 KB - Last synced at: 17 days ago - Pushed at: almost 5 years ago - Stars: 18 - Forks: 12

yiren-jian/BLIText

[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training

Language: Python - Size: 34.4 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 17 - Forks: 1

basiralab/MultiGraphGAN

MultiGraphGAN for predicting multiple target graphs from a source graph using geometric deep learning.

Language: Python - Size: 21.8 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 17 - Forks: 4

Nithin-Holla/meme_challenge

Repository containing code from team Kingsterdam for the Hateful Memes Challenge

Language: Python - Size: 1.36 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 17 - Forks: 8

frankaging/Multimodal-Transformer

Attention Based Multi-modal Emotion Recognition; Stanford Emotional Narratives Dataset

Language: Python - Size: 458 MB - Last synced at: 19 days ago - Pushed at: over 5 years ago - Stars: 17 - Forks: 1

ninibymilk/PMF-MMEA

[ACL2024] Progressively Modality Freezing for Multi-Modal Entity Alignment

Language: Python - Size: 551 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 16 - Forks: 0

kyegomez/Pegasus

PegasusX: The Future of Multimodal Embeddings 🦄 🦄

Language: Python - Size: 37.5 MB - Last synced at: 8 days ago - Pushed at: 6 months ago - Stars: 16 - Forks: 5

FuxiaoLiu/DocumentCLIP

[ICPRAI 2024] DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents

Language: Python - Size: 2.49 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 16 - Forks: 0

AmbiTyga/MemSem

A Multi-modal Framework for Sentimental Analysis of Meme

Language: Python - Size: 4.59 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 16 - Forks: 5

orrzohar/LOVM

[NeurIPS 2023] Official Pytorch code for LOVM: Language-Only Vision Model Selection

Language: Python - Size: 4.44 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 15 - Forks: 0

sarthak268/c3vqg-official

PyTorch Implementation for the paper "C3VQG: Category Consistent Cyclic Visual Question Generation" (ACM MM Asia'20).

Language: Python - Size: 63.9 MB - Last synced at: 12 months ago - Pushed at: about 2 years ago - Stars: 15 - Forks: 6

eslambakr/LAR-Look-Around-and-Refer

This is the official implementation for our paper "LAR: Look Around and Refer".

Language: C++ - Size: 45 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 15 - Forks: 2

gtatiya/Deep-Multi-Sensory-Object-Categorization

Deep Multi-Sensory Object Category Recognition Using Interactive Behavioral Exploration

Language: Jupyter Notebook - Size: 2.65 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 15 - Forks: 8

willxxy/ECG-Byte

[arxiv 2024] ECG-Byte: A Tokenizer for End-to-End Generative Electrocardiogram Language Modeling

Language: Python - Size: 27.5 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 14 - Forks: 0

LamineTourelab/MOGONET

MOGONET (Multi-Omics Graph cOnvolutional NETworks) is a multi-omics data integrative analysis framework for classification tasks in biomedical applications.

Language: Jupyter Notebook - Size: 56.6 MB - Last synced at: 5 days ago - Pushed at: 28 days ago - Stars: 14 - Forks: 1

omeregev/click2mask

[AAAI 2025] Official Implementation for "Click2Mask: Local Editing with Dynamic Mask Generation" Paper.

Language: Python - Size: 61.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 14 - Forks: 2

sisinflab/LoG-2023-GNNs-RecSys

Presented as a tutorial at the Second Learning on Graphs Conference (LoG 2023)

Language: Jupyter Notebook - Size: 12.9 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 14 - Forks: 0

ashutosh1919/data2vec-pytorch

Ready to run PyTorch implementation of Data2Vec 2.0: Highly efficient self-supervised representation learning for vision, speech and text.

Language: Python - Size: 116 KB - Last synced at: 5 days ago - Pushed at: about 2 years ago - Stars: 14 - Forks: 2

ParitoshParmar/Piano-Skills-Assessment

Piano Skills Assessment [IEEE MMSP 2021]

Language: Python - Size: 852 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 2

PrithivirajDamodaran/vision-language-modelling-series

Companion Repo for the Vision Language Modelling YouTube series - https://bit.ly/3PsbsC2 - by Prithivi Da. Open to PRs and collaborations

Language: Jupyter Notebook - Size: 6.15 MB - Last synced at: 10 days ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 4

vishaal27/Multimodal-Video-Emotion-Recognition-Pytorch

A Pytorch implementation of emotion recognition from videos

Language: Python - Size: 1.19 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 14 - Forks: 1

gchochla/Deep-Representations-of-Visual-Descriptions

Pytorch implementation of CVPR'16 paper "Learning Deep Representations of Fine-Grained Visual Descriptions", by Reed et al.

Language: Python - Size: 6.83 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 14 - Forks: 1

kelechi-c/ripple_net

image retrieval/tagging with CLIP

Language: Python - Size: 416 KB - Last synced at: 1 day ago - Pushed at: 10 months ago - Stars: 13 - Forks: 1

aimotive/mm_training

Multimodal model training on aiMotive Dataset

Language: Python - Size: 2.86 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 13 - Forks: 4

bryanbocao/open-papernotes

Yet another Ph.D. adventure.

Size: 1010 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 13 - Forks: 4

HackerHyper/CLIPMH

CLIPMH: CLIP Multi-modal Hashing

Language: Python - Size: 1.12 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 13 - Forks: 0

Neerajj9/Stacked-Attention-Networks-for-Visual-Question-Answering

Implementation of the paper "Stacked Attention Networks for Image Question Answering" in Tensorflow

Language: Python - Size: 15.3 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 13 - Forks: 4

ThomasHelfer/multimodal-supernovae

A codebase dedicated to exploring multimodal learning approaches by integrating images of host galaxies of supernovae and their corresponding light-curves and spectra.

Language: Jupyter Notebook - Size: 1.66 GB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 12 - Forks: 2

ZhaoPeiduo/BLIP2-Japanese

Modifying LAVIS' BLIP2 Q-former with models pretrained on Japanese datasets.

Language: Python - Size: 75.9 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 12 - Forks: 1

HySonLab/Protein_Pretrain

Multimodal Pretraining for Unsupervised Protein Representation Learning

Language: Python - Size: 241 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 12 - Forks: 2

Shen-Lab/CPAC

[Bioinformatics 2022] Cross-Modality and Self-Supervised Protein Embedding for Compound-Protein Affinity and Contact Prediction

Language: Python - Size: 134 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 12 - Forks: 1

Agora-X/DailyPaperClub

The repository for the exclusive Daily Paper Club, hosted at Agora every day at 10pm NYC time on this Discord: https://discord.gg/Gnzh6dnzyz

Size: 14.6 KB - Last synced at: 11 months ago - Pushed at: about 1 year ago - Stars: 12 - Forks: 0

SmithaUpadhyaya/fashion_image_caption

Automated fashion image captioning using BLIP-2. Automatically generates descriptions of clothes on shopping websites, which can help customers without fashion knowledge better understand the features (attributes, style, functionality, etc.) of the items and increase online sales by enticing more customers.

Language: Jupyter Notebook - Size: 26.6 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 1

clairecyq/whos-waldo

Who's Waldo? Linking People Across Text and Images. ICCV 2021.

Language: Python - Size: 2.86 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 12 - Forks: 4

SAGNIKMJR/move2hear-active-AV-separation

Code and datasets for 'Move2Hear: Active Audio-Visual Source Separation' (ICCV 2021)

Language: Python - Size: 1.31 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 12 - Forks: 0

dh1105/Multi-modal-movie-genre-prediction

A multi-modal deep learning model trained to predict a movie's genre given the movie poster and overview as an input.

Language: Jupyter Notebook - Size: 362 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 12 - Forks: 10

unitaryai/VTC

VTC: Improving Video-Text Retrieval with User Comments

Language: Python - Size: 5.45 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 11 - Forks: 0

rohit901/VANE-Bench

[NAACL'25] Contains code and documentation for our VANE-Bench paper.

Language: Python - Size: 38.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 11 - Forks: 1

yongfanbeta/awesome-multimodal-healthcare

Reading list for multimodal learning in healthcare

Size: 8.79 KB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 11 - Forks: 2

kritiksoman/Multimodal

Listen. Write. Speak. Read. Think.

Language: Jupyter Notebook - Size: 15.2 MB - Last synced at: 17 days ago - Pushed at: about 3 years ago - Stars: 11 - Forks: 0

JianqiangWan/VLPT-STD

Vision-Language Pre-Training for Boosting Scene Text Detectors (CVPR2022)

Size: 4.88 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 11 - Forks: 0

SAIC-MONTREAL/multimodal-dynamics

Code for AAAI 2021 paper "Learning Intuitive Physics with Multimodal Generative Models"

Language: Python - Size: 192 KB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 11 - Forks: 2

RunyuFan/UisNet-TGRS-2022

Code for TGRS 2022 paper "Fine-scale Urban Informal Settlements Mapping by Fusing Remote Sensing Images and Building Data via a Transformer-based Multimodal Fusion Network"

Language: Python - Size: 142 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 10 - Forks: 1

aimotive/aimotive-dataset-loader

Dataset loader and renderer for aiMotive Multimodal Dataset

Language: Python - Size: 614 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 10 - Forks: 2

zhongshsh/MoExtend

ACL 2024 (SRW), Official Codebase of our Paper: "MoExtend: Tuning New Experts for Modality and Task Extension"

Language: Python - Size: 542 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 10 - Forks: 0

katerynaCh/MMA-DFER

This repository provides the code for MMA-DFER, a multimodal (audiovisual) emotion recognition method. This is the official implementation of the paper "MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild".

Language: Python - Size: 1.77 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 10 - Forks: 1

sisinflab/Formal-MultiMod-Rec

Formalizing Multimedia Recommendation through Multimodal Deep Learning, accepted in ACM Transactions on Recommender Systems.

Language: Python - Size: 903 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 10 - Forks: 1

guxm2021/MM_ALT

[MM 2022 Oral] MM-ALT: A Multimodal Automatic Lyric Transcription System

Language: Python - Size: 3.31 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 10 - Forks: 0

tomoyoshki/focal

Pytorch Implementation of FOCAL: Contrastive Learning for Multimodal Time-Series Sensing Signals in Factorized Orthogonal Latent Space

Language: Python - Size: 59.6 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 0

ahmdtaha/distributed_sigmoid_loss

Unofficial implementation for Sigmoid Loss for Language Image Pre-Training

Language: Python - Size: 62.5 KB - Last synced at: about 1 hour ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 0

parham/lemanchot-analysis

LeManchot-Analysis is a system for abnormality detection in coupled visible-thermal images

Language: Python - Size: 79.9 MB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 2

willxxy/ECG-Bench

A Unified Framework for Benchmarking Generative Electrocardiogram-Language Models (ELMs)

Language: Python - Size: 6.78 MB - Last synced at: about 14 hours ago - Pushed at: about 15 hours ago - Stars: 9 - Forks: 2

canary-for-cognition/multimodal-dl-framework

An extensible PyTorch framework to experiment with neural-networks-based deep learning algorithms on multiple data modalities for binary classification.

Language: Python - Size: 2.22 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 9 - Forks: 3

monajalal/Kenyan-Food

Code and a link to the dataset for the Kenyan Food detection paper, accepted at the MADiMA 2019 Workshop, part of the ACM MM 2019 conference.

Language: Python - Size: 5.08 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 9 - Forks: 6

claws-lab/multimodal-robustness

Code and resources for EMNLP 2022 paper on 'Robustness of Fusion-based Multimodal Classifiers to Cross-Modal Content Dilutions'

Language: Python - Size: 71.6 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 9 - Forks: 1

bairdxiong/SegResearchToolkit

A Highly Efficient Research Development Toolkit for Image Segmentation Based on PyTorch.

Language: Python - Size: 3.11 MB - Last synced at: 4 months ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 0

SriramPingali/Multi-Modal-Recommendation-System

Official code for the paper "Towards developing a Multi Modal Video Recommendation system"

Language: Jupyter Notebook - Size: 942 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 9 - Forks: 1

MIDA-group/CoMIR_INSPIRE

Framework for Multimodal Deformable Image Registration. Coordinated equivariant representation learning (CoMIR) combined with robust deformable registration by INSPIRE.

Language: Python - Size: 9.9 MB - Last synced at: 5 days ago - Pushed at: about 2 years ago - Stars: 9 - Forks: 1

theavicaster/featurehallucination-cgan

Uses C-GAN for feature hallucination of missing modalities for hyperspectral data. TensorFlow implementation of ICCV '19 paper

Language: Python - Size: 564 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 9 - Forks: 1

Rehan-Ahmad/MultimodalDiarization

Multimodal speaker diarization using pre-trained audio-visual synchronization model

Language: Python - Size: 38.1 KB - Last synced at: 9 months ago - Pushed at: almost 5 years ago - Stars: 9 - Forks: 6

akusayudodograu/Agentic-RAG-Story-Generation-with-Multimodal-GenAI

Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling

Size: 1000 Bytes - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 8 - Forks: 1

Duplums/CoMM

[ICLR 2025] Multi-modal representation learning of shared, unique and synergistic features between modalities

Language: Python - Size: 2.93 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 8 - Forks: 2

eezkni/M2Trans

[IEEE J-BHI-2024] Pytorch implementation of "M2Trans: Multi-Modal Regularized Coarse-to-Fine Transformer for Ultrasound Image Super-Resolution"

Language: Python - Size: 113 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 8 - Forks: 2

association-rosia/flair-2

Engage in a semantic segmentation challenge for land cover description using multimodal remote sensing earth observation data, delving into real-world scenarios with a dataset comprising 70,000+ aerial imagery patches and 50,000 Sentinel-2 satellite acquisitions.

Language: Jupyter Notebook - Size: 44.7 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 0

Related Topics
deep-learning: 105 - multimodal: 86 - pytorch: 64 - computer-vision: 56 - machine-learning: 47 - multimodal-learning: 38 - natural-language-processing: 26 - nlp: 24 - multimodality: 22 - vision-and-language: 21 - tensorflow: 20 - large-language-models: 19 - python: 18 - transformer: 16 - attention-mechanism: 16 - transformers: 16 - generative-ai: 14 - multimodal-sentiment-analysis: 14 - artificial-intelligence: 14 - gpt4: 13 - llm: 13 - multimodal-large-language-models: 13 - deep-neural-networks: 13 - self-supervised-learning: 12 - classification: 12 - convolutional-neural-networks: 11 - dataset: 11 - emotion-recognition: 11 - visual-question-answering: 11 - attention: 10 - multimodal-datasets: 10 - neural-network: 10 - ai: 9 - image-processing: 9 - clip: 9 - awesome-list: 8 - time-series: 8 - vision-language-transformer: 8 - image: 8 - object-detection: 8 - language-model: 8 - sentiment-analysis: 8 - bert: 8 - vision-transformer: 8 - multimodal-data: 7 - image-captioning: 7 - vision-language: 7 - image-classification: 7 - multimodal-fusion: 7 - vision-language-model: 7 - diffusion-models: 7 - multimodal-representation: 7 - representation-learning: 7 - pytorch-lightning: 7 - cnn: 7 - reinforcement-learning: 6 - deeplearning: 6 - 3d: 6 - huggingface-transformers: 6 - foundation-models: 6 - text-to-image: 6 - graph-neural-networks: 6 - keras: 6 - lstm: 6 - remote-sensing: 6 - neural-networks: 6 - vision-language-pretraining: 6 - audio-processing: 5 - transformer-models: 5 - speech-recognition: 5 - text: 5 - recommender-system: 5 - variational-autoencoder: 5 - anomaly-detection: 5 - image-generation: 5 - data-fusion: 5 - generative-adversarial-network: 5 - transfer-learning: 5 - generative-model: 5 - attention-is-all-you-need: 5 - python3: 5 - audio: 5 - vqa: 5 - visual-grounding: 5 - large-multimodal-models: 5 - point-cloud: 5 - gan: 5 - embeddings: 5 - nlp-machine-learning: 5 - contrastive-learning: 5 - question-answering: 5 - multimodal-interactions: 5 - paper: 5 - memes: 5 - semantic-segmentation: 5 - visual-dialog: 4 - multi-modal: 4 - domain-adaptation: 4 - knowledge-graph: 4 - cross-modal-retrieval: 4