GitHub topics: multimodal-deep-learning
kyegomez/Gen2
Implementation of "Text driven video generation" in pytorch
Language: Python - Size: 222 KB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 0

kyegomez/CELESTIAL-1
Omni-Modality Processing, Understanding, and Generation
Language: Python - Size: 2.49 MB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 1

ramakrishnan2503/LearnMate-2.0
Personalized learning companion (Updated version of LearnMate).
Language: Jupyter Notebook - Size: 146 KB - Last synced at: 10 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

DistilledCode/mmrl
Multi-Modal Representational Learning for Social Media Popularity Prediction
Language: Python - Size: 27.3 KB - Last synced at: 10 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

slinusc/path-vqa-blip
Fine-tuning BLIP for pathological visual question answering.
Language: Jupyter Notebook - Size: 80.1 KB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 1

bairdxiong/SegResearchToolkit
A highly efficient research and development toolkit for image segmentation, based on PyTorch.
Language: Python - Size: 3.11 MB - Last synced at: 5 months ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 0

kaledhoshme123/Multimodal-face-generation-facial-biometrics-
Similarity between faces: One person resembles another person to a large degree. This can lead to many problems facing security surveillance systems. Facial recognition systems have difficulty distinguishing between the main person and other people who are highly similar in terms of features.
Language: Jupyter Notebook - Size: 16.8 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

a-tabaza/fairouz_demo
Demo for Binding Text, Images, Graphs, and Audio for Music Representation Learning
Language: Python - Size: 27.8 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

XavierSpycy/MultiCLIP
MultiCLIP: A framework for multimodal-multilabel-multistage classification utilizing advanced pretrained models like CLIP and BLIP.
Language: Python - Size: 2.2 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

eftekhar-hossain/Bengali-Hateful-Memes
[ACL, EACL'24] Multimodal Hate Speech Detection in Bengali
Language: Python - Size: 2.68 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 1

VisualWebBench/VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
Language: Python - Size: 3.17 MB - Last synced at: 11 months ago - Pushed at: 12 months ago - Stars: 36 - Forks: 1

darmangerd/vubot
Multimodal computer vision application leveraging object detection, gesture recognition, and speech-to-text to help users ask questions about their environment.
Language: Python - Size: 63.5 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

DunnBC22/Vision_Audio_and_Multimodal_Projects
This repository includes all computer vision, audio, document AI, and multimodal projects.
Language: Jupyter Notebook - Size: 108 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 28 - Forks: 5

darrylnurse/viewvie
Movie detection application.
Language: JavaScript - Size: 123 MB - Last synced at: about 2 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 1

Shen-Lab/CPAC
[Bioinformatics 2022] Cross-Modality and Self-Supervised Protein Embedding for Compound-Protein Affinity and Contact Prediction
Language: Python - Size: 134 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 12 - Forks: 1

DongmingShenDS/Multi-Modal-ML-Project
A data science project to predict online pet adoption speed using image, natural language, and tabular data with a multi-modal ML framework.
Language: Jupyter Notebook - Size: 17.4 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 2 - Forks: 0

MichiganNLP/visual_diversity_budget
Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost
Size: 2.24 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 7 - Forks: 0

ViLab-UCSD/LaGTran_ICML2024
Code and models for the ICML 2024 paper "Tell, Don't Show!: Language Guidance Eases Transfer Across Domains in Images and Videos"
Language: Python - Size: 151 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

HySonLab/Ligand_Generation
Target-aware Variational Auto-encoders for Ligand Generation with Multimodal Protein Representation Learning
Language: Python - Size: 257 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 23 - Forks: 2

cosmaadrian/multimodal-depression-from-video
Official source code for the paper: "Reading Between the Frames Multi-Modal Non-Verbal Depression Detection in Videos"
Language: Python - Size: 370 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 20 - Forks: 2

florencejt/fusilli
A Python package housing a collection of deep-learning multi-modal data fusion method pipelines! From data loading, to training, to evaluation - fusilli's got you covered 🌸
Language: Python - Size: 987 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 146 - Forks: 12

MILVLG/prophet
Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
Language: Python - Size: 1.09 MB - Last synced at: 12 months ago - Pushed at: almost 2 years ago - Stars: 259 - Forks: 27

usc-sail/mica-context-emotion-recognition
Repository for context based emotion recognition
Language: Python - Size: 45.9 KB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

rajlm10/Shoppster
Multimodal Shopping Assistant
Language: Jupyter Notebook - Size: 10 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

sisinflab/Formal-MultiMod-Rec
Formalizing Multimedia Recommendation through Multimodal Deep Learning, accepted in ACM Transactions on Recommender Systems.
Language: Python - Size: 903 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 10 - Forks: 1

N-G-Asker/TasteRank
TasteRank: Personalized Image Search and Recommendation. This research project proposes an AI-based method for scoring photos on relevance to user interests. TasteRank leverages language and vision models, including Mistral LLMs and OpenAI’s CLIP, and applies multimodal machine-learning techniques.
Language: Jupyter Notebook - Size: 3.26 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

Sreyan88/MMER
Code for the InterSpeech 2023 paper: MMER: Multimodal Multi-task learning for Speech Emotion Recognition
Language: Python - Size: 1.59 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 54 - Forks: 14

GeorgeTouros/video-soundtrack-evaluation
Create a large, well-managed, and clean dataset for the task of music composition for video soundtracks.
Language: Jupyter Notebook - Size: 32.2 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

parham/lemanchot-analysis
LeManchot-Analysis is a system for abnormality detection in coupled visible-thermal images
Language: Python - Size: 79.9 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 2

choyingw/GAIS-Net
CVPR 2020 Workshop on Scalability in Autonomous Driving: GAIS-Net: Geometry-Aware Instance Segmentation with Disparity Maps
Language: Python - Size: 950 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 0

affjljoo3581/Job-Recommend-Competition
🥇 1st-place solution for the KNOW-based job recommendation algorithm competition 🥇
Language: Python - Size: 1.74 MB - Last synced at: about 2 months ago - Pushed at: about 3 years ago - Stars: 43 - Forks: 4

JHKim-snu/GVCCI
[IROS 2023] GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation
Language: Python - Size: 27.4 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 0

Anas1108/Multimodal_Memes_Classification
Build a PyTorch-based multimodal architecture to classify memes using image & caption. Trained on a meme classification dataset, MLP architecture uses PyTorch, Numpy, Matplotlib, & Sklearn to achieve improved performance compared to baselines.
Language: Jupyter Notebook - Size: 851 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

yiren-jian/BLIText
[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
Language: Python - Size: 34.4 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 17 - Forks: 1

usc-sail/mica-deep-mcca
Deep Multiset Canonical Correlation Analysis - An extension of CCA to multiple datasets
Language: Python - Size: 103 MB - Last synced at: 12 months ago - Pushed at: over 5 years ago - Stars: 31 - Forks: 14

Agora-X/DailyPaperClub
The repository for the exclusive Daily Paper Club, hosted at Agora every day at 10pm NYC time on this Discord: https://discord.gg/Gnzh6dnzyz
Size: 14.6 KB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 0

nyukat/greedy_multimodal_learning
Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks
Language: Python - Size: 47.9 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 24 - Forks: 2

emerisly/EDIS
Entity-Driven Image Search over Multimodal Web Content (EMNLP 2023)
Language: Python - Size: 1.61 MB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 27 - Forks: 0

YeonwooSung/LIMoE-pytorch
PyTorch implementation of LIMoE
Language: Python - Size: 4.3 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 43 - Forks: 1

Nithin-GK/UniteandConquer
[CVPR '23] Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models
Language: Python - Size: 6.55 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 28 - Forks: 3

david-yoon/multimodal-speech-emotion
TensorFlow implementation of "Multimodal Speech Emotion Recognition using Audio and Text," IEEE SLT-18
Language: Jupyter Notebook - Size: 238 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 239 - Forks: 70

AdrianBZG/SFAVEL
Code for "Unsupervised Pretraining for Fact Verification by Language Model Distillation" (ICLR 2024)
Language: Python - Size: 14.6 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 0

sisinflab/LoG-2023-GNNs-RecSys
Presented as tutorial at the Second Learning on Graphs Conference (LoG 2023)
Language: Jupyter Notebook - Size: 12.9 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 14 - Forks: 0

claws-lab/multimodal-robustness
Code and resources for EMNLP 2022 paper on 'Robustness of Fusion-based Multimodal Classifiers to Cross-Modal Content Dilutions'
Language: Python - Size: 71.6 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 9 - Forks: 1

isevr/TVEmotion
A novel multimodal approach for emotion recognition deploying early fusion based on graph-captured embeddings
Language: Jupyter Notebook - Size: 164 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

imatge-upc/wav2pix Fork of miqueltubau/Wav2Pix
Speech-conditioned face generation using Generative Adversarial Networks (ICASSP 2019)
Language: Python - Size: 202 MB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 55 - Forks: 24

nesl/Robust-Deep-Learning-Pipeline
Deep Convolutional Bidirectional LSTM for Complex Activity Recognition with Missing Data. Human Activity Recognition Challenge. Springer SIST (2020)
Language: Jupyter Notebook - Size: 876 MB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 22 - Forks: 3

Cominclip/RPF-Net
Official code for "Recurrent Progressive Fusion-based Learning for Multi-source Remote Sensing Image Classification"
Language: Python - Size: 319 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

SRDdev/OpenAI-CLIP
Simple Educational Implementation of OpenAI CLIP in PyTorch
Language: Jupyter Notebook - Size: 6.63 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

lmb-freiburg/Multimodal-Future-Prediction
The official repository for the CVPR 2019 paper "Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction"
Language: Python - Size: 21.6 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 47 - Forks: 8

eliottcrancee/ParoleNet
Utilizing a multimodal architecture to predict the appropriate speaker turn in a dialogue.
Language: Python - Size: 178 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

ishitab1310/HateFilter
Analyzing Hateful Memes (Resources: Hateful Memes Challenge)
Language: Jupyter Notebook - Size: 3.09 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

thatAverageGuy/EarlyFusion-on-EasyVQA
Streamlit app for demonstrating multi-modal (vision + language) modelling in PyTorch.
Language: Python - Size: 2.74 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

georgepar/slp
Utils and modules for Speech, Language, and Multimodal processing using PyTorch and PyTorch Lightning
Language: Python - Size: 2.02 MB - Last synced at: 6 months ago - Pushed at: about 2 years ago - Stars: 21 - Forks: 7

davide-coccomini/Deepfake-Detection-Challenge-DFAD2023
Implementation of the winning solution for the Media Analytics Challenge 2023.
Language: Jupyter Notebook - Size: 41.5 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 0

AnkurDeria/MFT
Pytorch implementation of Multimodal Fusion Transformer for Remote Sensing Image Classification.
Language: Jupyter Notebook - Size: 2.13 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 130 - Forks: 8

association-rosia/flair-2
Engage in a semantic segmentation challenge for land cover description using multimodal remote sensing earth observation data, delving into real-world scenarios with a dataset comprising 70,000+ aerial imagery patches and 50,000 Sentinel-2 satellite acquisitions.
Language: Jupyter Notebook - Size: 44.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 0

jena-shreyas/Awesome-Video-Language-Resources
A repository of Video Language papers, code and datasets.
Size: 7.81 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ihaeyong/drama-graph
The Drama-Graph repository produces both a knowledge base from drama scripts and a video graph for the Video Turing Test (VTT).
Language: Jupyter Notebook - Size: 201 MB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 1

DavidHuji/CapDec
CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)
Language: Python - Size: 35.7 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 158 - Forks: 17

orrzohar/LOVM
[NeurIPS 2023] Official Pytorch code for LOVM: Language-Only Vision Model Selection
Language: Python - Size: 4.44 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 15 - Forks: 0

nicolafan/neural-artwork-caption-generator
Code for the paper "Exploring the Synergy Between Vision-Language Pretraining and ChatGPT for Artwork Captioning: A Preliminary Study"
Language: Jupyter Notebook - Size: 130 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

arusl/tmdb-multimodal-inference
This repo contains a Jupyter notebook showing how to run a prediction of new data using a multimodal deep learning model to predict movie genres.
Language: Jupyter Notebook - Size: 1.73 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 1

tomoyoshki/focal
Pytorch Implementation of FOCAL: Contrastive Learning for Multimodal Time-Series Sensing Signals in Factorized Orthogonal Latent Space
Language: Python - Size: 59.6 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 0

sarthak268/c3vqg-official
PyTorch Implementation for the paper "C3VQG: Category Consistent Cyclic Visual Question Generation" (ACM MM Asia'20).
Language: Python - Size: 63.9 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 15 - Forks: 6

aimotive/aimotive-dataset-loader
Dataset loader and renderer for aiMotive Multimodal Dataset
Language: Python - Size: 614 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 10 - Forks: 2

deeplsd/Syncnet_Analysis
This code is part of the paper: "A Deep Dive Into Neural Synchrony Evaluation for Audio-visual Translation" published at ACM ICMI 2022.
Language: Python - Size: 57.9 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

PrithivirajDamodaran/vision-language-modelling-series
Companion Repo for the Vision Language Modelling YouTube series - https://bit.ly/3PsbsC2 - by Prithivi Da. Open to PRs and collaborations
Language: Jupyter Notebook - Size: 6.15 MB - Last synced at: 1 day ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 4

shubhamagarwal92/mmd
This repository contains the Pytorch implementation for our SCAI (EMNLP-2018) submission "A Knowledge-Grounded Multimodal Search-Based Conversational Agent"
Language: Python - Size: 82 KB - Last synced at: 11 months ago - Pushed at: almost 5 years ago - Stars: 29 - Forks: 5

gustavocidornelas/fused-multimodal-emotion
Multimodal emotion recognition using lexico-acoustic language descriptions
Language: Python - Size: 37.3 MB - Last synced at: 20 days ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 1

talipucar/talipucar.github.io_old
Showcases ongoing and completed projects within various research themes.
Size: 8.51 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

diegovalsesia/XMFnet
Code for "Cross-modal Learning for Image-Guided Point Cloud Shape Completion" (NeurIPS 2022)
Language: Python - Size: 22.2 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 34 - Forks: 6

abs711/Visual-Control
Deep learning models that fuse IMU-based motion capture and first-person video data to improve the prediction of future knee and ankle joint kinematics in complex real-world environments.
Language: Python - Size: 75.1 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

stevejpapad/image-text-verification
Official repository for the "VERITE: A Robust Benchmark for Multimodal Misinformation Detection Accounting for Unimodal Bias" paper.
Language: Python - Size: 11.7 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 0

giganttheo/tib-dataset
Dataset for abstractive summarization of long multimodal presentations
Size: 1.95 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

vklinhhh/Video-Event-Retrieval
The Video Event Retrieval Project for Vietnamese News facilitates the precise extraction of events from video archives through content analysis and indexing of Vietnamese news videos.
Language: Python - Size: 26.4 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

XavierSpycy/CAT-ImageTextIntegrator
An innovative deep learning framework leveraging the CAT (Convolutions, Attention & Transformers) architecture to seamlessly integrate visual and textual modalities. This model exploits the prowess of CNNs for image feature extraction and Transformers for intricate textual pattern recognition, setting a new paradigm in multimodal learning.
Language: Python - Size: 8.21 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

ellenzhuwang/implicit_vkood
Implicit Out-Of-Distribution detection in multimodal analysis (NeurIPS23)
Language: Python - Size: 252 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

bryanbocao/open-papernotes
Yet another Ph.D. adventure.
Size: 1010 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 13 - Forks: 4

YuanGongND/cav-mae
Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
Language: Python - Size: 12.9 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 151 - Forks: 13

michelecafagna26/vl-shap
[Frontiers in AI Journal] Implementation of the paper "Interpreting Vision and Language Generative Models with Semantic Visual Priors"
Language: Jupyter Notebook - Size: 10.6 MB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

SheezaShabbir/Multimodel_huggingFace-Swin-Transformer
A multimodal model that uses both text and images to predict the expected emotion of a news viewer.
Language: Jupyter Notebook - Size: 110 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

YuxingLu613/HTML
Code for the paper "Multiomics dynamic learning enables personalized diagnosis and prognosis for pan-cancer and cancer-subtypes"
Language: Python - Size: 43.9 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

guxm2021/MM_ALT
[MM 2022 Oral] MM-ALT: A Multimodal Automatic Lyric Transcription System
Language: Python - Size: 3.31 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 10 - Forks: 0

elsobhano/Multimodal-Emotion-Recognition
Multimodal Emotion Recognition using ClipBERT.
Language: Python - Size: 880 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

idearibosome/embracenet
Robust multimodal integration method implemented in PyTorch and TensorFlow
Language: Python - Size: 107 KB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 78 - Forks: 25

gorjanradevski/SMHA
My master thesis: Siamese multi-hop attention for cross-modal retrieval.
Language: Python - Size: 2.76 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 5 - Forks: 0

marslanm/Multimodality-Representation-Learning
This repository provides a comprehensive collection of research papers on multimodal representation learning, all of which are cited and discussed in the accepted survey: https://dl.acm.org/doi/abs/10.1145/3617833
Size: 63.3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 51 - Forks: 7

Ikea-179/Hateful-Meme-Detection
A multimodal deep-learning project that classifies whether a given meme is hateful or not
Language: Jupyter Notebook - Size: 14.4 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

akashe/Multimodal-action-recognition
Code for selecting an action based on multimodal inputs. In this case, the inputs are voice and text.
Language: Python - Size: 64.7 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 69 - Forks: 11

saramaxyz/platform
Run custom multi-modal AI models fully on-device
Language: Swift - Size: 14.5 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 1

ofa-x/OFA-X
This repository contains the code for the publication "Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations"
Language: Python - Size: 84.9 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 0

vijayvee/video-captioning
This repository contains the code for a video captioning system inspired by Sequence to Sequence -- Video to Text. This system takes as input a video and generates a caption in English describing the video.
Language: Python - Size: 3.39 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 162 - Forks: 65

Dynamo13/AW-Net
[ICCVw 2023] "AW-Net: A Novel Fully Connected Attention-based Medical Image Segmentation Model" by Debojyoti Pal, Tanushree Meena, Dwarikanath Mahapatra, and Sudipta Roy.
Language: Python - Size: 3.2 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 2

anita-hu/MSAF
Official implementation of the paper "MSAF: Multimodal Split Attention Fusion"
Language: Python - Size: 101 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 59 - Forks: 9

SriramPingali/Multi-Modal-Recommendation-System
Official code for the paper "Towards developing a Multi Modal Video Recommendation system"
Language: Jupyter Notebook - Size: 942 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 9 - Forks: 1

comp-well-org/More2Less
More to Less (M2L): Enhanced Health Recognition in the Wild with Reduced Modality of Wearable Sensors
Language: Python - Size: 1.03 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 0

samyuh/seadronessee-metadata-adaptation
Exploring Metadata in Neural Networks for UAV Maritime Surveillance
Language: Jupyter Notebook - Size: 34.5 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

duyali2000/MQMC
This repo has the PyTorch implementation and datasets of our WSDM 2023 paper: “Multi-queue Momentum Contrast for Microvideo-Product Retrieval”.
Language: Python - Size: 1.97 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 1

referit3d/referit3d
Code accompanying our ECCV-2020 paper on 3D Neural Listeners.
Language: C++ - Size: 15.8 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 81 - Forks: 13
