An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: multimodal-deep-learning

eftekhar-hossain/Multimodal-Sentiment-LREC2022

This repository contains the relevant materials of the LREC-22 paper.

Language: Jupyter Notebook - Size: 5.56 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

UofLBioinformatics/circDeep

End-to-End learning framework for circular RNA classification from other long non-coding RNA using multimodal deep learning

Language: Python - Size: 47.2 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 21 - Forks: 14

cap-ntu/Video-to-Retail-Platform

An intelligent multimodal-learning based system for video, product and ads analysis. Based on the system, people can build a lot of downstream applications such as product recommendation, video retrieval, etc.

Language: Python - Size: 65.7 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 138 - Forks: 43

XavierSpycy/Deep-Learning

Deep learning projects

Language: Jupyter Notebook - Size: 51.7 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

penghu-cs/MRL

Learning Cross-Modal Retrieval with Noisy Labels (CVPR 2021, PyTorch Code)

Language: Python - Size: 23.9 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 44 - Forks: 10

talipucar/DomainAdaptation

A model for Domain Adaptation, Alignment and Translation using multiple sources of data.

Language: Python - Size: 46.5 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 8 - Forks: 1

liuzwin98/DSCMT

code released

Language: Python - Size: 64.5 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 0

iamdanialkamali/MemotionAnalysis

Meme Sentiment Analysis SemEval 2020 Task 9

Language: Jupyter Notebook - Size: 1.93 MB - Last synced at: about 2 months ago - Pushed at: over 5 years ago - Stars: 4 - Forks: 0

IsaacRodgz/ConcatBERT

Baseline model for multimodal classification based on images and text. Text representation obtained from pretrained BERT base model and image representation obtained from VGG16 pretrained model.

Language: Jupyter Notebook - Size: 306 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 29 - Forks: 6

husseinmozannar/multimodal-deep-learning-for-disaster-response

Damage Identification in Social Media Posts using Multimodal Deep Learning: code and dataset

Language: Python - Size: 62.5 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 43 - Forks: 16

Merterm/COSMic

Public repo for the paper: "COSMic: A Coherence-Aware Generation Metric for Image Descriptions" by Mert İnan, Piyush Sharma, Baber Khalid, Radu Soricut, Matthew Stone, Malihe Alikhani

Language: Python - Size: 396 KB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 4 - Forks: 0

HackerHyper/CLIPMH

CLIPMH: CLIP Multi-modal Hashing

Language: Python - Size: 1.12 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 13 - Forks: 0

Etienne-bobo/Skimlit-Nlp

The purpose of this project is to build an NLP model that makes reading medical abstracts easier.

Language: Jupyter Notebook - Size: 1.95 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

shOh-ai/Personalized_Emotion-Analysis_using_Multi-modal_DL

2023 1st semester - BigDataProject Team Project Page

Language: Jupyter Notebook - Size: 233 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 2

SmithaUpadhyaya/fashion_image_caption

Automate fashion image captioning using BLIP-2. Automatically generating descriptions of clothes on shopping websites can help customers without fashion knowledge better understand the features (attributes, style, functionality, etc.) of the items, and can increase online sales by enticing more customers.

Language: Jupyter Notebook - Size: 26.6 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 12 - Forks: 1

marcomoldovan/multimodal-self-distillation

A generalized self-supervised training paradigm for unimodal and multimodal alignment and fusion.

Language: Python - Size: 526 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 2

abhishekpaul11/Disturbance-Detection-MIM

Instant Messaging App built on React Native with backend deployed on AWS DynamoDB and S3 using AWS Amplify (API Querying handled by GraphQL). The distracting content is filtered with the help of a Multi-modal Deep Learning architecture hosted on an AWS EC2 instance.

Language: Jupyter Notebook - Size: 40.4 MB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 1

bupt-mmai/S2TD

code for "S2TD: A Tree-Structured Decoder for Image Paragraph Captioning" accepted by MMAsia 2021

Language: Python - Size: 68.7 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 0

clairecyq/whos-waldo

Who's Waldo? Linking People Across Text and Images. ICCV 2021.

Language: Python - Size: 2.86 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 12 - Forks: 4

benoriol/memes_processing

Language: Python - Size: 41 KB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 6

gchochla/Deep-Representations-of-Visual-Descriptions

Pytorch implementation of CVPR'16 paper "Learning Deep Representations of Fine-Grained Visual Descriptions", by Reed et al.

Language: Python - Size: 6.83 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 14 - Forks: 1

HWH-2000/Awesome-paper-for-multimodal

A record of some related papers on multimodality

Size: 14.6 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

vijayvee/text-to-image-synthesis Fork of artifacia/text-to-image-synthesis

Project to transform a natural language description into an image using Generative Adversarial Networks.

Language: Python - Size: 68.4 KB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 1

mobled37/utils

Deeplearning utils for multimodal research

Language: Python - Size: 2.93 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

soloist97/densecap-pytorch

A simplified pytorch version of densecap

Language: Jupyter Notebook - Size: 5.1 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 34 - Forks: 8

zch42/BiFusion

Language: Python - Size: 2.08 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 32 - Forks: 9

AndreiMoraru123/ContextCollector

Mixed vision-language Attention Model that gets better by making mistakes

Language: Python - Size: 149 MB - Last synced at: about 10 hours ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

efthymisgeo/multimodal-masking

This repo contains source code for the MultiModal Masking (M^3) Interspeech 2021 paper.

Language: Python - Size: 1.91 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 6 - Forks: 0

nicolopinci/deepgravilens

Language: Python - Size: 2.97 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

selamgit/EfficientNet_for_Endoscopic_Images_Response_Prediction

PyTorch implementation of EfficientNet for response prediction

Language: Jupyter Notebook - Size: 11.8 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

vishaal27/Multimodal-Video-Emotion-Recognition-Pytorch

A Pytorch implementation of emotion recognition from videos

Language: Python - Size: 1.19 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 14 - Forks: 1

TIBHannover/MM_Claims

Official code repository for the paper: Gullal Singh Cheema, Sherzod Hakimov, Abdul Sittar, Eric Müller-Budack, Christian Otto, and Ralph Ewerth. 2022. "MM-Claims: A Dataset for Multimodal Claim Detection in Social Media." In Findings of the Association for Computational Linguistics: NAACL 2022, pages 962–979, Seattle, United States.

Language: Python - Size: 42.3 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 2

georgesterpu/Taris

Transformer-based online speech recognition system with TensorFlow 2

Language: Python - Size: 5.4 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 25 - Forks: 6

danadascalescu00/MultimodalOpinionAnalysis 📦

Bachelor Thesis: Opinion Polarity Classification - Given a tweet consisting of an image and text, classify the post on three-point scale

Language: Python - Size: 101 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

AmbiTyga/MemSem

A Multi-modal Framework for Sentimental Analysis of Meme

Language: Python - Size: 4.59 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 16 - Forks: 5

annkamsk/mvae

Multimodal Variational Autoencoder dedicated to omics data integration

Language: Jupyter Notebook - Size: 544 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

HLTCHKUST/VG-GPLMs

The code repository for EMNLP 2021 paper "Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization".

Language: Python - Size: 9.32 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 49 - Forks: 8

fabiopernisi/Visual-WSD

This repository contains the code for our solution to Task 1 of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

Language: Jupyter Notebook - Size: 9.35 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

PrachiJainxD/AmbientAI_IMU2CLIP

COMPSCI 696DS Industry Mentorship Program with Meta Reality Labs: Ambient AI: Multimodal Wearable Sensor Understanding (Experiments in Distilling Knowledge in Cross-Modal Contrastive Learning.)

Language: Python - Size: 23.1 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

VityaVitalich/IMAD

IMAD: IMage Augmented multi-modal Dialogue

Language: Python - Size: 1.11 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

UsefGamal/Visual-Question-Answering-VQA

A multimodal project in which a vision model that understands images is concatenated with an NLP model that understands questions, in order to provide answers based on both the questions and the images

Language: Jupyter Notebook - Size: 2.19 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

theavicaster/featurehallucination-cgan

Uses C-GAN for feature hallucination of missing modalities for hyperspectral data. TensorFlow implementation of ICCV '19 paper

Language: Python - Size: 564 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 9 - Forks: 1

verlab/StraightToThePoint_CVPR_2020

Original PyTorch implementation of the code for the paper "Straight to the Point: Fast-forwarding Videos via Reinforcement Learning Using Textual Data" at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020

Language: Python - Size: 27.4 MB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 8 - Forks: 1

scotthlee/enriched-LSTMs

Classifying multimodal health data with LSTMs

Language: Python - Size: 36.1 KB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 2

04mayukh/R2D2-at-SemEval-2022-Task-5-MAMI

This repository contains the code for submission made at SemEval 2022 Task 5: MAMI

Language: Jupyter Notebook - Size: 562 KB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

denizlab/MIMICCXR-MultiModal-SelfSupervision

Multi-Modal and Self-Supervised learning Benchmark for MIMIC-CXR

Language: Python - Size: 191 KB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

abs711/The-way-of-the-future

A dataset of egocentric vision, eye-tracking and full body kinematics from human locomotion in out-of-the-lab environments. Also, different use cases of the dataset along with example code.

Language: Python - Size: 48.3 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 0

sutdcv/multi-modal-video-reasoning

[ICCV2021 Workshop] Multi-Modal Video Reasoning and Analyzing Competition

Language: JavaScript - Size: 8.77 MB - Last synced at: 9 months ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 1

sk-aravind/3D-Bounding-Boxes-From-Monocular-Images

A two stage multi-modal loss model along with rigid body transformations to regress 3D bounding boxes

Language: Python - Size: 9.46 MB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 43 - Forks: 18

Netherlands-Cancer-Institute/Multimodal_attention_DeepLearning

Multi-modal deep learning with attention mechanism

Language: Python - Size: 2.39 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 0

asnelt/mmae

Package for Multimodal Autoencoders in TensorFlow / Keras

Language: Python - Size: 28.3 KB - Last synced at: 13 days ago - Pushed at: almost 5 years ago - Stars: 18 - Forks: 12

sverma88/DeepCU-IJCAI19

DeepCU: Integrating Both Common and Unique Latent Information for Multimodal Sentiment Analysis, IJCAI-19

Language: Python - Size: 36.7 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 19 - Forks: 8

SAIC-MONTREAL/multimodal-dynamics

Code for AAAI 2021 paper "Learning Intuitive Physics with Multimodal Generative Models"

Language: Python - Size: 192 KB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 11 - Forks: 2

cfcooney/BiModNeuroCNN

Package for bimodal training of deep neural networks on neurological data. Pypi: https://pypi.org/project/BiModNeuroCNN/

Language: Python - Size: 137 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 1

basiralab/MultiGraphGAN

MultiGraphGAN for predicting multiple target graphs from a source graph using geometric deep learning.

Language: Python - Size: 21.8 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 17 - Forks: 4

Fuzzytariy/CMF-DGCN

A Chinese Sentiment Analysis Model based on Transmembrane State Attention for Modal Fusion and Multimodal Dynamic Gradient Regulation.

Language: Python - Size: 4.04 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 0

nmagal/modality_drop_for_colearning

Repo containing code for Negative Co-learning to Positive Co-learning with Aggressive Modality Drop

Language: Jupyter Notebook - Size: 134 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

marialymperaiou/knowledge-enhanced-multimodal-learning

A list of research papers on knowledge-enhanced multimodal learning

Size: 20.5 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

RunyuFan/STNet

Code for JAG 2022 paper "Urban informal settlements classification via a transformer-based spatial-temporal fusion network using multimodal remote sensing and time-series human activity data"

Language: Python - Size: 71.3 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 1

04mayukh/Memebusters-at-SemEval-2020-Task-8-Memotion-Analysis

This repository contains the code for submission made at SemEval 2020: Task 8 Memotion analysis.

Language: Jupyter Notebook - Size: 55 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 0

marcomoldovan/cross-modal-speech-segment-retrieval

Learning a common representation space from speech and text for cross-modal retrieval given textual queries and speech files.

Language: Python - Size: 216 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 1

marcomoldovan/3d-attention-video-understanding

Using a 3D Nearby Self-Attention Transformer to leverage the spatiotemporal nature of video for representation learning.

Language: Python - Size: 33.2 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

A2Zadeh/Social-IQ

[CVPR 2019 Oral] Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence

Language: Python - Size: 2.71 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 41 - Forks: 5

JianqiangWan/VLPT-STD

Vision-Language Pre-Training for Boosting Scene Text Detectors (CVPR2022)

Size: 4.88 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 11 - Forks: 0

carlosholivan/AudioGenerationDiffusion

State-of-the-art of Audio Generation with Diffusion Models

Size: 179 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 1

Neerajj9/Stacked-Attention-Networks-for-Visual-Question-Answering

Implementation of the paper "Stacked Attention Networks for Image Question Answering" in Tensorflow

Language: Python - Size: 15.3 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 13 - Forks: 4

eslambakr/LAR-Look-Around-and-Refer

This is the official implementation of our paper "LAR: Look Around and Refer".

Language: C++ - Size: 45 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 15 - Forks: 2

oskar-j/awesome-multimodal-ml

List of materials for the topic of multimodal models

Size: 3.91 KB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

ShowMeModel/transformers-multimodal-example

Example of a multimodal (end-to-end) deep learning model with transformers architecture

Size: 1.95 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

ikmb/PIA-inference

The peptide immune annotator pipeline (PIA-P): a collection of Bash and Python scripts used for running peptide-HLA interaction from a variety of inputs

Language: Python - Size: 14.8 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

TIBHannover/multimodal-misogyny-detection-mami-2022

Multimodal Misogyny Detection - SemEval 2022 - MAMI Challenge

Language: Python - Size: 648 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 2

david-yoon/attentive-modality-hopping-for-SER

TensorFlow implementation of "Attentive Modality Hopping for Speech Emotion Recognition," ICASSP-20

Language: Python - Size: 53.7 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 27 - Forks: 8

SAGNIKMJR/move2hear-active-AV-separation

Code and datasets for 'Move2Hear: Active Audio-Visual Source Separation' (ICCV 2021)

Language: Python - Size: 1.31 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 12 - Forks: 0

ag2307/ConVIRT-Federated

Language: Jupyter Notebook - Size: 600 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

kritika-gupta/multi-modal-music-genre-classification

Final project for CS 7643 : Deep Learning (Fall 2022, Georgia Tech)

Language: Jupyter Notebook - Size: 20.7 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Damorgal/Multimodal-Research-experiments

All experiments were done to classify multimodal data.

Size: 161 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Nithin-Holla/meme_challenge

Repository containing code from team Kingsterdam for the Hateful Memes Challenge

Language: Python - Size: 1.36 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 17 - Forks: 8

eftekhar-hossain/Multimodal-Disaster_IEEE-Access

This repository contains the related resources of a multimodal deep learning project.

Language: Jupyter Notebook - Size: 4.05 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

l-yohai/Look-Attend-and-Generate-Poem Fork of boostcampaitech2/final-project-level3-nlp-08

A web service with an AI poet that looks at images and writes poems.

Language: Python - Size: 24.8 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

IsaacRodgz/Multimodal-Transformer

Multimodal version of transformer for classification using text and image

Language: Python - Size: 2.93 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 4 - Forks: 1

soloist97/region-hierarchical-pytorch

Implementation of a baseline method for image paragraph captioning

Language: Python - Size: 69.2 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 8 - Forks: 1

koushikvikram/multimodal-image-retrieval

📝🔍🖼️ A deep learning application for retrieving images by searching with text.

Language: Jupyter Notebook - Size: 382 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 0

koninik/multimodal_machine_translation

A PyTorch implementation of a Transformer Network for Machine Translation that incorporates image features to enhance the performance of the translation

Language: Python - Size: 59.1 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

prasoonvarshney/Multimodal-Transformer Fork of yaohungt/Multimodal-Transformer

Adding Bottlenecked Fusion to [ACL'19] Multimodal Transformer

Language: Python - Size: 351 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

El-Zag/Multimodal-Video-Captioning

Master Thesis on Multimodal Video Captioning, done at Huawei's Research Center in Amsterdam.

Language: Python - Size: 2.3 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

gorjanradevski/vsepp_tensorflow

Implementation of "VSE++: Improving Visual-Semantic Embeddings with Hard Negatives" in Tensorflow.

Language: Python - Size: 49.8 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 0

GUT-AI/automated-data-preprocessing

Automated Data Preprocessing

Size: 48.8 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 1

xiaoxiaoheimei/SeqDialN

Code for reproducing results in our paper SeqDialN: Sequential Visual Dialog Networks in Joint Visual-Linguistic Representation Space.

Language: Python - Size: 76.2 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 5 - Forks: 1

IsaacRodgz/Multimodal-Adapters

Adapter modules with support for multimodal fusion of information (text, video, audio, etc.) using pre-trained BERT base model

Language: Jupyter Notebook - Size: 6.46 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 1

IsaacRodgz/multimodal-transformers-movies

Experiments with multimodal deep learning models based on transformers

Language: Jupyter Notebook - Size: 10.7 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 1

gtatiya/Deep-Multi-Sensory-Object-Categorization

Deep Multi-Sensory Object Category Recognition Using Interactive Behavioral Exploration

Language: Jupyter Notebook - Size: 2.65 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 15 - Forks: 8

celestialxevermore/CLIP2AE

AI-multimodal : Modeling the new text - video retrieval framework

Language: Jupyter Notebook - Size: 1.68 GB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 1

shbz80/fb_marketplace_reco

Facebook Marketplace is a platform for buying and selling products on Facebook. This project involves training a multimodal deep neural network model that predicts the category of a product based on its image and text description.

Language: Jupyter Notebook - Size: 4.17 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

gorjanradevski/cross_modal_full_transfer

PyTorch code for cross-modal-retrieval on Flickr8k/30k using Bert and EfficientNet

Language: Python - Size: 72.3 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 3 - Forks: 1

aidotse/multimodal-skin-lesion-classification

Multimodality for skin lesion classification

Language: Python - Size: 10.7 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 1

yubin1219/deep_learning_music

Deep Learning for Music & Audio - Multi modal project

Language: Jupyter Notebook - Size: 4.51 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

SRM-IST-KTR/disturbance-detection-in-messaging-apps-using-machine-learning-e5d7h9m7

A fully deployable React Native mobile app that classifies incoming messages in messaging apps as important or disturbing, using a multi-modal machine learning architecture for text classification, image classification, and YouTube video link classification.

Language: Jupyter Notebook - Size: 40.4 MB - Last synced at: 8 months ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

candacelax/bias-in-vision-and-language

Code for paper "Measuring Social Biases in Grounded Vision and Language Embeddings"

Language: Shell - Size: 11.7 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 1

dh1105/Multi-modal-movie-genre-prediction

A multi-modal deep learning model trained to predict a movie's genre given the movie poster and overview as an input.

Language: Jupyter Notebook - Size: 362 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 12 - Forks: 10

library-of-code/deep-learning Fork of pclubiitk/model-zoo

Implementations of various Deep Learning models in PyTorch and TensorFlow.

Language: Jupyter Notebook - Size: 56.1 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

Related Keywords
multimodal-deep-learning (413), deep-learning (106), multimodal (86), pytorch (64), computer-vision (56), machine-learning (48), multimodal-learning (38), natural-language-processing (26), nlp (24), multimodality (22), vision-and-language (21), tensorflow (20), python (20), large-language-models (19), attention-mechanism (16), transformer (16), transformers (16), multimodal-sentiment-analysis (14), artificial-intelligence (14), llm (14), generative-ai (14), gpt4 (13), classification (13), deep-neural-networks (13), multimodal-large-language-models (13), self-supervised-learning (13), emotion-recognition (12), visual-question-answering (11), convolutional-neural-networks (11), dataset (11), attention (10), multimodal-datasets (10), neural-network (10), ai (9), time-series (9), image-processing (9), language-model (9), clip (9), object-detection (8), image (8), awesome-list (8), vision-language-transformer (8), image-classification (8), vision-transformer (8), bert (8), sentiment-analysis (8), image-captioning (7), multimodal-fusion (7), multimodal-representation (7), vision-language (7), representation-learning (7), multimodal-data (7), diffusion-models (7), pytorch-lightning (7), cnn (7), vision-language-model (7), deeplearning (6), text-to-image (6), remote-sensing (6), graph-neural-networks (6), huggingface-transformers (6), neural-networks (6), foundation-models (6), 3d (6), vision-language-pretraining (6), lstm (6), reinforcement-learning (6), keras (6), point-cloud (5), transfer-learning (5), recommender-system (5), embeddings (5), multimodal-interactions (5), contrastive-learning (5), memes (5), text (5), audio (5), audio-processing (5), visual-grounding (5), paper (5), variational-autoencoder (5), gan (5), transformer-models (5), generative-adversarial-network (5), image-generation (5), vqa (5), semantic-segmentation (5), python3 (5), generative-model (5), anomaly-detection (5), nlp-machine-learning (5), attention-is-all-you-need (5), data-fusion (5), speech-recognition (5), large-multimodal-models (5), question-answering (5), cvpr (4), domain-adaptation (4), vision-and-language-pre-training (4), cross-modal-retrieval (4)