GitHub topics: multimodal-deep-learning

Repositories

kyegomez/Gen2

Implementation of "Text driven video generation" in PyTorch

Language: Python - Size: 222 KB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 0

kyegomez/CELESTIAL-1

Omni-Modality Processing, Understanding, and Generation

Language: Python - Size: 2.49 MB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 1

ramakrishnan2503/LearnMate-2.0

Personalized learning companion (Updated version of LearnMate).

Language: Jupyter Notebook - Size: 146 KB - Last synced at: 10 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

DistilledCode/mmrl

Multi-Modal Representational Learning for Social Media Popularity Prediction

Language: Python - Size: 27.3 KB - Last synced at: 10 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

slinusc/path-vqa-blip

Fine-tuning BLIP for pathological visual question answering.

Language: Jupyter Notebook - Size: 80.1 KB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 1

bairdxiong/SegResearchToolkit

A highly efficient research and development toolkit for image segmentation based on PyTorch.

Language: Python - Size: 3.11 MB - Last synced at: 5 months ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 0

kaledhoshme123/Multimodal-face-generation-facial-biometrics-

Similarity between faces: One person resembles another person to a large degree. This can lead to many problems facing security surveillance systems. Facial recognition systems have difficulty distinguishing between the main person and other people who are highly similar in terms of features.

Language: Jupyter Notebook - Size: 16.8 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

a-tabaza/fairouz_demo

Demo for Binding Text, Images, Graphs, and Audio for Music Representation Learning

Language: Python - Size: 27.8 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

XavierSpycy/MultiCLIP

MultiCLIP: A framework for multimodal-multilabel-multistage classification utilizing advanced pretrained models like CLIP and BLIP.

Language: Python - Size: 2.2 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

eftekhar-hossain/Bengali-Hateful-Memes

[ACL, EACL'24] Multimodal Hate Speech Detection in Bengali

Language: Python - Size: 2.68 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 1

VisualWebBench/VisualWebBench

Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"

Language: Python - Size: 3.17 MB - Last synced at: 11 months ago - Pushed at: 12 months ago - Stars: 36 - Forks: 1

darmangerd/vubot

Multimodal computer vision application leveraging object detection, gesture recognition, and speech-to-text to help users ask questions about their environment.

Language: Python - Size: 63.5 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

DunnBC22/Vision_Audio_and_Multimodal_Projects

This repository includes all computer vision, audio, document AI, and multimodal projects.

Language: Jupyter Notebook - Size: 108 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 28 - Forks: 5

darrylnurse/viewvie

Movie detection application.

Language: JavaScript - Size: 123 MB - Last synced at: about 2 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 1

Shen-Lab/CPAC

[Bioinformatics 2022] Cross-Modality and Self-Supervised Protein Embedding for Compound-Protein Affinity and Contact Prediction

Language: Python - Size: 134 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 12 - Forks: 1

DongmingShenDS/Multi-Modal-ML-Project

A data science project to predict online pet adoption speed using image, natural language, and tabular data with a multi-modal ML framework.

Language: Jupyter Notebook - Size: 17.4 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 2 - Forks: 0

MichiganNLP/visual_diversity_budget

Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost

Size: 2.24 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 7 - Forks: 0

ViLab-UCSD/LaGTran_ICML2024

Code and models for the ICML 2024 paper "Tell, Don't Show!: Language Guidance Eases Transfer Across Domains in Images and Videos"

Language: Python - Size: 151 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

HySonLab/Ligand_Generation

Target-aware Variational Auto-encoders for Ligand Generation with Multimodal Protein Representation Learning

Language: Python - Size: 257 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 23 - Forks: 2

cosmaadrian/multimodal-depression-from-video

Official source code for the paper: "Reading Between the Frames Multi-Modal Non-Verbal Depression Detection in Videos"

Language: Python - Size: 370 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 20 - Forks: 2

florencejt/fusilli

A Python package housing a collection of deep-learning multi-modal data fusion method pipelines! From data loading, to training, to evaluation - fusilli's got you covered 🌸

Language: Python - Size: 987 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 146 - Forks: 12

MILVLG/prophet

Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".

Language: Python - Size: 1.09 MB - Last synced at: 12 months ago - Pushed at: almost 2 years ago - Stars: 259 - Forks: 27

usc-sail/mica-context-emotion-recognition

Repository for context based emotion recognition

Language: Python - Size: 45.9 KB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

rajlm10/Shoppster

Multimodal Shopping Assistant

Language: Jupyter Notebook - Size: 10 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

sisinflab/Formal-MultiMod-Rec

Formalizing Multimedia Recommendation through Multimodal Deep Learning, accepted in ACM Transactions on Recommender Systems.

Language: Python - Size: 903 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 10 - Forks: 1

N-G-Asker/TasteRank

TasteRank: Personalized Image Search and Recommendation. This research project proposes an AI-based method for scoring photos on relevance to user interests. TasteRank leverages language and vision models, including Mistral LLMs and OpenAI’s CLIP, and applies multimodal machine-learning techniques.

Language: Jupyter Notebook - Size: 3.26 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

Sreyan88/MMER

Code for the InterSpeech 2023 paper: MMER: Multimodal Multi-task learning for Speech Emotion Recognition

Language: Python - Size: 1.59 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 54 - Forks: 14

GeorgeTouros/video-soundtrack-evaluation

Create a large, well-managed, and clean dataset for the task of music composition for video soundtracks.

Language: Jupyter Notebook - Size: 32.2 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

parham/lemanchot-analysis

LeManchot-Analysis is a system for anomaly detection in coupled visible-thermal images

Language: Python - Size: 79.9 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 2

choyingw/GAIS-Net

CVPR 2020 Workshop on Scalability in Autonomous Driving: GAIS-Net: Geometry-Aware Instance Segmentation with Disparity Maps

Language: Python - Size: 950 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 0

affjljoo3581/Job-Recommend-Competition

🥇 First-place solution for the KNOW-based job recommendation algorithm competition 🥇

Language: Python - Size: 1.74 MB - Last synced at: about 2 months ago - Pushed at: about 3 years ago - Stars: 43 - Forks: 4

JHKim-snu/GVCCI

[IROS 2023] GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation

Language: Python - Size: 27.4 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 0

Anas1108/Multimodal_Memes_Classification

A PyTorch-based multimodal architecture that classifies memes using image and caption. Trained on a meme classification dataset, the MLP architecture uses PyTorch, NumPy, Matplotlib, and scikit-learn and improves on baseline performance.

Language: Jupyter Notebook - Size: 851 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

yiren-jian/BLIText

[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training

Language: Python - Size: 34.4 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 17 - Forks: 1

usc-sail/mica-deep-mcca

Deep Multiset Canonical Correlation Analysis - An extension of CCA to multiple datasets

Language: Python - Size: 103 MB - Last synced at: 12 months ago - Pushed at: over 5 years ago - Stars: 31 - Forks: 14

Agora-X/DailyPaperClub

The repository for the exclusive Daily Paper Club, hosted at Agora every day at 10pm NYC time on this Discord: https://discord.gg/Gnzh6dnzyz

Size: 14.6 KB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 0

nyukat/greedy_multimodal_learning

Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks

Language: Python - Size: 47.9 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 24 - Forks: 2

emerisly/EDIS

Entity-Driven Image Search over Multimodal Web Content (EMNLP 2023)

Language: Python - Size: 1.61 MB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 27 - Forks: 0

YeonwooSung/LIMoE-pytorch

PyTorch implementation of LIMoE

Language: Python - Size: 4.3 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 43 - Forks: 1

Nithin-GK/UniteandConquer

[CVPR '23] Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models

Language: Python - Size: 6.55 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 28 - Forks: 3

david-yoon/multimodal-speech-emotion

TensorFlow implementation of "Multimodal Speech Emotion Recognition using Audio and Text," IEEE SLT-18

Language: Jupyter Notebook - Size: 238 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 239 - Forks: 70

AdrianBZG/SFAVEL

Code for "Unsupervised Pretraining for Fact Verification by Language Model Distillation" (ICLR 2024)

Language: Python - Size: 14.6 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 0

sisinflab/LoG-2023-GNNs-RecSys

Presented as a tutorial at the Second Learning on Graphs Conference (LoG 2023)

Language: Jupyter Notebook - Size: 12.9 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 14 - Forks: 0

claws-lab/multimodal-robustness

Code and resources for EMNLP 2022 paper on 'Robustness of Fusion-based Multimodal Classifiers to Cross-Modal Content Dilutions'

Language: Python - Size: 71.6 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 9 - Forks: 1

isevr/TVEmotion

A novel multimodal approach for emotion recognition deploying early fusion based on graph-captured embeddings

Language: Jupyter Notebook - Size: 164 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

imatge-upc/wav2pix (fork of miqueltubau/Wav2Pix)

Speech-conditioned face generation using Generative Adversarial Networks (ICASSP 2019)

Language: Python - Size: 202 MB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 55 - Forks: 24

nesl/Robust-Deep-Learning-Pipeline

Deep Convolutional Bidirectional LSTM for Complex Activity Recognition with Missing Data. Human Activity Recognition Challenge. Springer SIST (2020)

Language: Jupyter Notebook - Size: 876 MB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 22 - Forks: 3

Cominclip/RPF-Net

Official code for "Recurrent Progressive Fusion-based Learning for Multi-source Remote Sensing Image Classification"

Language: Python - Size: 319 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

SRDdev/OpenAI-CLIP

Simple Educational Implementation of OpenAI CLIP in PyTorch

Language: Jupyter Notebook - Size: 6.63 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

lmb-freiburg/Multimodal-Future-Prediction

The official repository for the CVPR 2019 paper "Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction"

Language: Python - Size: 21.6 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 47 - Forks: 8

eliottcrancee/ParoleNet

Utilizing a multimodal architecture to predict the appropriate speaker turn in a dialogue.

Language: Python - Size: 178 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

ishitab1310/HateFilter

Analyzing hateful memes (resources: Hateful Memes Challenge)

Language: Jupyter Notebook - Size: 3.09 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

thatAverageGuy/EarlyFusion-on-EasyVQA

Streamlit app demonstrating multi-modal (vision + language) modelling in PyTorch.

Language: Python - Size: 2.74 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

georgepar/slp

Utils and modules for speech, language, and multimodal processing using PyTorch and PyTorch Lightning

Language: Python - Size: 2.02 MB - Last synced at: 6 months ago - Pushed at: about 2 years ago - Stars: 21 - Forks: 7

davide-coccomini/Deepfake-Detection-Challenge-DFAD2023

Implementation of the winning solution for the Media Analytics Challenge 2023.

Language: Jupyter Notebook - Size: 41.5 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 0

AnkurDeria/MFT

Pytorch implementation of Multimodal Fusion Transformer for Remote Sensing Image Classification.

Language: Jupyter Notebook - Size: 2.13 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 130 - Forks: 8

association-rosia/flair-2

Engage in a semantic segmentation challenge for land cover description using multimodal remote sensing earth observation data, delving into real-world scenarios with a dataset comprising 70,000+ aerial imagery patches and 50,000 Sentinel-2 satellite acquisitions.

Language: Jupyter Notebook - Size: 44.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 0

jena-shreyas/Awesome-Video-Language-Resources

A repository of Video Language papers, code and datasets.

Size: 7.81 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ihaeyong/drama-graph

The Drama-Graph repository produces both a knowledge base on drama scripts and a video graph for the Video Turing Test (VTT).

Language: Jupyter Notebook - Size: 201 MB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 1

DavidHuji/CapDec

CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)

Language: Python - Size: 35.7 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 158 - Forks: 17

orrzohar/LOVM

[NeurIPS 2023] Official Pytorch code for LOVM: Language-Only Vision Model Selection

Language: Python - Size: 4.44 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 15 - Forks: 0

nicolafan/neural-artwork-caption-generator

Code for the paper "Exploring the Synergy Between Vision-Language Pretraining and ChatGPT for Artwork Captioning: A Preliminary Study"

Language: Jupyter Notebook - Size: 130 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

arusl/tmdb-multimodal-inference

This repo contains a Jupyter notebook showing how to run a prediction of new data using a multimodal deep learning model to predict movie genres.

Language: Jupyter Notebook - Size: 1.73 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 1

tomoyoshki/focal

Pytorch Implementation of FOCAL: Contrastive Learning for Multimodal Time-Series Sensing Signals in Factorized Orthogonal Latent Space

Language: Python - Size: 59.6 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 0

sarthak268/c3vqg-official

PyTorch Implementation for the paper "C3VQG: Category Consistent Cyclic Visual Question Generation" (ACM MM Asia'20).

Language: Python - Size: 63.9 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 15 - Forks: 6

aimotive/aimotive-dataset-loader

Dataset loader and renderer for aiMotive Multimodal Dataset

Language: Python - Size: 614 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 10 - Forks: 2

deeplsd/Syncnet_Analysis

This code is part of the paper: "A Deep Dive Into Neural Synchrony Evaluation for Audio-visual Translation" published at ACM ICMI 2022.

Language: Python - Size: 57.9 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

PrithivirajDamodaran/vision-language-modelling-series

Companion Repo for the Vision Language Modelling YouTube series - https://bit.ly/3PsbsC2 - by Prithivi Da. Open to PRs and collaborations

Language: Jupyter Notebook - Size: 6.15 MB - Last synced at: 1 day ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 4

shubhamagarwal92/mmd

This repository contains the Pytorch implementation for our SCAI (EMNLP-2018) submission "A Knowledge-Grounded Multimodal Search-Based Conversational Agent"

Language: Python - Size: 82 KB - Last synced at: 11 months ago - Pushed at: almost 5 years ago - Stars: 29 - Forks: 5

gustavocidornelas/fused-multimodal-emotion

Multimodal emotion recognition using lexico-acoustic language descriptions

Language: Python - Size: 37.3 MB - Last synced at: 20 days ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 1

talipucar/talipucar.github.io_old

Showcases ongoing and completed projects within various research themes.

Size: 8.51 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

diegovalsesia/XMFnet

Code for "Cross-modal Learning for Image-Guided Point Cloud Shape Completion" (NeurIPS 2022)

Language: Python - Size: 22.2 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 34 - Forks: 6

abs711/Visual-Control

Deep learning models that fuse IMU-based motion capture and first-person video data to improve the prediction of future knee and ankle joint kinematics in complex real-world environments.

Language: Python - Size: 75.1 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

stevejpapad/image-text-verification

Official repository for the "VERITE: A Robust Benchmark for Multimodal Misinformation Detection Accounting for Unimodal Bias" paper.

Language: Python - Size: 11.7 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 0

giganttheo/tib-dataset

Dataset for abstractive summarization of long multimodal presentations

Size: 1.95 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

vklinhhh/Video-Event-Retrieval

The Video Event Retrieval Project for Vietnamese News facilitates the precise extraction of events from video archives through content analysis and indexing of Vietnamese news videos.

Language: Python - Size: 26.4 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

XavierSpycy/CAT-ImageTextIntegrator

An innovative deep learning framework leveraging the CAT (Convolutions, Attention & Transformers) architecture to seamlessly integrate visual and textual modalities. This model exploits the prowess of CNNs for image feature extraction and Transformers for intricate textual pattern recognition, setting a new paradigm in multimodal learning.

Language: Python - Size: 8.21 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0