Topic: "multimodal-deep-learning"

choyingw/GAIS-Net

CVPR 2020 Workshop on Scalability in Autonomous Driving: GAIS-Net: Geometry-Aware Instance Segmentation with Disparity Maps

Language: Python - Size: 950 KB - Last synced at: 25 days ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 0

kyegomez/Gen2

Implementation of "Text driven video generation" in PyTorch

Language: Python - Size: 222 KB - Last synced at: 15 days ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 0

ofa-x/OFA-X

This repository contains the code for the publication "Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations"

Language: Python - Size: 84.9 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 0

duyali2000/MQMC

This repo has the PyTorch implementation and datasets of our WSDM 2023 paper: “Multi-queue Momentum Contrast for Microvideo-Product Retrieval”.

Language: Python - Size: 1.97 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 1

ihaeyong/drama-graph

The Drama-Graph repository produces both a knowledge base from drama scripts and a video graph for the Video Turing Test (VTT).

Language: Jupyter Notebook - Size: 201 MB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 1

candacelax/bias-in-vision-and-language

Code for paper "Measuring Social Biases in Grounded Vision and Language Embeddings"

Language: Shell - Size: 11.7 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 1

verlab/StraightToThePoint_CVPR_2020

Original PyTorch implementation of the code for the paper "Straight to the Point: Fast-forwarding Videos via Reinforcement Learning Using Textual Data" at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020

Language: Python - Size: 27.4 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 8 - Forks: 1

soloist97/region-hierarchical-pytorch

Implementation of a baseline method for image paragraph captioning

Language: Python - Size: 69.2 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 8 - Forks: 1

talipucar/DomainAdaptation

A model for Domain Adaptation, Alignment and Translation using multiple sources of data.

Language: Python - Size: 46.5 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 8 - Forks: 1

AI4Patents/IMPACT

IMPACT: A Large-scale Integrated Multimodal Patent Analysis and Creation Dataset for Design Patents (NeurIPS 2024)

Language: Jupyter Notebook - Size: 23.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 7 - Forks: 1

nngocson2002/ViVQA

The Multimodal Model for Vietnamese Visual Question Answering (ViVQA)

Language: Python - Size: 1.02 MB - Last synced at: 10 months ago - Pushed at: 11 months ago - Stars: 7 - Forks: 0

MichiganNLP/visual_diversity_budget

Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost

Size: 2.24 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 7 - Forks: 0

kyegomez/CELESTIAL-1

Omni-Modality Processing, Understanding, and Generation

Language: Python - Size: 2.49 MB - Last synced at: 15 days ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 1

JHKim-snu/GVCCI

[IROS 2023] GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation

Language: Python - Size: 27.4 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 0

stevejpapad/image-text-verification

Official repository for the "VERITE: A Robust Benchmark for Multimodal Misinformation Detection Accounting for Unimodal Bias" paper.

Language: Python - Size: 11.7 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 0

justinbt1/Multimodal-Document-Classification

MSc project investigating multi-modal fusion approaches to combining textual and visual features for multi-page classification of documents within the OGA National Data Repository (NDR).

Language: Jupyter Notebook - Size: 11.9 MB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 0

kyegomez/MMCA-MGQA

Experiments around using Multi-Modal Causal Attention with Multi-Grouped Query Attention

Language: Python - Size: 210 KB - Last synced at: 15 days ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 0

davide-coccomini/Deepfake-Detection-Challenge-DFAD2023

Implementation of the winning solution for the Media Analytics Challenge 2023.

Language: Jupyter Notebook - Size: 41.5 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 0

JanTeichertKluge/DMLSim

This library provides packages on DoubleML / Causal Machine Learning and Neural Networks in Python for Simulation and Case Studies.

Language: Python - Size: 145 KB - Last synced at: 22 days ago - Pushed at: almost 2 years ago - Stars: 6 - Forks: 0

abs711/The-way-of-the-future

A dataset of egocentric vision, eye-tracking, and full-body kinematics from human locomotion in out-of-the-lab environments. Also includes different use cases of the dataset, with example code.

Language: Python - Size: 48.3 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 0

efthymisgeo/multimodal-masking

This repo contains source code for the MultiModal Masking (M^3) Interspeech 2021 paper.

Language: Python - Size: 1.91 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 6 - Forks: 0

chikap421/videosam

This repository accompanies the paper "VideoSAM: A Large Vision Foundation Model for High-Speed Video Segmentation"

Language: Jupyter Notebook - Size: 160 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 5 - Forks: 1

kyegomez/Odin

SOTA Classification at scale for UAVs, Drones, and much more

Language: Python - Size: 211 KB - Last synced at: 21 days ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

Cominclip/RPF-Net

Official code for "Recurrent Progressive Fusion-based Learning for Multi-source Remote Sensing Image Classification"

Language: Python - Size: 319 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

saramaxyz/platform

Run custom multi-modal AI models fully on-device

Language: Swift - Size: 14.5 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 1

Netherlands-Cancer-Institute/Multimodal_attention_DeepLearning

Multi-modal deep learning with attention mechanism

Language: Python - Size: 2.39 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 0

gorjanradevski/vsepp_tensorflow

Implementation of "VSE++: Improving Visual-Semantic Embeddings with Hard Negatives" in TensorFlow.

Language: Python - Size: 49.8 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 0

marialymperaiou/knowledge-enhanced-multimodal-learning

A list of research papers on knowledge-enhanced multimodal learning

Size: 20.5 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

bupt-mmai/S2TD

Code for "S2TD: A Tree-Structured Decoder for Image Paragraph Captioning", accepted at MMAsia 2021

Language: Python - Size: 68.7 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 0

cfcooney/BiModNeuroCNN

Package for bimodal training of deep neural networks on neurological data. Pypi: https://pypi.org/project/BiModNeuroCNN/

Language: Python - Size: 137 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 1

xiaoxiaoheimei/SeqDialN

Code for reproducing results in our paper "SeqDialN: Sequential Visual Dialog Networks in Joint Visual-Linguistic Representation Space".

Language: Python - Size: 76.2 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 1

gorjanradevski/SMHA

My master's thesis: Siamese multi-hop attention for cross-modal retrieval.

Language: Python - Size: 2.76 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 5 - Forks: 0

nlp-unibo/multimodal-am-fallacy

Multimodal Fallacy Classification in Political Debates: Dataset and Experiments.

Language: Python - Size: 11.2 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 4 - Forks: 0

jahez07/Multimodal-Fusion-Strategy-to-Classify-Malware

This work focuses on proposing a novel approach towards classifying malware binaries by extracting visual features from malware executables.

Language: Jupyter Notebook - Size: 257 MB - Last synced at: 26 days ago - Pushed at: 9 months ago - Stars: 4 - Forks: 0

Eva-Kaushik/EMKGCN-MultiModal-Music-Recommender

The `MKGCN` class, coupled with the Spotify API, orchestrates a multi-modal knowledge graph convolutional network to enhance music recommendation systems by integrating user interaction data and diverse music modalities.

Language: Jupyter Notebook - Size: 11 MB - Last synced at: 29 days ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 1

AdrianBZG/SFAVEL

Code for "Unsupervised Pretraining for Fact Verification by Language Model Distillation" (ICLR 2024)

Language: Python - Size: 14.6 KB - Last synced at: 29 days ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 0

BorgwardtLab/DeepEST

Language: Python - Size: 396 KB - Last synced at: 20 days ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 0

marcomoldovan/multimodal-self-distillation

A generalized self-supervised training paradigm for unimodal and multimodal alignment and fusion.

Language: Python - Size: 526 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 2

IsaacRodgz/multimodal-transformers-movies

Experiments with multimodal deep learning models based on transformers

Language: Jupyter Notebook - Size: 10.7 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 1

Merterm/COSMic

Public repo for the paper: "COSMic: A Coherence-Aware Generation Metric for Image Descriptions" by Mert İnan, Piyush Sharma, Baber Khalid, Radu Soricut, Matthew Stone, Malihe Alikhani

Language: Python - Size: 396 KB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 4 - Forks: 0

04mayukh/Memebusters-at-SemEval-2020-Task-8-Memotion-Analysis

This repository contains the code for submission made at SemEval 2020: Task 8 Memotion analysis.

Language: Jupyter Notebook - Size: 55 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 0

IsaacRodgz/Multimodal-Transformer

Multimodal version of transformer for classification using text and image

Language: Python - Size: 2.93 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 4 - Forks: 1

iamdanialkamali/MemotionAnalysis

Meme Sentiment Analysis SemEval 2020 Task 8

Language: Jupyter Notebook - Size: 1.93 MB - Last synced at: about 2 months ago - Pushed at: over 5 years ago - Stars: 4 - Forks: 0

GerrySant/multimodalhugs

MultimodalHugs is an extension of Hugging Face that offers a generalized framework for training, evaluating, and using multimodal AI models with minimal code differences, ensuring seamless compatibility with Hugging Face pipelines.

Language: Python - Size: 4.24 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 3 - Forks: 2

stevejpapad/relevant-evidence-detection

Official repository for the "RED-DOT: Multimodal Fact-checking via Relevant Evidence Detection" paper.

Language: Python - Size: 40.7 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 3 - Forks: 2

icedpanda/COMPASS-official

Official implementation of "Unveiling User Preferences: A Knowledge Graph and LLM-Driven Approach for Conversational Recommendation"

Language: Python - Size: 82.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

Davidlequnchen/Awesome-AM-process-monitoring-control

A curated collection of research papers with open-source implementations/datasets focused on in-situ process monitoring and adaptive control in laser-based additive manufacturing.

Size: 57.6 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

Davidlequnchen/MultiSensorFusion-ROS-AM-Monitoring

ROS-based Multisensor Fusion Digital Twin (MFDT) platform for real-time monitoring and defect detection of Laser-Directed Energy Deposition (L-DED) Additive Manufacturing (AM) process.

Language: HTML - Size: 3.8 GB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 1

fork123aniket/Multi-Round-VLM-powered-Multimodal-Conversational-AI-Navigation-Bot

Streamlit App Combining Vision, Language, and Audio AI Models

Language: Python - Size: 18.6 KB - Last synced at: 12 days ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

hubtru/Minape

Multimodal Isotropic Neural Architecture with Patch Embedding, applied to both time-series and image data for classification.

Language: Jupyter Notebook - Size: 47 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 3 - Forks: 1

deepmancer/deepmancer

"When in doubt, use brute force." - Ken Thompson

Size: 429 KB - Last synced at: 7 months ago - Pushed at: 8 months ago - Stars: 3 - Forks: 0

RunyuFan/FusionMixer-TGRS-2022

Code for TGRS 2022 paper "Multilevel Spatial-Channel Feature Fusion Network for Urban Village Classification by Fusing Satellite and Streetview Images"

Language: Python - Size: 51.8 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 3 - Forks: 2

ibnaleem/mikael

A Discord chatbot trained on Mistral and LLaVA language models

Language: Python - Size: 3.53 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 1

gustavocidornelas/fused-multimodal-emotion

Multimodal emotion recognition using lexico-acoustic language descriptions

Language: Python - Size: 37.3 MB - Last synced at: 12 days ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 1

samyuh/seadronessee-metadata-adaptation

Exploring Metadata in Neural Networks for UAV Maritime Surveillance

Language: Jupyter Notebook - Size: 34.5 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

Fuzzytariy/CMF-DGCN

A Chinese Sentiment Analysis Model based on Transmembrane State Attention for Modal Fusion and Multimodal Dynamic Gradient Regulation.

Language: Python - Size: 4.04 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 0

liuzwin98/DSCMT

code released

Language: Python - Size: 64.5 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 0

carlosholivan/AudioGenerationDiffusion

State-of-the-art of Audio Generation with Diffusion Models

Size: 179 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 1

danadascalescu00/MultimodalOpinionAnalysis 📦

Bachelor's thesis: Opinion Polarity Classification. Given a tweet consisting of an image and text, classify the post on a three-point scale.

Language: Python - Size: 101 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

El-Zag/Multimodal-Video-Captioning

Master's thesis on Multimodal Video Captioning, done at Huawei's Research Center in Amsterdam.

Language: Python - Size: 2.3 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

celestialxevermore/CLIP2AE

AI-multimodal: Modeling a new text-video retrieval framework

Language: Jupyter Notebook - Size: 1.68 GB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 1

comp-well-org/More2Less

More to Less (M2L): Enhanced Health Recognition in the Wild with Reduced Modality of Wearable Sensors

Language: Python - Size: 1.03 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 0

koushikvikram/multimodal-image-retrieval

📝🔍🖼️ A deep learning application for retrieving images by searching with text.

Language: Jupyter Notebook - Size: 382 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 0

mazleon/Hateful_Meme_Challenge

The Hateful Memes dataset contains real hate speech. It consists of more than 10,000 examples newly created by Facebook AI.

Language: Jupyter Notebook - Size: 1.38 MB - Last synced at: 1 day ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 0

licesonw/deepmm

Multimodal deep learning package that uses both categorical and text-based features in a single deep architecture for regression and binary classification use cases.

Language: Python - Size: 385 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 3 - Forks: 0

IsaacRodgz/GMU-Baseline

Replication of models and results obtained in "Gated multimodal networks" paper

Language: Python - Size: 53.4 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 3 - Forks: 0

gorjanradevski/cross_modal_full_transfer

PyTorch code for cross-modal retrieval on Flickr8k/30k using BERT and EfficientNet

Language: Python - Size: 72.3 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 3 - Forks: 1

Demfier/pmup

App that uses deep learning to cheer you up with some awesome quotes when you are depressed

Language: Python - Size: 17.5 MB - Last synced at: 3 days ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 0

mbaqer/V2X-mmWave-Beamforming

PyTorch implementation of multi-modality sensing in 60 GHz mmWave beamforming for connected vehicles.

Language: Jupyter Notebook - Size: 5.09 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 2 - Forks: 0

brian-cy-chang/Multimodal_VB-Fracture-Detector

An easy-to-use framework for multimodal models to detect vertebral body fractures in PyTorch

Language: Python - Size: 1.82 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 2 - Forks: 0

icon-lab/MedTrim

Official implementation of "Meta-Entity Driven Triplet Mining for Aligning Medical Vision-Language Models"

Language: Python - Size: 40 KB - Last synced at: 20 days ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 1

fork123aniket/Agentic-RAG-Story-Generation-with-Multimodal-GenAI

Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling

Language: Python - Size: 94.7 KB - Last synced at: 28 days ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

FIVEYOUNGWOO/WiFiMobNet

WiFi-Camera multimodal learning-based object detection and pose estimation.

Language: Python - Size: 560 MB - Last synced at: 2 days ago - Pushed at: 4 months ago - Stars: 2 - Forks: 1

Adm-2005/DeMorph

Deepfake Detection Solution using Multimodal Approach.

Language: Python - Size: 10.9 MB - Last synced at: 26 days ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

hubtru/Impala

Expandable Isotropic Multimodal Patch Learning Neural Architecture for nona-modal (9) time-series and image data.

Language: Jupyter Notebook - Size: 1.08 GB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

DongmingShenDS/Multi-Modal-ML-Project

A data science project to predict online pet adoption speed using image, natural language, and tabular data with a multi-modal ML framework.

Language: Jupyter Notebook - Size: 17.4 MB - Last synced at: 11 months ago - Pushed at: 12 months ago - Stars: 2 - Forks: 0

AndreiMoraru123/ContextCollector

Mixed vision-language Attention Model that gets better by making mistakes

Language: Python - Size: 149 MB - Last synced at: 9 days ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

YuxingLu613/HTML

Code for the paper "Multiomics dynamic learning enables personalized diagnosis and prognosis for pan-cancer and cancer-subtypes"

Language: Python - Size: 43.9 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

elsobhano/Multimodal-Emotion-Recognition

Multimodal Emotion Recognition using ClipBERT.

Language: Python - Size: 880 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

TIBHannover/MM_Claims

Official code repository for the paper: Gullal Singh Cheema, Sherzod Hakimov, Abdul Sittar, Eric Müller-Budack, Christian Otto, and Ralph Ewerth. 2022. “MM-Claims: A Dataset for Multimodal Claim Detection in Social Media.” In Findings of the Association for Computational Linguistics: NAACL 2022, pages 962–979, Seattle, United States.

Language: Python - Size: 42.3 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 2