An open API service providing repository metadata for many open source software ecosystems.

Topic: "multimodal-deep-learning"

choyingw/GAIS-Net

CVPR 2020 Workshop on Scalability in Autonomous Driving: GAIS-Net: Geometry-Aware Instance Segmentation with Disparity Maps

Language: Python - Size: 950 KB - Last synced at: 25 days ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 0

kyegomez/Gen2

Implementation of "Text driven video generation" in pytorch

Language: Python - Size: 222 KB - Last synced at: 15 days ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 0

ofa-x/OFA-X

This repository contains the code for the publication "Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations"

Language: Python - Size: 84.9 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 0

duyali2000/MQMC

This repo has the PyTorch implementation and datasets of our WSDM 2023 paper: “Multi-queue Momentum Contrast for Microvideo-Product Retrieval”.

Language: Python - Size: 1.97 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 1

ihaeyong/drama-graph

The Drama-Graph repository produces both a knowledge base on drama scripts and a video graph for the Video Turing Test (VTT).

Language: Jupyter Notebook - Size: 201 MB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 1

candacelax/bias-in-vision-and-language

Code for paper "Measuring Social Biases in Grounded Vision and Language Embeddings"

Language: Shell - Size: 11.7 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 1

verlab/StraightToThePoint_CVPR_2020

Original PyTorch implementation of the code for the paper "Straight to the Point: Fast-forwarding Videos via Reinforcement Learning Using Textual Data" at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020

Language: Python - Size: 27.4 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 8 - Forks: 1

soloist97/region-hierarchical-pytorch

Implementation of a baseline method for image paragraph captioning

Language: Python - Size: 69.2 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 8 - Forks: 1

talipucar/DomainAdaptation

A model for Domain Adaptation, Alignment and Translation using multiple sources of data.

Language: Python - Size: 46.5 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 8 - Forks: 1

AI4Patents/IMPACT

IMPACT: A Large-scale Integrated Multimodal Patent Analysis and Creation Dataset for Design Patents (NeurIPS 2024)

Language: Jupyter Notebook - Size: 23.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 7 - Forks: 1

nngocson2002/ViVQA

The Multimodal Model for Vietnamese Visual Question Answering (ViVQA)

Language: Python - Size: 1.02 MB - Last synced at: 10 months ago - Pushed at: 11 months ago - Stars: 7 - Forks: 0

MichiganNLP/visual_diversity_budget

Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost

Size: 2.24 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 7 - Forks: 0

kyegomez/CELESTIAL-1

Omni-Modality Processing, Understanding, and Generation

Language: Python - Size: 2.49 MB - Last synced at: 15 days ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 1

JHKim-snu/GVCCI

[IROS 2023] GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation

Language: Python - Size: 27.4 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 0

stevejpapad/image-text-verification

Official repository for the "VERITE: A Robust Benchmark for Multimodal Misinformation Detection Accounting for Unimodal Bias" paper.

Language: Python - Size: 11.7 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 0

justinbt1/Multimodal-Document-Classification

MSc project investigating multi-modal fusion approaches to combining textual and visual features for multi-page classification of documents within the OGA National Data Repository (NDR).

Language: Jupyter Notebook - Size: 11.9 MB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 0

kyegomez/MMCA-MGQA

Experiments around using Multi-Modal Causal Attention with Multi-Grouped Query Attention

Language: Python - Size: 210 KB - Last synced at: 15 days ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 0

davide-coccomini/Deepfake-Detection-Challenge-DFAD2023

Implementation of the winning solution for the Media Analytics Challenge 2023.

Language: Jupyter Notebook - Size: 41.5 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 0

JanTeichertKluge/DMLSim

This library provides packages on DoubleML / Causal Machine Learning and Neural Networks in Python for Simulation and Case Studies.

Language: Python - Size: 145 KB - Last synced at: 22 days ago - Pushed at: almost 2 years ago - Stars: 6 - Forks: 0

abs711/The-way-of-the-future

A dataset of egocentric vision, eye-tracking and full body kinematics from human locomotion in out-of-the-lab environments. Also, different use cases of the dataset along with example code.

Language: Python - Size: 48.3 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 0

efthymisgeo/multimodal-masking

This repo contains source code for the MultiModal Masking (M^3) Interspeech 2021 paper.

Language: Python - Size: 1.91 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 6 - Forks: 0

chikap421/videosam

This repository accompanies the paper "VideoSAM: A Large Vision Foundation Model for High-Speed Video Segmentation"

Language: Jupyter Notebook - Size: 160 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 5 - Forks: 1

kyegomez/Odin

SOTA Classification at scale for UAVs, Drones, and much more

Language: Python - Size: 211 KB - Last synced at: 21 days ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

Cominclip/RPF-Net

Official code for "Recurrent Progressive Fusion-based Learning for Multi-source Remote Sensing Image Classification"

Language: Python - Size: 319 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

saramaxyz/platform

Run custom multi-modal AI models fully on-device

Language: Swift - Size: 14.5 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 1

Netherlands-Cancer-Institute/Multimodal_attention_DeepLearning

Multi-modal deep learning with attention mechanism

Language: Python - Size: 2.39 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 0

gorjanradevski/vsepp_tensorflow

Implementation of "VSE++: Improving Visual-Semantic Embeddings with Hard Negatives" in Tensorflow.

Language: Python - Size: 49.8 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 0

marialymperaiou/knowledge-enhanced-multimodal-learning

A list of research papers on knowledge-enhanced multimodal learning

Size: 20.5 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

bupt-mmai/S2TD

code for "S2TD: A Tree-Structured Decoder for Image Paragraph Captioning" accepted by MMAsia 2021

Language: Python - Size: 68.7 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 0

cfcooney/BiModNeuroCNN

Package for bimodal training of deep neural networks on neurological data. Pypi: https://pypi.org/project/BiModNeuroCNN/

Language: Python - Size: 137 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 1

xiaoxiaoheimei/SeqDialN

Code for reproducing results in our paper SeqDialN: Sequential Visual Dialog Networks in Joint Visual-Linguistic Representation Space.

Language: Python - Size: 76.2 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 1

gorjanradevski/SMHA

My master thesis: Siamese multi-hop attention for cross-modal retrieval.

Language: Python - Size: 2.76 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 5 - Forks: 0

nlp-unibo/multimodal-am-fallacy

Multimodal Fallacy Classification in Political Debates: Dataset and Experiments.

Language: Python - Size: 11.2 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 4 - Forks: 0

jahez07/Multimodal-Fusion-Strategy-to-Classify-Malware

This work proposes a novel approach to classifying malware binaries by extracting visual features from malware executables.

Language: Jupyter Notebook - Size: 257 MB - Last synced at: 26 days ago - Pushed at: 9 months ago - Stars: 4 - Forks: 0

Eva-Kaushik/EMKGCN-MultiModal-Music-Recommender

The `MKGCN` class, coupled with the Spotify API, orchestrates a multi-modal knowledge graph convolutional network to enhance music recommendation systems by integrating user interaction data and diverse music modalities.

Language: Jupyter Notebook - Size: 11 MB - Last synced at: 29 days ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 1

AdrianBZG/SFAVEL

Code for "Unsupervised Pretraining for Fact Verification by Language Model Distillation" (ICLR 2024)

Language: Python - Size: 14.6 KB - Last synced at: 29 days ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 0

BorgwardtLab/DeepEST

Language: Python - Size: 396 KB - Last synced at: 20 days ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 0

marcomoldovan/multimodal-self-distillation

A generalized self-supervised training paradigm for unimodal and multimodal alignment and fusion.

Language: Python - Size: 526 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 2

IsaacRodgz/multimodal-transformers-movies

Experiments with multimodal deep learning models based on transformers

Language: Jupyter Notebook - Size: 10.7 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 1

Merterm/COSMic

Public repo for the paper: "COSMic: A Coherence-Aware Generation Metric for Image Descriptions" by Mert İnan, Piyush Sharma, Baber Khalid, Radu Soricut, Matthew Stone, Malihe Alikhani

Language: Python - Size: 396 KB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 4 - Forks: 0

04mayukh/Memebusters-at-SemEval-2020-Task-8-Memotion-Analysis

This repository contains the code for the submission made to SemEval 2020 Task 8: Memotion Analysis.

Language: Jupyter Notebook - Size: 55 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 0

IsaacRodgz/Multimodal-Transformer

Multimodal version of transformer for classification using text and image

Language: Python - Size: 2.93 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 4 - Forks: 1

iamdanialkamali/MemotionAnalysis

Meme Sentiment Analysis SemEval 2020 Task 9

Language: Jupyter Notebook - Size: 1.93 MB - Last synced at: about 2 months ago - Pushed at: over 5 years ago - Stars: 4 - Forks: 0

GerrySant/multimodalhugs

MultimodalHugs is an extension of Hugging Face that offers a generalized framework for training, evaluating, and using multimodal AI models with minimal code differences, ensuring seamless compatibility with Hugging Face pipelines.

Language: Python - Size: 4.24 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 3 - Forks: 2

stevejpapad/relevant-evidence-detection

Official repository for the "RED-DOT: Multimodal Fact-checking via Relevant Evidence Detection" paper.

Language: Python - Size: 40.7 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 3 - Forks: 2

icedpanda/COMPASS-official

Official Implementation of Unveiling User Preferences: A Knowledge Graph and LLM-Driven Approach for Conversational Recommendation

Language: Python - Size: 82.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

Davidlequnchen/Awesome-AM-process-monitoring-control

A curated collection of research papers with open-source implementations/datasets focused on in-situ process monitoring and adaptive control in laser-based additive manufacturing.

Size: 57.6 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

Davidlequnchen/MultiSensorFusion-ROS-AM-Monitoring

ROS-based Multisensor Fusion Digital Twin (MFDT) platform for real-time monitoring and defect detection of Laser-Directed Energy Deposition (L-DED) Additive Manufacturing (AM) process.

Language: HTML - Size: 3.8 GB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 1

fork123aniket/Multi-Round-VLM-powered-Multimodal-Conversational-AI-Navigation-Bot

Streamlit App Combining Vision, Language, and Audio AI Models

Language: Python - Size: 18.6 KB - Last synced at: 12 days ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

hubtru/Minape

Multimodal Isotropic Neural Architecture with Patch Embedding to both time series and image data for classification purposes.

Language: Jupyter Notebook - Size: 47 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 3 - Forks: 1

deepmancer/deepmancer

"When in doubt, use brute force." - Ken Thompson

Size: 429 KB - Last synced at: 7 months ago - Pushed at: 8 months ago - Stars: 3 - Forks: 0

RunyuFan/FusionMixer-TGRS-2022

Code for TGRS 2022 paper "Multilevel Spatial-Channel Feature Fusion Network for Urban Village Classification by Fusing Satellite and Streetview Images"

Language: Python - Size: 51.8 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 3 - Forks: 2

ibnaleem/mikael

a Discord chatbot trained on Mistral and LLaVA language models

Language: Python - Size: 3.53 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 1

gustavocidornelas/fused-multimodal-emotion

Multimodal emotion recognition using lexico-acoustic language descriptions

Language: Python - Size: 37.3 MB - Last synced at: 12 days ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 1

samyuh/seadronessee-metadata-adaptation

Exploring Metadata in Neural Networks for UAV Maritime Surveillance

Language: Jupyter Notebook - Size: 34.5 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

Fuzzytariy/CMF-DGCN

A Chinese Sentiment Analysis Model based on Transmembrane State Attention for Modal Fusion and Multimodal Dynamic Gradient Regulation.

Language: Python - Size: 4.04 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 0

liuzwin98/DSCMT

code released

Language: Python - Size: 64.5 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 0

carlosholivan/AudioGenerationDiffusion

State-of-the-art of Audio Generation with Diffusion Models

Size: 179 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 1

danadascalescu00/MultimodalOpinionAnalysis 📦

Bachelor Thesis: Opinion Polarity Classification - Given a tweet consisting of an image and text, classify the post on a three-point scale

Language: Python - Size: 101 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

El-Zag/Multimodal-Video-Captioning

Master Thesis on Multimodal Video Captioning, done at Huawei's Research Center in Amsterdam.

Language: Python - Size: 2.3 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

celestialxevermore/CLIP2AE

AI-multimodal: Modeling the new text-video retrieval framework

Language: Jupyter Notebook - Size: 1.68 GB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 1

comp-well-org/More2Less

More to Less (M2L): Enhanced Health Recognition in the Wild with Reduced Modality of Wearable Sensors

Language: Python - Size: 1.03 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 0

koushikvikram/multimodal-image-retrieval

📝🔍🖼️ A deep learning application for retrieving images by searching with text.

Language: Jupyter Notebook - Size: 382 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 0

mazleon/Hateful_Meme_Challenge

The Hateful Memes dataset contains real hate speech. It consists of more than 10,000 examples newly created by Facebook AI.

Language: Jupyter Notebook - Size: 1.38 MB - Last synced at: 1 day ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 0

licesonw/deepmm

Multimodal deep learning package that uses both categorical and text-based features in a single deep architecture for regression and binary classification use cases.

Language: Python - Size: 385 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 3 - Forks: 0

IsaacRodgz/GMU-Baseline

Replication of models and results obtained in "Gated multimodal networks" paper

Language: Python - Size: 53.4 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 3 - Forks: 0

gorjanradevski/cross_modal_full_transfer

PyTorch code for cross-modal-retrieval on Flickr8k/30k using Bert and EfficientNet

Language: Python - Size: 72.3 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 3 - Forks: 1

Demfier/pmup

Deep learning app that cheers you up with some awesome quotes when you are depressed

Language: Python - Size: 17.5 MB - Last synced at: 3 days ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 0

mbaqer/V2X-mmWave-Beamforming

PyTorch implementation of multi-modality sensing in 60 GHz mmWave beamforming for connected vehicles.

Language: Jupyter Notebook - Size: 5.09 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 2 - Forks: 0

brian-cy-chang/Multimodal_VB-Fracture-Detector

An easy-to-use framework for multimodal models to detect vertebral body fractures in PyTorch

Language: Python - Size: 1.82 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 2 - Forks: 0

icon-lab/MedTrim

Official implementation of "Meta-Entity Driven Triplet Mining for Aligning Medical Vision-Language Models"

Language: Python - Size: 40 KB - Last synced at: 20 days ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 1

fork123aniket/Agentic-RAG-Story-Generation-with-Multimodal-GenAI

Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling

Language: Python - Size: 94.7 KB - Last synced at: 28 days ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

FIVEYOUNGWOO/WiFiMobNet

WiFi-Camera multimodal learning-based object detection and pose estimation.

Language: Python - Size: 560 MB - Last synced at: 2 days ago - Pushed at: 4 months ago - Stars: 2 - Forks: 1

Adm-2005/DeMorph

Deepfake Detection Solution using Multimodal Approach.

Language: Python - Size: 10.9 MB - Last synced at: 26 days ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

hubtru/Impala

Expandable Isotropic Multimodal Patch Learning Neural Architecture for Nano-modal (9) time-series and image data.

Language: Jupyter Notebook - Size: 1.08 GB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

DongmingShenDS/Multi-Modal-ML-Project

A data science project to predict online pet adoption speed using image, natural language, and tabular data with a multi-modal ML framework.

Language: Jupyter Notebook - Size: 17.4 MB - Last synced at: 11 months ago - Pushed at: 12 months ago - Stars: 2 - Forks: 0

AndreiMoraru123/ContextCollector

Mixed vision-language Attention Model that gets better by making mistakes

Language: Python - Size: 149 MB - Last synced at: 9 days ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

YuxingLu613/HTML

Code for paper Multiomics dynamic learning enables personalized diagnosis and prognosis for pan-cancer and cancer-subtypes

Language: Python - Size: 43.9 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

elsobhano/Multimodal-Emotion-Recognition

Multimodal Emotion Recognition using ClipBERT.

Language: Python - Size: 880 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

TIBHannover/MM_Claims

Official code repository for the paper: Gullal Singh Cheema, Sherzod Hakimov, Abdul Sittar, Eric Müller-Budack, Christian Otto, and Ralph Ewerth. 2022. “MM-Claims: A Dataset for Multimodal Claim Detection in Social Media.” In Findings of the Association for Computational Linguistics: NAACL 2022, pages 962–979, Seattle, United States.

Language: Python - Size: 42.3 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 2

GeorgeTouros/video-soundtrack-evaluation

Create a large, well-managed and clean dataset for the task of music composition for video soundtracks.

Language: Jupyter Notebook - Size: 32.2 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

fabiopernisi/Visual-WSD

This repository contains the code for our solution to Task 1 of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

Language: Jupyter Notebook - Size: 9.35 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

scotthlee/enriched-LSTMs

Classifying multimodal health data with LSTMs

Language: Python - Size: 36.1 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 2

lukereichold/visual-speech-separation

Flask app to demo multimodal deep learning speech separation in videos via TensorFlow Serving

Language: Python - Size: 20.8 MB - Last synced at: 2 days ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

GUT-AI/automated-data-preprocessing

Automated Data Preprocessing

Size: 48.8 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 1

sutdcv/multi-modal-video-reasoning

[ICCV2021 Workshop] Multi-Modal Video Reasoning and Analyzing Competition

Language: JavaScript - Size: 8.77 MB - Last synced at: 9 months ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 1

aidotse/multimodal-skin-lesion-classification

Multimodality for skin lesion classification

Language: Python - Size: 10.7 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 1

frankaging/generative-physics-inference

Slip or Not? Unsupervised Learning to Understand Physical Scene Using Multimodal Variational Physics Inference Network

Language: Python - Size: 72.3 KB - Last synced at: 2 months ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 1

floriankulig/neural-navi

Driver CoPilot as a student research project. Using multimodal data input streams from a car's telemetry and camera data to try to predict the driver's best maneuver.

Language: Python - Size: 56.8 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

pxxpassi/Allurelle-Skincare-Recommender-App

Recommending products to users based on image processing and external factors to build an inclusive self-care community

Language: Dart - Size: 34.8 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0

Mrkomiljon/awesome-generative-ai

Multimodal generative AI resources: talking heads, STT, TTS, image & video generation, and more.

Size: 2.31 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 1

samsad35/VQ-MAE-AudioVisual-code

A vector quantized masked autoencoder for audiovisual speech emotion recognition

Language: Python - Size: 21.7 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

taco-group/DecAlign

A novel cross-modal decoupling and alignment framework for multimodal representation learning.

Language: JavaScript - Size: 13.8 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

MohamedTharwat21/MemexQA

MemexQA is a project designed to tackle the challenge of real-life multimodal question answering by leveraging both visual and textual data from personal photo albums.

Language: Python - Size: 5.12 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

fevieira27/ImageRecognitionAI-R

R Script for AI Image and Location Recognition that can also generate an automated prompt for AI text-generation of a social media post.

Language: R - Size: 879 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

mbappeenjoyer/GIF-QA

Documentation of the approach employed to tackle the task of GIF Question Answering

Language: Jupyter Notebook - Size: 2.72 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

deepur71/InstructPix2Pix

Implementation of InstructPix2Pix from scratch

Language: Python - Size: 0 Bytes - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

chikap421/mseg_vcuq

This repository accompanies the paper "MSEG-VCUQ: Multimodal SEGmentation with Enhanced Vision Foundation Models, Convolutional Neural Networks, and Uncertainty Quantification for High-Speed Video Phase Detection Data"

Language: MATLAB - Size: 1.48 GB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

anamabo/SegmentWater

Tools to create output for Paligemma to segment water in satellite images.

Language: Jupyter Notebook - Size: 27.3 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

Vasugi2003/Fusion-AI---MultiModal-Persuvasiveness-Prediction

Developed a system to predict persuasiveness using multi-modal data (text, images, audio). Utilized BERT for text embeddings, ResNet for image features, and Librosa for audio analysis. Fused data from all modalities for enhanced prediction accuracy.

Language: Jupyter Notebook - Size: 770 KB - Last synced at: 26 days ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

Related Topics
deep-learning (105), multimodal (86), pytorch (64), computer-vision (56), machine-learning (47), multimodal-learning (38), natural-language-processing (26), nlp (24), multimodality (22), vision-and-language (21), tensorflow (20), python (19), large-language-models (19), transformer (16), attention-mechanism (16), transformers (16), multimodal-sentiment-analysis (14), artificial-intelligence (14), llm (14), generative-ai (14), multimodal-large-language-models (13), gpt4 (13), self-supervised-learning (13), deep-neural-networks (13), emotion-recognition (12), classification (12), dataset (11), convolutional-neural-networks (11), visual-question-answering (11), attention (10), neural-network (10), multimodal-datasets (10), ai (9), image-processing (9), clip (9), object-detection (8), language-model (8), image-classification (8), bert (8), sentiment-analysis (8), image (8), awesome-list (8), time-series (8), vision-transformer (8), vision-language-transformer (8), multimodal-fusion (7), pytorch-lightning (7), multimodal-representation (7), multimodal-data (7), vision-language (7), representation-learning (7), vision-language-model (7), cnn (7), diffusion-models (7), image-captioning (7), neural-networks (6), huggingface-transformers (6), vision-language-pretraining (6), remote-sensing (6), deeplearning (6), text-to-image (6), keras (6), lstm (6), 3d (6), graph-neural-networks (6), reinforcement-learning (6), foundation-models (6), transfer-learning (5), recommender-system (5), embeddings (5), anomaly-detection (5), paper (5), question-answering (5), generative-adversarial-network (5), gan (5), image-generation (5), attention-is-all-you-need (5), variational-autoencoder (5), transformer-models (5), generative-model (5), audio-processing (5), multimodal-interactions (5), nlp-machine-learning (5), contrastive-learning (5), memes (5), point-cloud (5), large-multimodal-models (5), semantic-segmentation (5), python3 (5), data-fusion (5), vqa (5), visual-grounding (5), audio (5), speech-recognition (5), text (5), vision-and-language-pre-training (4), cross-modal-retrieval (4), text-classification (4), multimodal-emotion-recognition (4), information-retrieval (4)