Topic: "multimodal-deep-learning"
choyingw/GAIS-Net
CVPR 2020 Workshop on Scalability in Autonomous Driving: GAIS-Net: Geometry-Aware Instance Segmentation with Disparity Maps
Language: Python - Size: 950 KB - Last synced at: 25 days ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 0

kyegomez/Gen2
Implementation of "Text driven video generation" in PyTorch
Language: Python - Size: 222 KB - Last synced at: 15 days ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 0

ofa-x/OFA-X
This repository contains the code for the publication "Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations"
Language: Python - Size: 84.9 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 0

duyali2000/MQMC
This repo has the PyTorch implementation and datasets of our WSDM 2023 paper: “Multi-queue Momentum Contrast for Microvideo-Product Retrieval”.
Language: Python - Size: 1.97 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 1

ihaeyong/drama-graph
The Drama-Graph repository produces both a knowledge base from drama scripts and a video graph for the Video Turing Test (VTT).
Language: Jupyter Notebook - Size: 201 MB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 1

candacelax/bias-in-vision-and-language
Code for paper "Measuring Social Biases in Grounded Vision and Language Embeddings"
Language: Shell - Size: 11.7 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 1

verlab/StraightToThePoint_CVPR_2020
Original PyTorch implementation of the code for the paper "Straight to the Point: Fast-forwarding Videos via Reinforcement Learning Using Textual Data" at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020
Language: Python - Size: 27.4 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 8 - Forks: 1

soloist97/region-hierarchical-pytorch
Implementation of a baseline method for image paragraph captioning
Language: Python - Size: 69.2 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 8 - Forks: 1

talipucar/DomainAdaptation
A model for Domain Adaptation, Alignment and Translation using multiple sources of data.
Language: Python - Size: 46.5 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 8 - Forks: 1

AI4Patents/IMPACT
IMPACT: A Large-scale Integrated Multimodal Patent Analysis and Creation Dataset for Design Patents (NeurIPS 2024)
Language: Jupyter Notebook - Size: 23.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 7 - Forks: 1

nngocson2002/ViVQA
The Multimodal Model for Vietnamese Visual Question Answering (ViVQA)
Language: Python - Size: 1.02 MB - Last synced at: 10 months ago - Pushed at: 11 months ago - Stars: 7 - Forks: 0

MichiganNLP/visual_diversity_budget
Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost
Size: 2.24 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 7 - Forks: 0

kyegomez/CELESTIAL-1
Omni-Modality Processing, Understanding, and Generation
Language: Python - Size: 2.49 MB - Last synced at: 15 days ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 1

JHKim-snu/GVCCI
[IROS 2023] GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation
Language: Python - Size: 27.4 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 0

stevejpapad/image-text-verification
Official repository for the "VERITE: A Robust Benchmark for Multimodal Misinformation Detection Accounting for Unimodal Bias" paper.
Language: Python - Size: 11.7 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 0

justinbt1/Multimodal-Document-Classification
MSc project investigating multi-modal fusion approaches to combining textual and visual features for multi-page classification of documents within the OGA National Data Repository (NDR).
Language: Jupyter Notebook - Size: 11.9 MB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 0

kyegomez/MMCA-MGQA
Experiments around using Multi-Modal Causal Attention with Multi-Grouped Query Attention
Language: Python - Size: 210 KB - Last synced at: 15 days ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 0

davide-coccomini/Deepfake-Detection-Challenge-DFAD2023
Implementation of the winning solution for the Media Analytics Challenge 2023.
Language: Jupyter Notebook - Size: 41.5 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 0

JanTeichertKluge/DMLSim
This library provides packages on DoubleML / Causal Machine Learning and Neural Networks in Python for Simulation and Case Studies.
Language: Python - Size: 145 KB - Last synced at: 22 days ago - Pushed at: almost 2 years ago - Stars: 6 - Forks: 0

abs711/The-way-of-the-future
A dataset of egocentric vision, eye-tracking and full body kinematics from human locomotion in out-of-the-lab environments. Also, different use cases of the dataset along with example code.
Language: Python - Size: 48.3 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 0

efthymisgeo/multimodal-masking
This repo contains source code for the MultiModal Masking (M^3) Interspeech 2021 paper.
Language: Python - Size: 1.91 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 6 - Forks: 0

chikap421/videosam
This repository accompanies the paper "VideoSAM: A Large Vision Foundation Model for High-Speed Video Segmentation"
Language: Jupyter Notebook - Size: 160 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 5 - Forks: 1

kyegomez/Odin
SOTA Classification at scale for UAVs, Drones, and much more
Language: Python - Size: 211 KB - Last synced at: 21 days ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

Cominclip/RPF-Net
Official code for "Recurrent Progressive Fusion-based Learning for Multi-source Remote Sensing Image Classification"
Language: Python - Size: 319 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

saramaxyz/platform
Run custom multi-modal AI models fully on-device
Language: Swift - Size: 14.5 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 1

Netherlands-Cancer-Institute/Multimodal_attention_DeepLearning
Multi-modal deep learning with attention mechanism
Language: Python - Size: 2.39 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 0

gorjanradevski/vsepp_tensorflow
Implementation of "VSE++: Improving Visual-Semantic Embeddings with Hard Negatives" in TensorFlow.
Language: Python - Size: 49.8 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 0

marialymperaiou/knowledge-enhanced-multimodal-learning
A list of research papers on knowledge-enhanced multimodal learning
Size: 20.5 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

bupt-mmai/S2TD
Code for "S2TD: A Tree-Structured Decoder for Image Paragraph Captioning", accepted at MMAsia 2021
Language: Python - Size: 68.7 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 0

cfcooney/BiModNeuroCNN
Package for bimodal training of deep neural networks on neurological data. Pypi: https://pypi.org/project/BiModNeuroCNN/
Language: Python - Size: 137 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 1

xiaoxiaoheimei/SeqDialN
Code for reproducing results in our paper SeqDialN: Sequential Visual Dialog Networks in Joint Visual-Linguistic Representation Space.
Language: Python - Size: 76.2 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 1

gorjanradevski/SMHA
My master's thesis: Siamese multi-hop attention for cross-modal retrieval.
Language: Python - Size: 2.76 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 5 - Forks: 0

nlp-unibo/multimodal-am-fallacy
Multimodal Fallacy Classification in Political Debates: Dataset and Experiments.
Language: Python - Size: 11.2 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 4 - Forks: 0

jahez07/Multimodal-Fusion-Strategy-to-Classify-Malware
This work focuses on proposing a novel approach towards classifying malware binaries by extracting visual features from malware executables.
Language: Jupyter Notebook - Size: 257 MB - Last synced at: 26 days ago - Pushed at: 9 months ago - Stars: 4 - Forks: 0

Eva-Kaushik/EMKGCN-MultiModal-Music-Recommender
The `MKGCN` class, coupled with the Spotify API, orchestrates a multi-modal knowledge graph convolutional network to enhance music recommendation systems by integrating user interaction data and diverse music modalities.
Language: Jupyter Notebook - Size: 11 MB - Last synced at: 29 days ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 1

AdrianBZG/SFAVEL
Code for "Unsupervised Pretraining for Fact Verification by Language Model Distillation" (ICLR 2024)
Language: Python - Size: 14.6 KB - Last synced at: 29 days ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 0

BorgwardtLab/DeepEST
Language: Python - Size: 396 KB - Last synced at: 20 days ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 0

marcomoldovan/multimodal-self-distillation
A generalized self-supervised training paradigm for unimodal and multimodal alignment and fusion.
Language: Python - Size: 526 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 2

IsaacRodgz/multimodal-transformers-movies
Experiments with multimodal deep learning models based on transformers
Language: Jupyter Notebook - Size: 10.7 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 1

Merterm/COSMic
Public repo for the paper: "COSMic: A Coherence-Aware Generation Metric for Image Descriptions" by Mert İnan, Piyush Sharma, Baber Khalid, Radu Soricut, Matthew Stone, Malihe Alikhani
Language: Python - Size: 396 KB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 4 - Forks: 0

04mayukh/Memebusters-at-SemEval-2020-Task-8-Memotion-Analysis
This repository contains the code for submission made at SemEval 2020: Task 8 Memotion analysis.
Language: Jupyter Notebook - Size: 55 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 0

IsaacRodgz/Multimodal-Transformer
Multimodal version of transformer for classification using text and image
Language: Python - Size: 2.93 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 4 - Forks: 1

iamdanialkamali/MemotionAnalysis
Meme Sentiment Analysis SemEval 2020 Task 8
Language: Jupyter Notebook - Size: 1.93 MB - Last synced at: about 2 months ago - Pushed at: over 5 years ago - Stars: 4 - Forks: 0

GerrySant/multimodalhugs
MultimodalHugs is an extension of Hugging Face that offers a generalized framework for training, evaluating, and using multimodal AI models with minimal code differences, ensuring seamless compatibility with Hugging Face pipelines.
Language: Python - Size: 4.24 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 3 - Forks: 2

stevejpapad/relevant-evidence-detection
Official repository for the "RED-DOT: Multimodal Fact-checking via Relevant Evidence Detection" paper.
Language: Python - Size: 40.7 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 3 - Forks: 2

icedpanda/COMPASS-official
Official Implementation of Unveiling User Preferences: A Knowledge Graph and LLM-Driven Approach for Conversational Recommendation
Language: Python - Size: 82.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

Davidlequnchen/Awesome-AM-process-monitoring-control
A curated collection of research papers with open-source implementations/datasets focused on in-situ process monitoring and adaptive control in laser-based additive manufacturing.
Size: 57.6 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

Davidlequnchen/MultiSensorFusion-ROS-AM-Monitoring
ROS-based Multisensor Fusion Digital Twin (MFDT) platform for real-time monitoring and defect detection of Laser-Directed Energy Deposition (L-DED) Additive Manufacturing (AM) process.
Language: HTML - Size: 3.8 GB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 1

fork123aniket/Multi-Round-VLM-powered-Multimodal-Conversational-AI-Navigation-Bot
Streamlit App Combining Vision, Language, and Audio AI Models
Language: Python - Size: 18.6 KB - Last synced at: 12 days ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

hubtru/Minape
Multimodal Isotropic Neural Architecture with Patch Embedding, applied to both time-series and image data for classification.
Language: Jupyter Notebook - Size: 47 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 3 - Forks: 1

deepmancer/deepmancer
"When in doubt, use brute force." - Ken Thompson
Size: 429 KB - Last synced at: 7 months ago - Pushed at: 8 months ago - Stars: 3 - Forks: 0

RunyuFan/FusionMixer-TGRS-2022
Code for TGRS 2022 paper "Multilevel Spatial-Channel Feature Fusion Network for Urban Village Classification by Fusing Satellite and Streetview Images"
Language: Python - Size: 51.8 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 3 - Forks: 2

ibnaleem/mikael
a Discord chatbot trained on Mistral and LLaVA language models
Language: Python - Size: 3.53 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 1

gustavocidornelas/fused-multimodal-emotion
Multimodal emotion recognition using lexico-acoustic language descriptions
Language: Python - Size: 37.3 MB - Last synced at: 12 days ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 1

samyuh/seadronessee-metadata-adaptation
Exploring Metadata in Neural Networks for UAV Maritime Surveillance
Language: Jupyter Notebook - Size: 34.5 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

Fuzzytariy/CMF-DGCN
A Chinese Sentiment Analysis Model based on Transmembrane State Attention for Modal Fusion and Multimodal Dynamic Gradient Regulation.
Language: Python - Size: 4.04 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 0

liuzwin98/DSCMT
code released
Language: Python - Size: 64.5 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 0

carlosholivan/AudioGenerationDiffusion
State-of-the-art of Audio Generation with Diffusion Models
Size: 179 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 1

danadascalescu00/MultimodalOpinionAnalysis 📦
Bachelor's thesis: Opinion Polarity Classification. Given a tweet consisting of an image and text, classify the post on a three-point scale.
Language: Python - Size: 101 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

El-Zag/Multimodal-Video-Captioning
Master's thesis on Multimodal Video Captioning, done at Huawei's Research Center in Amsterdam.
Language: Python - Size: 2.3 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

celestialxevermore/CLIP2AE
AI-multimodal: Modeling a new text-video retrieval framework
Language: Jupyter Notebook - Size: 1.68 GB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 1

comp-well-org/More2Less
More to Less (M2L): Enhanced Health Recognition in the Wild with Reduced Modality of Wearable Sensors
Language: Python - Size: 1.03 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 0

koushikvikram/multimodal-image-retrieval
📝🔍🖼️ A deep learning application for retrieving images by searching with text.
Language: Jupyter Notebook - Size: 382 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 0

mazleon/Hateful_Meme_Challenge
The Hateful Memes dataset contains real hate speech. It consists of more than 10,000 examples newly created by Facebook AI.
Language: Jupyter Notebook - Size: 1.38 MB - Last synced at: 1 day ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 0

licesonw/deepmm
Multimodal deep learning package that uses both categorical and text-based features in a single deep architecture for regression and binary classification use cases.
Language: Python - Size: 385 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 3 - Forks: 0

IsaacRodgz/GMU-Baseline
Replication of models and results obtained in "Gated multimodal networks" paper
Language: Python - Size: 53.4 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 3 - Forks: 0

gorjanradevski/cross_modal_full_transfer
PyTorch code for cross-modal retrieval on Flickr8k/30k using BERT and EfficientNet
Language: Python - Size: 72.3 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 3 - Forks: 1

Demfier/pmup
Deep learning app that cheers you up with awesome quotes when you are feeling depressed
Language: Python - Size: 17.5 MB - Last synced at: 3 days ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 0

mbaqer/V2X-mmWave-Beamforming
PyTorch implementation of multi-modality sensing in 60 GHz mmWave beamforming for connected vehicles.
Language: Jupyter Notebook - Size: 5.09 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 2 - Forks: 0

brian-cy-chang/Multimodal_VB-Fracture-Detector
An easy-to-use framework for multimodal models to detect vertebral body fractures in PyTorch
Language: Python - Size: 1.82 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 2 - Forks: 0

icon-lab/MedTrim
Official implementation of "Meta-Entity Driven Triplet Mining for Aligning Medical Vision-Language Models"
Language: Python - Size: 40 KB - Last synced at: 20 days ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 1

fork123aniket/Agentic-RAG-Story-Generation-with-Multimodal-GenAI
Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling
Language: Python - Size: 94.7 KB - Last synced at: 28 days ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

FIVEYOUNGWOO/WiFiMobNet
WiFi-Camera multimodal learning-based object detection and pose estimation.
Language: Python - Size: 560 MB - Last synced at: 2 days ago - Pushed at: 4 months ago - Stars: 2 - Forks: 1

Adm-2005/DeMorph
Deepfake Detection Solution using Multimodal Approach.
Language: Python - Size: 10.9 MB - Last synced at: 26 days ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

hubtru/Impala
Expandable Isotropic Multimodal Patch Learning Neural Architecture for Nano-modal (9) time-series and image data.
Language: Jupyter Notebook - Size: 1.08 GB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

DongmingShenDS/Multi-Modal-ML-Project
A data science project to predict online pet adoption speed using image, natural language, and tabular data with a multi-modal ML framework.
Language: Jupyter Notebook - Size: 17.4 MB - Last synced at: 11 months ago - Pushed at: 12 months ago - Stars: 2 - Forks: 0

AndreiMoraru123/ContextCollector
Mixed vision-language Attention Model that gets better by making mistakes
Language: Python - Size: 149 MB - Last synced at: 9 days ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

YuxingLu613/HTML
Code for paper Multiomics dynamic learning enables personalized diagnosis and prognosis for pan-cancer and cancer-subtypes
Language: Python - Size: 43.9 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

elsobhano/Multimodal-Emotion-Recognition
Multimodal Emotion Recognition using ClipBERT.
Language: Python - Size: 880 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

TIBHannover/MM_Claims
Official code repository for the paper: Gullal Singh Cheema, Sherzod Hakimov, Abdul Sittar, Eric Müller-Budack, Christian Otto, and Ralph Ewerth. 2022. “MM-Claims: A Dataset for Multimodal Claim Detection in Social Media.” In Findings of the Association for Computational Linguistics: NAACL 2022, pages 962–979, Seattle, United States.
Language: Python - Size: 42.3 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 2

GeorgeTouros/video-soundtrack-evaluation
Create a large, well-managed, and clean dataset for the task of music composition for video soundtracks.
Language: Jupyter Notebook - Size: 32.2 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

fabiopernisi/Visual-WSD
This repository contains the code for our solution to Task 1 of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Language: Jupyter Notebook - Size: 9.35 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

scotthlee/enriched-LSTMs
Classifying multimodal health data with LSTMs
Language: Python - Size: 36.1 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 2

lukereichold/visual-speech-separation
Flask app to demo multimodal deep learning speech separation in videos via TensorFlow Serving
Language: Python - Size: 20.8 MB - Last synced at: 2 days ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

GUT-AI/automated-data-preprocessing
Automated Data Preprocessing
Size: 48.8 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 1

sutdcv/multi-modal-video-reasoning
[ICCV2021 Workshop] Multi-Modal Video Reasoning and Analyzing Competition
Language: JavaScript - Size: 8.77 MB - Last synced at: 9 months ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 1

aidotse/multimodal-skin-lesion-classification
Multimodality for skin lesion classification
Language: Python - Size: 10.7 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 1

frankaging/generative-physics-inference
Slip or Not? Unsupervised Learning to Understand Physical Scene Using Multimodal Variational Physics Inference Network
Language: Python - Size: 72.3 KB - Last synced at: 2 months ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 1

floriankulig/neural-navi
Driver CoPilot as a student research project. Uses multimodal input streams from a car's telemetry and camera data to try to predict the driver's best maneuver.
Language: Python - Size: 56.8 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

pxxpassi/Allurelle-Skincare-Recommender-App
Recommending products to users based on image processing and external factors to build an inclusive self-care community
Language: Dart - Size: 34.8 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0

Mrkomiljon/awesome-generative-ai
Multimodal generative AI resources: talking heads, STT, TTS, image & video generation, and more.
Size: 2.31 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 1

samsad35/VQ-MAE-AudioVisual-code
A vector quantized masked autoencoder for audiovisual speech emotion recognition
Language: Python - Size: 21.7 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

taco-group/DecAlign
A novel cross-modal decoupling and alignment framework for multimodal representation learning.
Language: JavaScript - Size: 13.8 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

MohamedTharwat21/MemexQA
MemexQA is a project designed to tackle the challenge of real-life multimodal question answering by leveraging both visual and textual data from personal photo albums.
Language: Python - Size: 5.12 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

fevieira27/ImageRecognitionAI-R
R Script for AI Image and Location Recognition that can also generate an automated prompt for AI text-generation of a social media post.
Language: R - Size: 879 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

mbappeenjoyer/GIF-QA
Documentation of the approach employed to tackle the task of GIF Question Answering
Language: Jupyter Notebook - Size: 2.72 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

deepur71/InstructPix2Pix
Implementation of InstructPix2Pix from scratch
Language: Python - Size: 0 Bytes - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

chikap421/mseg_vcuq
This repository accompanies the paper "MSEG-VCUQ: Multimodal SEGmentation with Enhanced Vision Foundation Models, Convolutional Neural Networks, and Uncertainty Quantification for High-Speed Video Phase Detection Data"
Language: MATLAB - Size: 1.48 GB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

anamabo/SegmentWater
Tools to create output for PaliGemma to segment water in satellite images.
Language: Jupyter Notebook - Size: 27.3 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

Vasugi2003/Fusion-AI---MultiModal-Persuvasiveness-Prediction
Developed a system to predict persuasiveness using multi-modal data (text, images, audio). Utilized BERT for text embeddings, ResNet for image features, and Librosa for audio analysis. Fused data from all modalities for enhanced prediction accuracy.
Language: Jupyter Notebook - Size: 770 KB - Last synced at: 26 days ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0
