Topic: "multi-modal-learning"
mlfoundations/open_clip
An open source implementation of CLIP.
Language: Python - Size: 15 MB - Last synced at: 1 day ago - Pushed at: 6 days ago - Stars: 11,589 - Forks: 1,092

OFA-Sys/Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Language: Python - Size: 2.5 MB - Last synced at: 15 days ago - Pushed at: 9 months ago - Stars: 5,055 - Forks: 492

lyuchenyang/Macaw-LLM
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
Language: Python - Size: 35.9 MB - Last synced at: 12 days ago - Pushed at: 4 months ago - Stars: 1,561 - Forks: 124

NVlabs/prismer
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Language: Python - Size: 4.25 MB - Last synced at: 15 days ago - Pushed at: over 1 year ago - Stars: 1,310 - Forks: 73

lucidrains/x-clip
A concise but complete implementation of CLIP with various experimental improvements from recent papers
Language: Python - Size: 1.46 MB - Last synced at: 16 days ago - Pushed at: over 1 year ago - Stars: 707 - Forks: 47

jokieleung/awesome-visual-question-answering
A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
Size: 179 KB - Last synced at: 11 days ago - Pushed at: almost 2 years ago - Stars: 662 - Forks: 95

OpenRobotLab/EmbodiedScan
[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
Language: Python - Size: 23.3 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 551 - Forks: 40

kyegomez/zeta
Build high-performance AI models with modular building blocks
Language: Python - Size: 41.3 MB - Last synced at: 5 days ago - Pushed at: 9 days ago - Stars: 496 - Forks: 50

DmitryRyumin/CVPR-2023-24-Papers
CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included. ⭐ support visual intelligence development!
Language: Python - Size: 10.3 MB - Last synced at: 14 days ago - Pushed at: 9 months ago - Stars: 447 - Forks: 29

zjukg/KG-MM-Survey
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
Size: 82.3 MB - Last synced at: 12 days ago - Pushed at: 4 months ago - Stars: 401 - Forks: 19

zhengli97/PromptKD
[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"
Language: Python - Size: 11.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 281 - Forks: 3

Ysz2022/NeRCo
[ICCV 2023] Implicit Neural Representation for Cooperative Low-light Image Enhancement
Language: Python - Size: 1.87 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 231 - Forks: 16

huggingface/chug
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
Language: Python - Size: 146 KB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 157 - Forks: 11

moabarar/nemar
[CVPR2020] Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation
Language: Python - Size: 161 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 153 - Forks: 25

qizekun/ReCon
[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
Language: Python - Size: 1.97 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 142 - Forks: 13

GuanRunwei/Achelous
Achelous: A Fast Unified Water-surface Panoptic Perception Framework based on Fusion of Monocular Camera and 4D mmWave Radar
Language: Python - Size: 67.3 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 135 - Forks: 7

shikras/d-cube
A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).
Language: Python - Size: 835 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 115 - Forks: 7

wjun0830/CGDETR
Official pytorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"
Language: Python - Size: 23.3 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 105 - Forks: 11

924973292/EDITOR
【CVPR2024】Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
Language: Python - Size: 10.6 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 77 - Forks: 5

likyoo/Multimodal-Remote-Sensing-Toolkit
A python tool to perform deep learning experiments on multimodal remote sensing data.
Language: Python - Size: 1.01 MB - Last synced at: 11 months ago - Pushed at: about 3 years ago - Stars: 74 - Forks: 12

josedolz/HyperDenseNet_pytorch
Pytorch version of the HyperDenseNet deep neural network for multi-modal image segmentation
Language: Python - Size: 396 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 74 - Forks: 12

rentainhe/TRAR-VQA
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
Language: Python - Size: 927 KB - Last synced at: 12 days ago - Pushed at: over 3 years ago - Stars: 66 - Forks: 18

ttgeng233/UnAV
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
Language: Python - Size: 19.9 MB - Last synced at: 16 days ago - Pushed at: about 1 year ago - Stars: 63 - Forks: 6

zhjohnchan/awesome-vision-and-language-pretraining
A curated list of vision-and-language pre-training (VLP). :-)
Size: 125 KB - Last synced at: 11 days ago - Pushed at: almost 3 years ago - Stars: 58 - Forks: 7

RAIVNLab/sugar-crepe
[NeurIPS 2023] A faithful benchmark for vision-language compositionality
Language: Python - Size: 3.33 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 54 - Forks: 7

924973292/Awesome-Multi-Modal-Object-Re-Identification
Welcome to the Awesome Multi-Modal Object Re-Identification Repository! This repository is dedicated to curating and sharing the latest methods, datasets, and resources focused specifically on the domain of multi-modal object re-identification. It brings together cutting-edge research, tools, and papers aimed at advancing the study and application.
Size: 37.1 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 53 - Forks: 3

3dlg-hcvc/DuoduoCLIP
[ICLR 2025] Duoduo CLIP: Efficient 3D Understanding with Multi-View Images
Language: Python - Size: 51.5 MB - Last synced at: 18 days ago - Pushed at: about 1 month ago - Stars: 51 - Forks: 3

WillDreamer/Aurora
[NeurIPS2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model
Language: Python - Size: 118 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 48 - Forks: 3

vishalned/MMEarth-data
This repository contains code to download data for the preprint "MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning"
Language: Python - Size: 246 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 47 - Forks: 3

rinnakk/japanese-clip
Japanese CLIP by rinna Co., Ltd.
Language: Python - Size: 574 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 47 - Forks: 2

deep-symbolic-mathematics/Multimodal-Math-Pretraining
[ICLR 2024 Spotlight] This is the official code for the paper "SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training"
Language: Python - Size: 989 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 42 - Forks: 5

Xuchen-Li/llm-arxiv-daily
Automatically update arXiv papers about LLM Reasoning, LLM Evaluation, LLM & MLLM and Video Understanding using GitHub Actions.
Language: Python - Size: 20.3 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 39 - Forks: 5

richard-peng-xia/HGCLIP
[COLING'25] HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding
Language: Python - Size: 1.69 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 33 - Forks: 1

liyichen-cly/MMEA
MMEA: Entity Alignment for Multi-Modal Knowledge Graphs
Language: Python - Size: 319 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 31 - Forks: 4

YuanGongND/uavm
Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".
Language: Python - Size: 3.28 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 29 - Forks: 0

Xuchen-Li/cv-arxiv-daily
Automatically update arXiv papers about SOT & VLT, Multi-modal Learning, LLM and Video Understanding using GitHub Actions.
Language: Python - Size: 30.4 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 28 - Forks: 3

kyegomez/MegaVIT
The open source implementation of the model from "Scaling Vision Transformers to 22 Billion Parameters"
Language: Python - Size: 211 KB - Last synced at: 4 days ago - Pushed at: 20 days ago - Stars: 28 - Forks: 1

YunzeMan/Situation3D
[CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning
Language: Python - Size: 63.3 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 28 - Forks: 2

RL4M/MRM-pytorch
An official implementation of Advancing Radiograph Representation Learning with Masked Record Modeling (ICLR'23)
Language: Python - Size: 224 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 28 - Forks: 0

filipbasara0/simple-clip
A minimal, but effective implementation of CLIP (Contrastive Language-Image Pretraining) in PyTorch
Language: Jupyter Notebook - Size: 83 KB - Last synced at: 12 days ago - Pushed at: about 1 year ago - Stars: 27 - Forks: 5

depshad/Deep-Learning-Framework-for-Multi-modal-Product-Classification
Code repository for Rakuten Data Challenge: Multimodal Product Classification and Retrieval.
Language: Jupyter Notebook - Size: 174 KB - Last synced at: 19 days ago - Pushed at: almost 4 years ago - Stars: 26 - Forks: 9

peymanbateni/multimodal-emotion-analysis-in-conversations
Multi-modal analysis of sentiment and emotion in multi-speaker conversations.
Language: Jupyter Notebook - Size: 36.2 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 24 - Forks: 6

sandipan211/ZSD-SC-Resolver
Resolving semantic confusions for improved zero-shot detection (BMVC 2022)
Language: Python - Size: 77 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 20 - Forks: 4

gaurav104/WSS-CMER
Code for the paper "Weakly supervised segmentation with cross-modality equivariant constraints", available at https://arxiv.org/pdf/2104.02488.pdf
Language: Python - Size: 656 KB - Last synced at: 9 months ago - Pushed at: over 2 years ago - Stars: 20 - Forks: 3

jackyjsy/SAM-SLR-v2
SAM-SLR-v2 is an improved version of SAM-SLR for sign language recognition.
Language: Python - Size: 191 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 20 - Forks: 4

ivclab/NeuralMerger
Yi-Min Chou, Yi-Ming Chan, Jia-Hong Lee, Chih-Yi Chiu, Chu-Song Chen, "Unifying and Merging Well-trained Deep Neural Networks for Inference Stage," International Joint Conference on Artificial Intelligence (IJCAI), 2018
Language: Python - Size: 18.5 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 20 - Forks: 3

HackerHyper/ACMVH
Adaptive Confidence Multi-View Hashing
Language: Python - Size: 19.5 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 19 - Forks: 0

chenxi52/FrozenSeg
Open-Vocabulary Panoptic Segmentation
Language: Python - Size: 1.11 MB - Last synced at: 5 months ago - Pushed at: 8 months ago - Stars: 18 - Forks: 1

kyegomez/NeVA
The open source implementation of "NeVA: NeMo Vision and Language Assistant"
Language: Python - Size: 253 KB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 18 - Forks: 1

924973292/IDEA
【CVPR2025】IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification
Language: Python - Size: 34.9 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 17 - Forks: 3

abhrac/xmodal-vit
Official implementation of "Cross-Modal Fusion Distillation for Fine-Grained Sketch-Based Image Retrieval", BMVC 2022.
Language: Python - Size: 330 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 16 - Forks: 1

graphprojects/CM-GCL
Source code of NeurIPS 2022 paper “Co-Modality Graph Contrastive Learning for Imbalanced Node Classification”
Language: Python - Size: 3.58 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 16 - Forks: 2

KnowledgeDiscovery/rca_baselines
Code for "LEMMA-RCA: A Large Multi-modal Multi-domain Dataset for Root Cause Analysis" paper
Language: Python - Size: 1.79 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 14 - Forks: 3

fmenat/MultiviewCropClassification
Public repository of our IGARSS 2023 submission
Language: Python - Size: 132 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 13 - Forks: 1

zhengli97/ATPrompt
Official PyTorch Code for "ATPrompt: Textual Prompt Learning with Embedded Attributes"
Language: Python - Size: 11.2 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 12 - Forks: 0

deep-symbolic-mathematics/Multimodal-Symbolic-Regression
[ICLR 2024 Spotlight] SNIP on Symbolic Regression: Deep Symbolic Regression with Multimodal Pretraining
Language: Python - Size: 1.29 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 12 - Forks: 3

MMintLab/VIRDO
GitHub repository for Visio-tactile Implicit Representations of Deformable Objects (ICRA 2022)
Language: Jupyter Notebook - Size: 733 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 11 - Forks: 1

tudelft-iv/UniBEV
[IVS'24] UniBEV: the official implementation of UniBEV
Language: Python - Size: 12.3 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 10 - Forks: 1

sayakpaul/Multimodal-Entailment-Baseline
This repository shows how to implement a basic model for multimodal entailment.
Language: Jupyter Notebook - Size: 3.17 MB - Last synced at: 23 days ago - Pushed at: over 3 years ago - Stars: 10 - Forks: 4

mailcorahul/auto_labeler
auto_labeler - An all-in-one library to automatically label vision data
Language: Python - Size: 32.2 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 9 - Forks: 1

LooperXX/ManagerTower
Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
Language: Python - Size: 6.71 MB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 1

Agora-Lab-AI/EKR
Elysium Knowledge Repository is an open source initiative to embed all of Humanity's multi-modal knowledge and wisdom.
Language: Python - Size: 2.15 MB - Last synced at: 20 days ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 1

raphaelmemmesheimer/gimme_signals_action_recognition
Multi-Modal action recognition for skeleton sequences, inertial measurements, motion capturing data and Wi-Fi CSI fingerprints.
Language: Python - Size: 521 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 2

liveseongho/DramaQA
DramaQA Starter Code (2021)
Language: Python - Size: 69.9 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 3

TianyiFranklinWang/MIRROR
MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention
Language: Python - Size: 8.11 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 8 - Forks: 0

v-iashin/CORSMAL
🏆 🏆 Top-1 Submission to CORSMAL Challenge 2020 (at ICPR). The winning solution for the CORSMAL Challenge (on Intelligent Sensing Summer School 2020)
Language: Jupyter Notebook - Size: 752 MB - Last synced at: 6 months ago - Pushed at: over 3 years ago - Stars: 8 - Forks: 4

fpsluozi/tofindwaldo
Official Repo for "To Find Waldo You Need Contextual Cues: Debiasing Who’s Waldo", ACL 2022 (Short)
Size: 205 KB - Last synced at: 6 months ago - Pushed at: about 2 years ago - Stars: 6 - Forks: 0

iCVTEAM/M3TR
M3TR: Multi-modal Multi-label Recognition with Transformer. ACM MM 2021
Language: Python - Size: 1.33 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 6 - Forks: 2

yihedeng9/STIC
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
Language: Python - Size: 5.98 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 5 - Forks: 0

ArsamAryandoust/UniversalGNNs
Universal graph neural networks for multi-task transfer learning
Language: Python - Size: 8.58 MB - Last synced at: 2 days ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 2

peixinlei/M2HSE
PyTorch code for the paper "Complementarity is the king: A multi-modal and multi-grained hierarchical semantic enhancement network for cross-modal retrieval"
Language: Python - Size: 82 KB - Last synced at: 10 months ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

murali1996/nlp-notes
A curated list of papers and experiments in the field of Natural Language Processing (NLP)
Size: 547 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 3

kdhht2334/Hidden_Emotion_Detection_using_MM_Signals
[CHI2021] Hidden emotion detection using multi-modal signals
Language: Python - Size: 48 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 1

JHKim-snu/PGA
[IROS 2024] PGA: Personalizing Grasping Agents with Single Human-Robot Interaction
Language: Python - Size: 34.8 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 4 - Forks: 0

lemma-rca/lemma-rca.github.io
Code for LEMMA-RCA website
Language: HTML - Size: 3.58 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 3 - Forks: 2

fmenat/optimal-multiview-crop-classifier
Public repository of our work in the search for an optimal multi-view crop classifier (considering encoder architectures and fusion strategies)
Language: Python - Size: 9.35 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 3 - Forks: 0

MunzerDw/Gen3DQA
My paper (BMVC23) on 3D visual question answering at the lab of Prof. Dr. Niessner at Technical University of Munich.
Language: Python - Size: 16.5 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

GuanRunwei/VehicleFinder-CTIM
Language: Python - Size: 7.13 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

lyuchenyang/Semantic-aware-VideoQA
Code for ACL SRW 2023 paper "Semantic-aware Dynamic Retrospective-Prospective Reasoning for Event-level Video Question Answering"
Language: Python - Size: 31.3 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

Hleephilip/MLVU-project
Modality Translation through Conditional Encoder-Decoder (2023-1 Machine Learning for Visual Understanding Team project)
Language: Python - Size: 1.64 MB - Last synced at: 11 months ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

Bekyilma/VA_RecSys
Learning Latent Semantic Representations of Paintings for Personalized Recommendation
Language: PHP - Size: 10.7 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

fmenat/missingviews-study-EO
Public repository of our assessment work in missing views for EO applications
Language: Python - Size: 348 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 2 - Forks: 0

ZINZINBIN/Disruption-Prediciton-based-on-Multimodal-Deep-Learning
Research-repository: Disruption Prediction and Analysis through Multimodal Deep Learning in KSTAR
Language: Jupyter Notebook - Size: 196 MB - Last synced at: 11 days ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 0

lif314/NeAF
[AAAI 2025] Representing Sounds as Neural Amplitude Fields: A Benchmark of Coordinate-MLPs and A Fourier Kolmogorov-Arnold Framework
Language: Python - Size: 3.85 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

amazon-science/contrastive_emc2
Code for the ICML 2024 paper: "EMC^2: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence"
Language: Python - Size: 7.77 MB - Last synced at: about 2 months ago - Pushed at: 11 months ago - Stars: 2 - Forks: 0

WangJingyao07/ST-F2M
🌈 Official Code for "Spatio-Temporal Fuzzy-oriented Multi-modal Meta-learning for Fine-grained Emotion Recognition"
Language: Python - Size: 12.7 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

jianzhnie/MultimodalTransformers
lmmtoolkit is a toolkit for Multi-Modal Learning
Language: Python - Size: 22.5 KB - Last synced at: 15 days ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

lyuchenyang/Efficient-VideoQA
Code for ACL SustaiNLP 2023 paper "Is a Video worth n × n Images? A Highly Efficient Approach to Transformer-based Video Question Answering"
Language: Python - Size: 28.3 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

yookyungkho/Multimodal-Entailment-pytorch
Pytorch Implementation of Multimodal Entailment baseline
Language: Jupyter Notebook - Size: 801 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

talipucar/DomainTranslation
Pytorch implementation of "Multi-domain translation between single-cell imaging and sequencing data using autoencoders" (https://www.nature.com/articles/s41467-020-20249-2) with custom models.
Language: Python - Size: 2.17 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0

STiFLeR7/Multi-Modal-Learning-for-Image-and-Text-Analysis
Develops approaches for jointly analyzing images and text using deep learning. Covers applications like image-text matching, visual question answering, image captioning, and sentiment analysis with visual context.
Language: Python - Size: 873 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

LIU42/Contrastive
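Project based on Problem B of the 2024 "Teddy Cup" (泰迪杯) Data Mining Challenge: a cross-modal image-text mutual retrieval model based on contrastive learning in a shared feature space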
Language: Python - Size: 20.5 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

ammarlodhi255/metadata-augmented-neural-networks-for-wild-animal-classification
This repository contains the implementation code for the paper "Metadata Augmented Neural Networks For Wild Animal Classification": https://www.sciencedirect.com/science/article/pii/S1574954124003479.
Language: Jupyter Notebook - Size: 8.83 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

DFKI-Earth-And-Space-Applications/MVCC_IGARSS
Public repository of our IGARSS 2023 submission
Language: Python - Size: 132 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 1

outta-ai/2023_OUTTA_AIBootcamp_final_project
Language: Python - Size: 1.87 MB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

joannahong/Visagesyntalk
The video demo of ECCV2022 paper titled "Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection"
Size: 44.5 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Karami-m/Deep-Probabilistic-Multi-View
The code of the paper: M. Karami, D. Schuurmans, "Deep Probabilistic Canonical Correlation Analysis" AAAI 2021
Language: Python - Size: 11.2 MB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

itsShnik/allForOne
PyTorch implementation of the paper: All For One: Multi-modal Multi-Task Learning
Language: Python - Size: 230 KB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

kjanjua26/Do_Cross_Modal_Systems_Leverage_Semantic_Relationships
This is the code for our ICCV'19 paper on cross-modal learning and retrieval.
Size: 3.21 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 1

eieye/BLN_ABC
A primer for basic literacy development | A "Vorkurs" (preparatory course) for literacy in German as a second language
Language: JavaScript - Size: 1.5 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0
