An open API service providing repository metadata for many open source software ecosystems.

Topic: "multi-modal-learning"

mlfoundations/open_clip

An open source implementation of CLIP.

Language: Python - Size: 15 MB - Last synced at: 1 day ago - Pushed at: 6 days ago - Stars: 11,589 - Forks: 1,092

OFA-Sys/Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

Language: Python - Size: 2.5 MB - Last synced at: 15 days ago - Pushed at: 9 months ago - Stars: 5,055 - Forks: 492

lyuchenyang/Macaw-LLM

Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

Language: Python - Size: 35.9 MB - Last synced at: 12 days ago - Pushed at: 4 months ago - Stars: 1,561 - Forks: 124

NVlabs/prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

Language: Python - Size: 4.25 MB - Last synced at: 15 days ago - Pushed at: over 1 year ago - Stars: 1,310 - Forks: 73

lucidrains/x-clip

A concise but complete implementation of CLIP with various experimental improvements from recent papers

Language: Python - Size: 1.46 MB - Last synced at: 16 days ago - Pushed at: over 1 year ago - Stars: 707 - Forks: 47

jokieleung/awesome-visual-question-answering

A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning and related areas.

Size: 179 KB - Last synced at: 11 days ago - Pushed at: almost 2 years ago - Stars: 662 - Forks: 95

OpenRobotLab/EmbodiedScan

[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Language: Python - Size: 23.3 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 551 - Forks: 40

kyegomez/zeta

Build high-performance AI models with modular building blocks

Language: Python - Size: 41.3 MB - Last synced at: 5 days ago - Pushed at: 9 days ago - Stars: 496 - Forks: 50

DmitryRyumin/CVPR-2023-24-Papers

CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included. ⭐ support visual intelligence development!

Language: Python - Size: 10.3 MB - Last synced at: 14 days ago - Pushed at: 9 months ago - Stars: 447 - Forks: 29

zjukg/KG-MM-Survey

Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

Size: 82.3 MB - Last synced at: 12 days ago - Pushed at: 4 months ago - Stars: 401 - Forks: 19

zhengli97/PromptKD

[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"

Language: Python - Size: 11.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 281 - Forks: 3

Ysz2022/NeRCo

[ICCV 2023] Implicit Neural Representation for Cooperative Low-light Image Enhancement

Language: Python - Size: 1.87 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 231 - Forks: 16

huggingface/chug

Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.

Language: Python - Size: 146 KB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 157 - Forks: 11

moabarar/nemar

[CVPR2020] Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation

Language: Python - Size: 161 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 153 - Forks: 25

qizekun/ReCon

[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining

Language: Python - Size: 1.97 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 142 - Forks: 13

GuanRunwei/Achelous

Achelous: A Fast Unified Water-surface Panoptic Perception Framework based on Fusion of Monocular Camera and 4D mmWave Radar

Language: Python - Size: 67.3 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 135 - Forks: 7

shikras/d-cube

A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).

Language: Python - Size: 835 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 115 - Forks: 7

wjun0830/CGDETR

Official PyTorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"

Language: Python - Size: 23.3 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 105 - Forks: 11

924973292/EDITOR

【CVPR2024】Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification

Language: Python - Size: 10.6 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 77 - Forks: 5

likyoo/Multimodal-Remote-Sensing-Toolkit

A python tool to perform deep learning experiments on multimodal remote sensing data.

Language: Python - Size: 1.01 MB - Last synced at: 11 months ago - Pushed at: about 3 years ago - Stars: 74 - Forks: 12

josedolz/HyperDenseNet_pytorch

PyTorch version of the HyperDenseNet deep neural network for multi-modal image segmentation

Language: Python - Size: 396 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 74 - Forks: 12

rentainhe/TRAR-VQA

[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"

Language: Python - Size: 927 KB - Last synced at: 12 days ago - Pushed at: over 3 years ago - Stars: 66 - Forks: 18

ttgeng233/UnAV

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)

Language: Python - Size: 19.9 MB - Last synced at: 16 days ago - Pushed at: about 1 year ago - Stars: 63 - Forks: 6

zhjohnchan/awesome-vision-and-language-pretraining

A curated list of vision-and-language pre-training (VLP). :-)

Size: 125 KB - Last synced at: 11 days ago - Pushed at: almost 3 years ago - Stars: 58 - Forks: 7

RAIVNLab/sugar-crepe

[NeurIPS 2023] A faithful benchmark for vision-language compositionality

Language: Python - Size: 3.33 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 54 - Forks: 7

924973292/Awesome-Multi-Modal-Object-Re-Identification

Welcome to the Awesome Multi-Modal Object Re-Identification Repository! This repository is dedicated to curating and sharing the latest methods, datasets, and resources focused specifically on the domain of multi-modal object re-identification. It brings together cutting-edge research, tools, and papers aimed at advancing its study and application.

Size: 37.1 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 53 - Forks: 3

3dlg-hcvc/DuoduoCLIP

[ICLR 2025] Duoduo CLIP: Efficient 3D Understanding with Multi-View Images

Language: Python - Size: 51.5 MB - Last synced at: 18 days ago - Pushed at: about 1 month ago - Stars: 51 - Forks: 3

WillDreamer/Aurora

[NeurIPS2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model

Language: Python - Size: 118 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 48 - Forks: 3

vishalned/MMEarth-data

This repository contains code to download data for the preprint "MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning"

Language: Python - Size: 246 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 47 - Forks: 3

rinnakk/japanese-clip

Japanese CLIP by rinna Co., Ltd.

Language: Python - Size: 574 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 47 - Forks: 2

deep-symbolic-mathematics/Multimodal-Math-Pretraining

[ICLR 2024 Spotlight] This is the official code for the paper "SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training"

Language: Python - Size: 989 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 42 - Forks: 5

Xuchen-Li/llm-arxiv-daily

Automatically update arXiv papers about LLM Reasoning, LLM Evaluation, LLM & MLLM and Video Understanding using GitHub Actions.

Language: Python - Size: 20.3 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 39 - Forks: 5

richard-peng-xia/HGCLIP

[COLING'25] HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding

Language: Python - Size: 1.69 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 33 - Forks: 1

liyichen-cly/MMEA

MMEA: Entity Alignment for Multi-Modal Knowledge Graphs

Language: Python - Size: 319 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 31 - Forks: 4

YuanGongND/uavm

Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".

Language: Python - Size: 3.28 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 29 - Forks: 0

Xuchen-Li/cv-arxiv-daily

Automatically update arXiv papers about SOT & VLT, Multi-modal Learning, LLM and Video Understanding using GitHub Actions.

Language: Python - Size: 30.4 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 28 - Forks: 3

kyegomez/MegaVIT

The open source implementation of the model from "Scaling Vision Transformers to 22 Billion Parameters"

Language: Python - Size: 211 KB - Last synced at: 4 days ago - Pushed at: 20 days ago - Stars: 28 - Forks: 1

YunzeMan/Situation3D

[CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning

Language: Python - Size: 63.3 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 28 - Forks: 2

RL4M/MRM-pytorch

An official implementation of Advancing Radiograph Representation Learning with Masked Record Modeling (ICLR'23)

Language: Python - Size: 224 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 28 - Forks: 0

filipbasara0/simple-clip

A minimal, but effective implementation of CLIP (Contrastive Language-Image Pretraining) in PyTorch

Language: Jupyter Notebook - Size: 83 KB - Last synced at: 12 days ago - Pushed at: about 1 year ago - Stars: 27 - Forks: 5

depshad/Deep-Learning-Framework-for-Multi-modal-Product-Classification

Code repository for Rakuten Data Challenge: Multimodal Product Classification and Retrieval.

Language: Jupyter Notebook - Size: 174 KB - Last synced at: 19 days ago - Pushed at: almost 4 years ago - Stars: 26 - Forks: 9

peymanbateni/multimodal-emotion-analysis-in-conversations

Multi-modal analysis of sentiment and emotion in multi-speaker conversations.

Language: Jupyter Notebook - Size: 36.2 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 24 - Forks: 6

sandipan211/ZSD-SC-Resolver

Resolving semantic confusions for improved zero-shot detection (BMVC 2022)

Language: Python - Size: 77 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 20 - Forks: 4

gaurav104/WSS-CMER

Code for the paper "Weakly supervised segmentation with cross-modality equivariant constraints", available at https://arxiv.org/pdf/2104.02488.pdf

Language: Python - Size: 656 KB - Last synced at: 9 months ago - Pushed at: over 2 years ago - Stars: 20 - Forks: 3

jackyjsy/SAM-SLR-v2

SAM-SLR-v2 is an improved version of SAM-SLR for sign language recognition.

Language: Python - Size: 191 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 20 - Forks: 4

ivclab/NeuralMerger

Yi-Min Chou, Yi-Ming Chan, Jia-Hong Lee, Chih-Yi Chiu, Chu-Song Chen, "Unifying and Merging Well-trained Deep Neural Networks for Inference Stage," International Joint Conference on Artificial Intelligence (IJCAI), 2018

Language: Python - Size: 18.5 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 20 - Forks: 3

HackerHyper/ACMVH

Adaptive Confidence Multi-View Hashing

Language: Python - Size: 19.5 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 19 - Forks: 0

chenxi52/FrozenSeg

Open-Vocabulary Panoptic Segmentation

Language: Python - Size: 1.11 MB - Last synced at: 5 months ago - Pushed at: 8 months ago - Stars: 18 - Forks: 1

kyegomez/NeVA

The open source implementation of "NeVA: NeMo Vision and Language Assistant"

Language: Python - Size: 253 KB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 18 - Forks: 1

924973292/IDEA

【CVPR2025】IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification

Language: Python - Size: 34.9 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 17 - Forks: 3

abhrac/xmodal-vit

Official implementation of "Cross-Modal Fusion Distillation for Fine-Grained Sketch-Based Image Retrieval", BMVC 2022.

Language: Python - Size: 330 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 16 - Forks: 1

graphprojects/CM-GCL

Source code of NeurIPS 2022 paper “Co-Modality Graph Contrastive Learning for Imbalanced Node Classification”

Language: Python - Size: 3.58 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 16 - Forks: 2

KnowledgeDiscovery/rca_baselines

Code for "LEMMA-RCA: A Large Multi-modal Multi-domain Dataset for Root Cause Analysis" paper

Language: Python - Size: 1.79 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 14 - Forks: 3

fmenat/MultiviewCropClassification

Public repository of our IGARSS 2023 submission

Language: Python - Size: 132 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 13 - Forks: 1

zhengli97/ATPrompt

Official PyTorch Code for "ATPrompt: Textual Prompt Learning with Embedded Attributes"

Language: Python - Size: 11.2 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 12 - Forks: 0

deep-symbolic-mathematics/Multimodal-Symbolic-Regression

[ICLR 2024 Spotlight] SNIP on Symbolic Regression: Deep Symbolic Regression with Multimodal Pretraining

Language: Python - Size: 1.29 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 12 - Forks: 3

MMintLab/VIRDO

GitHub repository for Visio-tactile Implicit Representations of Deformable Objects (ICRA 2022)

Language: Jupyter Notebook - Size: 733 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 11 - Forks: 1

tudelft-iv/UniBEV

[IVS'24] UniBEV: the official implementation of UniBEV

Language: Python - Size: 12.3 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 10 - Forks: 1

sayakpaul/Multimodal-Entailment-Baseline

This repository shows how to implement a basic model for multimodal entailment.

Language: Jupyter Notebook - Size: 3.17 MB - Last synced at: 23 days ago - Pushed at: over 3 years ago - Stars: 10 - Forks: 4

mailcorahul/auto_labeler

auto_labeler - An all-in-one library to automatically label vision data

Language: Python - Size: 32.2 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 9 - Forks: 1

LooperXX/ManagerTower

Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning

Language: Python - Size: 6.71 MB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 1

Agora-Lab-AI/EKR

Elysium Knowledge Repository is an open source initiative to embed all of Humanity's multi-modal knowledge and wisdom.

Language: Python - Size: 2.15 MB - Last synced at: 20 days ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 1

raphaelmemmesheimer/gimme_signals_action_recognition

Multi-Modal action recognition for skeleton sequences, inertial measurements, motion capturing data and Wi-Fi CSI fingerprints.

Language: Python - Size: 521 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 2

liveseongho/DramaQA

DramaQA Starter Code (2021)

Language: Python - Size: 69.9 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 3

TianyiFranklinWang/MIRROR

MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention

Language: Python - Size: 8.11 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 8 - Forks: 0

v-iashin/CORSMAL

🏆 🏆 Top-1 Submission to CORSMAL Challenge 2020 (at ICPR). The winning solution for the CORSMAL Challenge (on Intelligent Sensing Summer School 2020)

Language: Jupyter Notebook - Size: 752 MB - Last synced at: 6 months ago - Pushed at: over 3 years ago - Stars: 8 - Forks: 4

fpsluozi/tofindwaldo

Official Repo for "To Find Waldo You Need Contextual Cues: Debiasing Who’s Waldo", ACL 2022 (Short)

Size: 205 KB - Last synced at: 6 months ago - Pushed at: about 2 years ago - Stars: 6 - Forks: 0

iCVTEAM/M3TR

M3TR: Multi-modal Multi-label Recognition with Transformer. ACM MM 2021

Language: Python - Size: 1.33 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 6 - Forks: 2

yihedeng9/STIC

Enhancing Large Vision Language Models with Self-Training on Image Comprehension.

Language: Python - Size: 5.98 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 5 - Forks: 0

ArsamAryandoust/UniversalGNNs

Universal graph neural networks for multi-task transfer learning

Language: Python - Size: 8.58 MB - Last synced at: 2 days ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 2

peixinlei/M2HSE

PyTorch code for the paper "Complementarity is the king: A multi-modal and multi-grained hierarchical semantic enhancement network for cross-modal retrieval"

Language: Python - Size: 82 KB - Last synced at: 10 months ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

murali1996/nlp-notes

A curated list of papers and experiments in the field of Natural Language Processing (NLP)

Size: 547 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 3

kdhht2334/Hidden_Emotion_Detection_using_MM_Signals

[CHI2021] Hidden emotion detection using multi-modal signals

Language: Python - Size: 48 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 1

JHKim-snu/PGA

[IROS 2024] PGA: Personalizing Grasping Agents with Single Human-Robot Interaction

Language: Python - Size: 34.8 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 4 - Forks: 0

lemma-rca/lemma-rca.github.io

Code for LEMMA-RCA website

Language: HTML - Size: 3.58 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 3 - Forks: 2

fmenat/optimal-multiview-crop-classifier

Public repository of our work in the search for an optimal multi-view crop classifier (considering encoder architectures and fusion strategies)

Language: Python - Size: 9.35 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 3 - Forks: 0

MunzerDw/Gen3DQA

My paper (BMVC23) on 3D visual question answering at the lab of Prof. Dr. Niessner at Technical University of Munich.

Language: Python - Size: 16.5 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

GuanRunwei/VehicleFinder-CTIM

Language: Python - Size: 7.13 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

lyuchenyang/Semantic-aware-VideoQA

Code for ACL SRW 2023 paper "Semantic-aware Dynamic Retrospective-Prospective Reasoning for Event-level Video Question Answering"

Language: Python - Size: 31.3 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

Hleephilip/MLVU-project

Modality Translation through Conditional Encoder-Decoder (2023-1 Machine Learning for Visual Understanding Team project)

Language: Python - Size: 1.64 MB - Last synced at: 11 months ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

Bekyilma/VA_RecSys

Learning Latent Semantic Representations of Paintings for Personalized Recommendation

Language: PHP - Size: 10.7 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

fmenat/missingviews-study-EO

Public repository of our assessment work in missing views for EO applications

Language: Python - Size: 348 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 2 - Forks: 0

ZINZINBIN/Disruption-Prediciton-based-on-Multimodal-Deep-Learning

Research-repository: Disruption Prediction and Analysis through Multimodal Deep Learning in KSTAR

Language: Jupyter Notebook - Size: 196 MB - Last synced at: 11 days ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 0

lif314/NeAF

[AAAI 2025] Representing Sounds as Neural Amplitude Fields: A Benchmark of Coordinate-MLPs and A Fourier Kolmogorov-Arnold Framework

Language: Python - Size: 3.85 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

amazon-science/contrastive_emc2

Code for the ICML 2024 paper "EMC^2: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence"

Language: Python - Size: 7.77 MB - Last synced at: about 2 months ago - Pushed at: 11 months ago - Stars: 2 - Forks: 0

WangJingyao07/ST-F2M

🌈 Official Code for Spatio-Temporal Fuzzy-oriented Multi-modal Meta-learning for Fine-grained Emotion Recognition

Language: Python - Size: 12.7 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

jianzhnie/MultimodalTransformers

lmmtoolkit is a toolkit for Multi-Modal Learning

Language: Python - Size: 22.5 KB - Last synced at: 15 days ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

lyuchenyang/Efficient-VideoQA

Code for ACL SustaiNLP 2023 paper "Is a Video worth n × n Images? A Highly Efficient Approach to Transformer-based Video Question Answering"

Language: Python - Size: 28.3 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

yookyungkho/Multimodal-Entailment-pytorch

PyTorch implementation of a multimodal entailment baseline

Language: Jupyter Notebook - Size: 801 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

talipucar/DomainTranslation

PyTorch implementation of "Multi-domain translation between single-cell imaging and sequencing data using autoencoders" (https://www.nature.com/articles/s41467-020-20249-2) with custom models.

Language: Python - Size: 2.17 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0

STiFLeR7/Multi-Modal-Learning-for-Image-and-Text-Analysis

Develops approaches for jointly analyzing images and text using deep learning. Covers applications like image-text matching, visual question answering, image captioning, and sentiment analysis with visual context.

Language: Python - Size: 873 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

LIU42/Contrastive

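Based on Problem B of the 2024 "Teddy Cup" Data Mining Challenge: a cross-modal image-text mutual retrieval model built on contrastive learning in a shared feature space.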

Language: Python - Size: 20.5 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

ammarlodhi255/metadata-augmented-neural-networks-for-wild-animal-classification

This repository contains the implementation code for the paper "Metadata Augmented Neural Networks For Wild Animal Classification": https://www.sciencedirect.com/science/article/pii/S1574954124003479.

Language: Jupyter Notebook - Size: 8.83 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

DFKI-Earth-And-Space-Applications/MVCC_IGARSS

Public repository of our IGARSS 2023 submission

Language: Python - Size: 132 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 1

outta-ai/2023_OUTTA_AIBootcamp_final_project

Language: Python - Size: 1.87 MB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

joannahong/Visagesyntalk

The video demo of ECCV2022 paper titled "Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection"

Size: 44.5 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Karami-m/Deep-Probabilistic-Multi-View

The code of the paper: M. Karami, D. Schuurmans, "Deep Probabilistic Canonical Correlation Analysis" AAAI 2021

Language: Python - Size: 11.2 MB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

itsShnik/allForOne

PyTorch implementation of the paper: All For One: Multi-modal Multi-Task Learning

Language: Python - Size: 230 KB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

kjanjua26/Do_Cross_Modal_Systems_Leverage_Semantic_Relationships

This is the code for our ICCV'19 paper on cross-modal learning and retrieval.

Size: 3.21 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 1

eieye/BLN_ABC

A primer for basic literacy development | A "preparatory course" for literacy in German as a second language

Language: JavaScript - Size: 1.5 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

Related Topics
deep-learning (37), pytorch (20), computer-vision (16), multi-modal (15), machine-learning (10), vision-and-language (7), nlp (7), artificial-intelligence (7), multi-view-learning (7), contrastive-learning (6), representation-learning (6), transformers (6), clip (6), multi-modal-fusion (6), vision-language-model (6), self-supervised-learning (5), language-model (5), remote-sensing (5), multi-task-learning (5), multimodal-learning (4), object-detection (4), natural-language-processing (4), vision-language (4), image-classification (3), segmentation (3), gpt4 (3), tensorflow (3), pretrained-models (3), cross-modal-retrieval (3), crop-classification (3), cvpr2024 (3), large-language-models (3), deep-neural-networks (3), zero-shot-classification (3), text-to-image (3), visual-question-answering (3), bert (3), video-question-answering (3), dataset (2), multi-modal-imaging (2), medical-image-processing (2), symbolic-regression (2), ai4science (2), datasets (2), ai4math (2), transfer-learning (2), instance-segmentation (2), speech-recognition (2), graph-neural-networks (2), paper-list (2), sensor-fusion (2), vision-transformer (2), causal-discovery (2), root-cause-analysis (2), zero-shot-learning (2), multiview-learning (2), multisensor-fusion (2), attngan (2), datafusion (2), data-fusion (2), croptypes (2), crop-type-mapping (2), lafite (2), agriculture-research (2), transformer (2), vehicle-reidentification (2), openai-clip (2), pytorch-lightning (2), vision-language-learning (2), stackgan (2), reid (2), iccv (2), vision-and-language-pre-training (2), autonomous-driving (2), action-recognition (2), entity-alignment (2), vqa (2), multimodal (2), knowledge-graph (2), sentiment-classification (2), prompt-learning (2), arxiv-daily (2), robotics (2), earth-observation (2), knowledge-distillation (2), contrastive-loss (2), image-text-retrieval (2), robustness (2), cnn (2), msvr310 (1), frequency-analysis (1), person-reidentification (1), person-reid (1), audio-visual-learning (1), mcmc-sampling (1), llava (1), audio-visual-events (1), latent-space-interpolation (1), llm-finetuning (1), equation-discovery (1)