An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: multi-modal-learning

XxabueloxX/Vision-Matters

Vision Matters explores how simple visual changes can enhance multimodal math reasoning. Join the discussion and contribute to the project! 👩💻👨💻

Language: Python - Size: 15.9 MB - Last synced at: 25 minutes ago - Pushed at: about 2 hours ago - Stars: 0 - Forks: 0

aneeuk/four-sided-triangle

A sophisticated multi-model optimization pipeline for domain-expert knowledge extraction RAG systems

Language: Python - Size: 6.57 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

fullscreen-triangle/four-sided-triangle

A sophisticated multi-model optimization pipeline for domain-expert knowledge extraction RAG systems

Language: Python - Size: 6.73 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 8 - Forks: 0

MosesTheRedSea/SOLID

Spatial Object Learning with Integrated Dimensions

Size: 15.6 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

chenxi52/CMPF

Open-Vocabulary Panoptic Segmentation

Language: Python - Size: 1.12 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 24 - Forks: 1

kyegomez/zeta

Build high-performance AI models with modular building blocks

Language: Python - Size: 41.3 MB - Last synced at: 9 days ago - Pushed at: 12 days ago - Stars: 525 - Forks: 52

924973292/Awesome-Multi-Modal-Object-Re-Identification

Welcome to the Awesome Multi-Modal Object Re-Identification Repository! This repository is dedicated to curating and sharing the latest methods, datasets, and resources focused specifically on the domain of multi-modal object re-identification. It brings together cutting-edge research, tools, and papers aimed at advancing the study and application.

Size: 38.1 KB - Last synced at: 4 days ago - Pushed at: 26 days ago - Stars: 61 - Forks: 5

zhengli97/PromptKD

[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"

Language: Python - Size: 11.2 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 315 - Forks: 6

UNITES-Lab/Flex-MoE

[NeurIPS 2024 Spotlight] Code for the paper "Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts"

Language: Python - Size: 3.14 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 52 - Forks: 2

Awais-Asghar/Skin-Cancer-Binary-Classifier

A machine learning project for binary classification of skin cancer as malignant or benign, utilizing models like XGBoost, LGBM Classifier, Adaboost, SVM, and Logistic Regression. Features comprehensive data preprocessing, model training, and evaluation for accurate diagnosis.

Language: Jupyter Notebook - Size: 5.65 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

EfiLygda/Multi-Modal-Deep-Learning-Model-for-Alzheimer-s-Disease-Classification

Multi Modal Deep Learning Model used for Alzheimer's Disease Classification.

Language: Python - Size: 839 KB - Last synced at: 19 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

3dlg-hcvc/DuoduoCLIP

[ICLR 2025] Duoduo CLIP: Efficient 3D Understanding with Multi-View Images

Language: Python - Size: 64.2 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 57 - Forks: 4

mlfoundations/open_clip

An open source implementation of CLIP.

Language: Python - Size: 15 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 11,830 - Forks: 1,110

lyuchenyang/Macaw-LLM

Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

Language: Python - Size: 35.9 MB - Last synced at: 24 days ago - Pushed at: 6 months ago - Stars: 1,568 - Forks: 124

lemma-rca/lemma-rca.github.io

Code for LEMMA-RCA website

Language: HTML - Size: 3.66 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 3 - Forks: 2

shikras/d-cube

A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).

Language: Python - Size: 835 KB - Last synced at: 25 days ago - Pushed at: over 1 year ago - Stars: 123 - Forks: 7

OFA-Sys/Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

Language: Python - Size: 2.5 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 5,203 - Forks: 497

DmitryRyumin/CVPR-2023-24-Papers

CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included. ⭐ support visual intelligence development!

Language: Python - Size: 10.3 MB - Last synced at: 20 days ago - Pushed at: 11 months ago - Stars: 451 - Forks: 30

Agora-Lab-AI/EKR

Elysium Knowledge Repository is an open source initiative to embed all of Humanity's multi-modal knowledge and wisdom.

Language: Python - Size: 2.15 MB - Last synced at: 21 days ago - Pushed at: almost 2 years ago - Stars: 8 - Forks: 1

NVlabs/prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

Language: Python - Size: 4.25 MB - Last synced at: 10 days ago - Pushed at: over 1 year ago - Stars: 1,308 - Forks: 73

lucidrains/x-clip

A concise but complete implementation of CLIP with various experimental improvements from recent papers

Language: Python - Size: 1.46 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 710 - Forks: 47

KnowledgeDiscovery/rca_baselines

Code for "LEMMA-RCA: A Large Multi-modal Multi-domain Dataset for Root Cause Analysis" paper

Language: Python - Size: 1.8 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 20 - Forks: 5

filipbasara0/simple-clip

A minimal, but effective implementation of CLIP (Contrastive Language-Image Pretraining) in PyTorch

Language: Jupyter Notebook - Size: 83 KB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 31 - Forks: 5

vishalned/MMEarth-data

This repository contains code to download data for the preprint "MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning"

Language: Python - Size: 114 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 65 - Forks: 4

TianyiFranklinWang/MIRROR

MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention

Language: Python - Size: 8.12 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 8 - Forks: 0

fmenat/missingviews-study-EO

Public repository of our IGARSS 2024 work

Language: Python - Size: 455 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

fmenat/MultiviewCropClassification

Public repository of our IGARSS 2023 submission

Language: Python - Size: 132 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 16 - Forks: 1

fmenat/optimal-multiview-crop-classifier

Public repository of our work in the search for an optimal multi-view crop classifier (considering encoder architectures and fusion strategies)

Language: Python - Size: 9.35 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 6 - Forks: 0

depshad/Deep-Learning-Framework-for-Multi-modal-Product-Classification

Code repository for Rakuten Data Challenge: Multimodal Product Classification and Retrieval.

Language: Jupyter Notebook - Size: 174 KB - Last synced at: 26 days ago - Pushed at: about 4 years ago - Stars: 26 - Forks: 9

ttgeng233/UnAV

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)

Language: Python - Size: 19.9 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 63 - Forks: 6

924973292/IDEA

[CVPR 2025] IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification

Language: Python - Size: 34.9 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 17 - Forks: 3

zjukg/KG-MM-Survey

Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

Size: 82.3 MB - Last synced at: 2 months ago - Pushed at: 6 months ago - Stars: 401 - Forks: 19

jokieleung/awesome-visual-question-answering

A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.

Size: 179 KB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 662 - Forks: 95

huggingface/chug

Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.

Language: Python - Size: 146 KB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 157 - Forks: 11

eieye/BLN_ABC

A primer for basic literacy development | A "preparatory course" for literacy in German as a second language

Language: JavaScript - Size: 1.5 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

qizekun/ReCon

[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining

Language: Python - Size: 1.97 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 142 - Forks: 13

kyegomez/MegaVIT

The open source implementation of the model from "Scaling Vision Transformers to 22 Billion Parameters"

Language: Python - Size: 211 KB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 28 - Forks: 1

ZINZINBIN/Disruption-Prediciton-based-on-Multimodal-Deep-Learning

Research-repository: Disruption Prediction and Analysis through Multimodal Deep Learning in KSTAR

Language: Jupyter Notebook - Size: 196 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

Ysz2022/NeRCo

[ICCV 2023] Implicit Neural Representation for Cooperative Low-light Image Enhancement

Language: Python - Size: 1.87 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 231 - Forks: 16

zhjohnchan/awesome-vision-and-language-pretraining

A curated list of vision-and-language pre-training (VLP). :-)

Size: 125 KB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 58 - Forks: 7

ForYourEyesOnlyyy/Practical-Machine-Learning-Deep-Learning

A collection of Jupyter notebooks covering hands-on experiments in deep learning, NLP, computer vision, and time-series forecasting. Includes model training, fine-tuning, and tracking with tools like TensorBoard, ClearML, and HuggingFace

Language: Jupyter Notebook - Size: 10.4 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

OpenRobotLab/EmbodiedScan

[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Language: Python - Size: 23.3 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 551 - Forks: 40

rinnakk/japanese-clip

Japanese CLIP by rinna Co., Ltd.

Language: Python - Size: 574 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 72 - Forks: 9

STiFLeR7/Multi-Modal-Learning-for-Image-and-Text-Analysis

Develops approaches for jointly analyzing images and text using deep learning. Covers applications like image-text matching, visual question answering, image captioning, and sentiment analysis with visual context.

Language: Python - Size: 873 KB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

zhengli97/ATPrompt

Official PyTorch Code for "ATPrompt: Textual Prompt Learning with Embedded Attributes"

Language: Python - Size: 11.2 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 12 - Forks: 0

NoTody/HIST-AID

A multi-modal time-series dataset created from MIMIC.

Language: Python - Size: 125 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

ammarlodhi255/metadata-augmented-neural-networks-for-wild-animal-classification

This repository contains the implementation code for the paper "Metadata Augmented Neural Networks For Wild Animal Classification": https://www.sciencedirect.com/science/article/pii/S1574954124003479.

Language: Jupyter Notebook - Size: 8.83 MB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

lif314/NeAF

[AAAI 2025] Representing Sounds as Neural Amplitude Fields: A Benchmark of Coordinate-MLPs and A Fourier Kolmogorov-Arnold Framework

Language: Python - Size: 3.85 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 2 - Forks: 0

YunzeMan/Situation3D

[CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning

Language: Python - Size: 63.3 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 28 - Forks: 2

richard-peng-xia/HGCLIP

[COLING'25] HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding

Language: Python - Size: 1.69 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 33 - Forks: 1

rentainhe/TRAR-VQA

[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"

Language: Python - Size: 927 KB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 66 - Forks: 18

amazon-science/contrastive_emc2

Code for the ICML 2024 paper: "EMC^2: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence"

Language: Python - Size: 7.77 MB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

lyuchenyang/Efficient-VideoQA

Code for ACL SustaiNLP 2023 paper "Is a Video worth n × n Images? A Highly Efficient Approach to Transformer-based Video Question Answering"

Language: Python - Size: 28.3 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

lyuchenyang/Semantic-aware-VideoQA

Code for ACL SRW 2023 paper "Semantic-aware Dynamic Retrospective-Prospective Reasoning for Event-level Video Question Answering"

Language: Python - Size: 31.3 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

ArsamAryandoust/UniversalGNNs

Universal graph neural networks for multi-task transfer learning

Language: Python - Size: 8.58 MB - Last synced at: 6 days ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 2

mailcorahul/auto_labeler

auto_labeler - An all-in-one library to automatically label vision data

Language: Python - Size: 32.2 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 9 - Forks: 1

924973292/EDITOR

[CVPR 2024] Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification

Language: Python - Size: 10.6 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 77 - Forks: 5

deep-symbolic-mathematics/Multimodal-Symbolic-Regression

[ICLR 2024 Spotlight] SNIP on Symbolic Regression: Deep Symbolic Regression with Multimodal Pretraining

Language: Python - Size: 1.29 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 12 - Forks: 3

deep-symbolic-mathematics/Multimodal-Math-Pretraining

[ICLR 2024 Spotlight] This is the official code for the paper "SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training"

Language: Python - Size: 989 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 42 - Forks: 5

jianzhnie/MultimodalTransformers

lmmtoolkit is a toolkit for Multi-Modal Learning

Language: Python - Size: 22.5 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

loharmurtaza/FoG_detection_subject_dependent

This repository is based on my research work "Detecting Freezing of Gait in Parkinson's Disease Patients Using Multi-Modal Machine Learning"

Size: 626 KB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

LIU42/Contrastive

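Project based on Problem B of the 2024 "Teddy Cup" Data Mining Challenge: a cross-modal image-text mutual retrieval model built on contrastive learning in a shared feature space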

Language: Python - Size: 20.5 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

wjun0830/CGDETR

Official pytorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"

Language: Python - Size: 23.3 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 105 - Forks: 11

JHKim-snu/PGA

[IROS 2024] PGA: Personalizing Grasping Agents with Single Human-Robot Interaction

Language: Python - Size: 34.8 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 4 - Forks: 0

gaurav104/WSS-CMER

Code for the paper "Weakly supervised segmentation with cross-modality equivariant constraints", available at https://arxiv.org/pdf/2104.02488.pdf

Language: Python - Size: 656 KB - Last synced at: 11 months ago - Pushed at: over 2 years ago - Stars: 20 - Forks: 3

DFKI-Earth-And-Space-Applications/MVCC_IGARSS

Public repository of our IGARSS 2023 submission

Language: Python - Size: 132 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 1 - Forks: 1

GuanRunwei/Achelous

Achelous: A Fast Unified Water-surface Panoptic Perception Framework based on Fusion of Monocular Camera and 4D mmWave Radar

Language: Python - Size: 67.3 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 135 - Forks: 7

kyegomez/NeVA

The open source implementation of "NeVA: NeMo Vision and Language Assistant"

Language: Python - Size: 253 KB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 18 - Forks: 1

peixinlei/M2HSE

PyTorch code for the paper "Complementarity is the king: A multi-modal and multi-grained hierarchical semantic enhancement network for cross-modal retrieval"

Language: Python - Size: 82 KB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

tudelft-iv/UniBEV

[IVS'24] UniBEV: the official implementation of UniBEV

Language: Python - Size: 12.3 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 10 - Forks: 1

sayakpaul/Multimodal-Entailment-Baseline

This repository shows how to implement a basic model for multimodal entailment.

Language: Jupyter Notebook - Size: 3.17 MB - Last synced at: 7 days ago - Pushed at: almost 4 years ago - Stars: 10 - Forks: 4

yihedeng9/STIC

Enhancing Large Vision Language Models with Self-Training on Image Comprehension.

Language: Python - Size: 5.98 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

outta-ai/2023_OUTTA_AIBootcamp_final_project

Language: Python - Size: 1.87 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

likyoo/Multimodal-Remote-Sensing-Toolkit

A python tool to perform deep learning experiments on multimodal remote sensing data.

Language: Python - Size: 1.01 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 74 - Forks: 12

RAIVNLab/sugar-crepe

[NeurIPS 2023] A faithful benchmark for vision-language compositionality

Language: Python - Size: 3.33 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 54 - Forks: 7

graphprojects/CM-GCL

Source code of NeurIPS 2022 paper “Co-Modality Graph Contrastive Learning for Imbalanced Node Classification”

Language: Python - Size: 3.58 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 16 - Forks: 2

sandipan211/ZSD-SC-Resolver

Resolving semantic confusions for improved zero-shot detection (BMVC 2022)

Language: Python - Size: 77 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 20 - Forks: 4

WangJingyao07/ST-F2M

🌈 Official Code for "Spatio-Temporal Fuzzy-oriented Multi-modal Meta-learning for Fine-grained Emotion Recognition"

Language: Python - Size: 12.7 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

machineHan/Paper-review-HMTL

multi-modal sentiment analysis method

Size: 83 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Gtothemoon/Contrastive-VisionVAE-Follower

Contrastive-VisionVAE-Follower is a model for the multi-modal task known as Vision-and-Language Navigation (VLN).

Language: C++ - Size: 14.3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

mattroz/miniCLIP

Implementation of CLIP model with a reduced capacity. For self-educational purposes only.

Language: Python - Size: 1.63 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

talipucar/DomainTranslation

Pytorch implementation of "Multi-domain translation between single-cell imaging and sequencing data using autoencoders" (https://www.nature.com/articles/s41467-020-20249-2) with custom models.

Language: Python - Size: 2.17 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0

WillDreamer/Aurora

[NeurIPS2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model

Language: Python - Size: 118 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 48 - Forks: 3

MunzerDw/Gen3DQA

My paper (BMVC23) on 3D visual question answering at the lab of Prof. Dr. Niessner at Technical University of Munich.

Language: Python - Size: 16.5 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

abhrac/xmodal-vit

Official implementation of "Cross-Modal Fusion Distillation for Fine-Grained Sketch-Based Image Retrieval", BMVC 2022.

Language: Python - Size: 330 KB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 16 - Forks: 1

moabarar/nemar

[CVPR2020] Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation

Language: Python - Size: 161 MB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 153 - Forks: 25

MMintLab/VIRDO

GitHub repository for "Visio-tactile Implicit Representations of Deformable Objects" (ICRA 2022)

Language: Jupyter Notebook - Size: 733 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 11 - Forks: 1

LooperXX/ManagerTower

Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning

Language: Python - Size: 6.71 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 1

peymanbateni/multimodal-emotion-analysis-in-conversations

Multi-modal analysis of sentiment and emotion in multi-speaker conversations.

Language: Jupyter Notebook - Size: 36.2 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 24 - Forks: 6

josedolz/HyperDenseNet_pytorch

Pytorch version of the HyperDenseNet deep neural network for multi-modal image segmentation

Language: Python - Size: 396 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 74 - Forks: 12

MIFA-Lab/InstructionGPT-4 (fork of waltonfuture/InstructionGPT-4)

Implementation for the paper "InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4" (https://arxiv.org/abs/2308.12067)

Size: 1.78 MB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

fpsluozi/tofindwaldo

Official Repo for "To Find Waldo You Need Contextual Cues: Debiasing Who’s Waldo", ACL 2022 (Short)

Size: 205 KB - Last synced at: 8 months ago - Pushed at: about 2 years ago - Stars: 6 - Forks: 0

liyichen-cly/MMEA

MMEA: Entity Alignment for Multi-Modal Knowledge Graphs

Language: Python - Size: 319 KB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 31 - Forks: 4

hyeonsieun/Text-to-Image_Generation

Language: Jupyter Notebook - Size: 7.49 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

GuanRunwei/VehicleFinder-CTIM

Language: Python - Size: 7.13 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

HackerHyper/ACMVH

Adaptive Confidence Multi-View Hashing

Language: Python - Size: 19.5 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 19 - Forks: 0

yookyungkho/Multimodal-Entailment-pytorch

Pytorch Implementation of Multimodal Entailment baseline

Language: Jupyter Notebook - Size: 801 KB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 0

Hleephilip/MLVU-project

Modality Translation through Conditional Encoder-Decoder (2023-1 Machine Learning for Visual Understanding Team project)

Language: Python - Size: 1.64 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 0

Bekyilma/VA_RecSys

Learning Latent Semantic Representations of Paintings for Personalized Recommendation

Language: PHP - Size: 10.7 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 0

joannahong/Visagesyntalk

The video demo of ECCV2022 paper titled "Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection"

Size: 44.5 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Related Keywords
multi-modal-learning (113), deep-learning (41), pytorch (21), computer-vision (17), multi-modal (15), machine-learning (12), multi-view-learning (7), vision-and-language (7), artificial-intelligence (7), vision-language-model (7), nlp (7), multi-modal-fusion (6), clip (6), representation-learning (6), contrastive-learning (6), transformers (6), language-model (5), remote-sensing (5), object-detection (5), multi-task-learning (5), self-supervised-learning (5), multimodal-learning (4), natural-language-processing (4), gpt4 (4), deep-neural-networks (4), segmentation (4), vision-language (4), cvpr2024 (3), cnn (3), bert (3), zero-shot-classification (3), tensorflow (3), text-to-image (3), pretrained-models (3), video-question-answering (3), vision-language-learning (3), transformer (3), python (3), cross-modal-retrieval (3), robotics (3), visual-question-answering (3), ai (3), image-classification (3), crop-classification (3), root-cause-analysis (2), dataset (2), causal-discovery (2), reid (2), image-text-retrieval (2), knowledge-graph (2), vision-and-language-pre-training (2), action-recognition (2), autonomous-driving (2), entity-alignment (2), vision-transformer (2), datasets (2), medical-image-processing (2), iccv (2), robustness (2), agriculture-research (2), earth-observation (2), sentiment-classification (2), crop-type-mapping (2), croptypes (2), zero-shot-learning (2), data-fusion (2), datafusion (2), vqa (2), multisensor-fusion (2), stackgan (2), pytorch-lightning (2), openai-clip (2), multimodal (2), lafite (2), multiview-learning (2), attngan (2), sensor-fusion (2), prompt-learning (2), knowledge-distillation (2), domain-experts (2), vehicle-reidentification (2), vision (2), openai (2), paper-list (2), optimization-algorithms (2), prompt-engineering (2), ai4math (2), ai4science (2), speech-recognition (2), prompt-tuning (2), transfer-learning (2), machine-learning-algorithms (2), rag (2), retrieval-augmented-generation (2), symbolic-regression (2), graph-neural-networks (2), multi-modal-imaging (2), instance-segmentation (2), 3d-scene-understanding (2), contrastive-loss (2)