GitHub topics: multi-modal-learning
XxabueloxX/Vision-Matters
Vision Matters explores how simple visual changes can enhance multimodal math reasoning. Join the discussion and contribute to the project! 👩‍💻👨‍💻
Language: Python - Size: 15.9 MB - Last synced at: 25 minutes ago - Pushed at: about 2 hours ago - Stars: 0 - Forks: 0

aneeuk/four-sided-triangle
A sophisticated multi-model optimization pipeline for domain-expert knowledge extraction RAG systems
Language: Python - Size: 6.57 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

fullscreen-triangle/four-sided-triangle
A sophisticated multi-model optimization pipeline for domain-expert knowledge extraction RAG systems
Language: Python - Size: 6.73 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 8 - Forks: 0

MosesTheRedSea/SOLID
Spatial Object Learning with Integrated Dimensions
Size: 15.6 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

chenxi52/CMPF
Open-Vocabulary Panoptic Segmentation
Language: Python - Size: 1.12 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 24 - Forks: 1

kyegomez/zeta
Build high-performance AI models with modular building blocks
Language: Python - Size: 41.3 MB - Last synced at: 9 days ago - Pushed at: 12 days ago - Stars: 525 - Forks: 52

924973292/Awesome-Multi-Modal-Object-Re-Identification
Welcome to the Awesome Multi-Modal Object Re-Identification Repository! This repository is dedicated to curating and sharing the latest methods, datasets, and resources focused specifically on the domain of multi-modal object re-identification. It brings together cutting-edge research, tools, and papers aimed at advancing the study and application.
Size: 38.1 KB - Last synced at: 4 days ago - Pushed at: 26 days ago - Stars: 61 - Forks: 5

zhengli97/PromptKD
[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"
Language: Python - Size: 11.2 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 315 - Forks: 6

UNITES-Lab/Flex-MoE
[NeurIPS 2024 Spotlight] Code for the paper "Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts"
Language: Python - Size: 3.14 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 52 - Forks: 2

Awais-Asghar/Skin-Cancer-Binary-Classifier
A machine learning project for binary classification of skin cancer as malignant or benign, utilizing models like XGBoost, LGBM Classifier, Adaboost, SVM, and Logistic Regression. Features comprehensive data preprocessing, model training, and evaluation for accurate diagnosis.
Language: Jupyter Notebook - Size: 5.65 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

EfiLygda/Multi-Modal-Deep-Learning-Model-for-Alzheimer-s-Disease-Classification
Multi-modal deep learning model for Alzheimer's disease classification.
Language: Python - Size: 839 KB - Last synced at: 19 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

3dlg-hcvc/DuoduoCLIP
[ICLR 2025] Duoduo CLIP: Efficient 3D Understanding with Multi-View Images
Language: Python - Size: 64.2 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 57 - Forks: 4

mlfoundations/open_clip
An open source implementation of CLIP.
Language: Python - Size: 15 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 11,830 - Forks: 1,110

lyuchenyang/Macaw-LLM
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
Language: Python - Size: 35.9 MB - Last synced at: 24 days ago - Pushed at: 6 months ago - Stars: 1,568 - Forks: 124

lemma-rca/lemma-rca.github.io
Code for LEMMA-RCA website
Language: HTML - Size: 3.66 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 3 - Forks: 2

shikras/d-cube
A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).
Language: Python - Size: 835 KB - Last synced at: 25 days ago - Pushed at: over 1 year ago - Stars: 123 - Forks: 7

OFA-Sys/Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Language: Python - Size: 2.5 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 5,203 - Forks: 497

DmitryRyumin/CVPR-2023-24-Papers
CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included. ⭐ support visual intelligence development!
Language: Python - Size: 10.3 MB - Last synced at: 20 days ago - Pushed at: 11 months ago - Stars: 451 - Forks: 30

Agora-Lab-AI/EKR
Elysium Knowledge Repository is an open source initiative to embed all of Humanity's multi-modal knowledge and wisdom.
Language: Python - Size: 2.15 MB - Last synced at: 21 days ago - Pushed at: almost 2 years ago - Stars: 8 - Forks: 1

NVlabs/prismer
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Language: Python - Size: 4.25 MB - Last synced at: 10 days ago - Pushed at: over 1 year ago - Stars: 1,308 - Forks: 73

lucidrains/x-clip
A concise but complete implementation of CLIP with various experimental improvements from recent papers
Language: Python - Size: 1.46 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 710 - Forks: 47

KnowledgeDiscovery/rca_baselines
Code for "LEMMA-RCA: A Large Multi-modal Multi-domain Dataset for Root Cause Analysis" paper
Language: Python - Size: 1.8 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 20 - Forks: 5

filipbasara0/simple-clip
A minimal, but effective implementation of CLIP (Contrastive Language-Image Pretraining) in PyTorch
Language: Jupyter Notebook - Size: 83 KB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 31 - Forks: 5

vishalned/MMEarth-data
This repository contains code to download data for the preprint "MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning"
Language: Python - Size: 114 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 65 - Forks: 4

TianyiFranklinWang/MIRROR
MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention
Language: Python - Size: 8.12 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 8 - Forks: 0

fmenat/missingviews-study-EO
Public repository of our IGARSS 2024 work
Language: Python - Size: 455 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

fmenat/MultiviewCropClassification
Public repository of our IGARSS 2023 submission
Language: Python - Size: 132 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 16 - Forks: 1

fmenat/optimal-multiview-crop-classifier
Public repository of our work in the search for an optimal multi-view crop classifier (considering encoder architectures and fusion strategies)
Language: Python - Size: 9.35 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 6 - Forks: 0

depshad/Deep-Learning-Framework-for-Multi-modal-Product-Classification
Code repository for Rakuten Data Challenge: Multimodal Product Classification and Retrieval.
Language: Jupyter Notebook - Size: 174 KB - Last synced at: 26 days ago - Pushed at: about 4 years ago - Stars: 26 - Forks: 9

ttgeng233/UnAV
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
Language: Python - Size: 19.9 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 63 - Forks: 6

924973292/IDEA
【CVPR2025】IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification
Language: Python - Size: 34.9 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 17 - Forks: 3

zjukg/KG-MM-Survey
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
Size: 82.3 MB - Last synced at: 2 months ago - Pushed at: 6 months ago - Stars: 401 - Forks: 19

jokieleung/awesome-visual-question-answering
A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
Size: 179 KB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 662 - Forks: 95

huggingface/chug
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
Language: Python - Size: 146 KB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 157 - Forks: 11

eieye/BLN_ABC
A Primer for basic literacy development | A "preparatory course" for literacy in German as a second language
Language: JavaScript - Size: 1.5 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

qizekun/ReCon
[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
Language: Python - Size: 1.97 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 142 - Forks: 13

kyegomez/MegaVIT
The open source implementation of the model from "Scaling Vision Transformers to 22 Billion Parameters"
Language: Python - Size: 211 KB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 28 - Forks: 1

ZINZINBIN/Disruption-Prediciton-based-on-Multimodal-Deep-Learning
Research-repository: Disruption Prediction and Analysis through Multimodal Deep Learning in KSTAR
Language: Jupyter Notebook - Size: 196 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

Ysz2022/NeRCo
[ICCV 2023] Implicit Neural Representation for Cooperative Low-light Image Enhancement
Language: Python - Size: 1.87 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 231 - Forks: 16

zhjohnchan/awesome-vision-and-language-pretraining
A curated list of vision-and-language pre-training (VLP). :-)
Size: 125 KB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 58 - Forks: 7

ForYourEyesOnlyyy/Practical-Machine-Learning-Deep-Learning
A collection of Jupyter notebooks covering hands-on experiments in deep learning, NLP, computer vision, and time-series forecasting. Includes model training, fine-tuning, and tracking with tools like TensorBoard, ClearML, and HuggingFace
Language: Jupyter Notebook - Size: 10.4 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

OpenRobotLab/EmbodiedScan
[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
Language: Python - Size: 23.3 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 551 - Forks: 40

rinnakk/japanese-clip
Japanese CLIP by rinna Co., Ltd.
Language: Python - Size: 574 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 72 - Forks: 9

STiFLeR7/Multi-Modal-Learning-for-Image-and-Text-Analysis
Develops approaches for jointly analyzing images and text using deep learning. Covers applications like image-text matching, visual question answering, image captioning, and sentiment analysis with visual context.
Language: Python - Size: 873 KB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

zhengli97/ATPrompt
Official PyTorch Code for "ATPrompt: Textual Prompt Learning with Embedded Attributes"
Language: Python - Size: 11.2 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 12 - Forks: 0

NoTody/HIST-AID
A multi-modal time-series dataset created from MIMIC.
Language: Python - Size: 125 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

ammarlodhi255/metadata-augmented-neural-networks-for-wild-animal-classification
This repository contains the implementation code for the paper "Metadata Augmented Neural Networks For Wild Animal Classification": https://www.sciencedirect.com/science/article/pii/S1574954124003479.
Language: Jupyter Notebook - Size: 8.83 MB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

lif314/NeAF
[AAAI 2025] Representing Sounds as Neural Amplitude Fields: A Benchmark of Coordinate-MLPs and A Fourier Kolmogorov-Arnold Framework
Language: Python - Size: 3.85 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 2 - Forks: 0

YunzeMan/Situation3D
[CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning
Language: Python - Size: 63.3 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 28 - Forks: 2

richard-peng-xia/HGCLIP
[COLING'25] HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding
Language: Python - Size: 1.69 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 33 - Forks: 1

rentainhe/TRAR-VQA
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
Language: Python - Size: 927 KB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 66 - Forks: 18

amazon-science/contrastive_emc2
Code for the ICML 2024 paper: "EMC^2: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence"
Language: Python - Size: 7.77 MB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

lyuchenyang/Efficient-VideoQA
Code for ACL SustaiNLP 2023 paper "Is a Video worth n × n Images? A Highly Efficient Approach to Transformer-based Video Question Answering"
Language: Python - Size: 28.3 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

lyuchenyang/Semantic-aware-VideoQA
Code for ACL SRW 2023 paper "Semantic-aware Dynamic Retrospective-Prospective Reasoning for Event-level Video Question Answering"
Language: Python - Size: 31.3 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

ArsamAryandoust/UniversalGNNs
Universal graph neural networks for multi-task transfer learning
Language: Python - Size: 8.58 MB - Last synced at: 6 days ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 2

mailcorahul/auto_labeler
auto_labeler - An all-in-one library to automatically label vision data
Language: Python - Size: 32.2 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 9 - Forks: 1

924973292/EDITOR
【CVPR2024】Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
Language: Python - Size: 10.6 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 77 - Forks: 5

deep-symbolic-mathematics/Multimodal-Symbolic-Regression
[ICLR 2024 Spotlight] SNIP on Symbolic Regression: Deep Symbolic Regression with Multimodal Pretraining
Language: Python - Size: 1.29 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 12 - Forks: 3

deep-symbolic-mathematics/Multimodal-Math-Pretraining
[ICLR 2024 Spotlight] This is the official code for the paper "SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training"
Language: Python - Size: 989 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 42 - Forks: 5

jianzhnie/MultimodalTransformers
lmmtoolkit is a toolkit for Multi-Modal Learning
Language: Python - Size: 22.5 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

loharmurtaza/FoG_detection_subject_dependent
This repository is based on my research work "Detecting Freezing of Gait in Parkinson's Disease Patients Using Multi-Modal Machine Learning"
Size: 626 KB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

LIU42/Contrastive
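Project based on Problem B of the 2024 "Teddy Cup" Data Mining Challenge: a cross-modal image-text mutual retrieval model based on contrastive learning in a shared feature space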
Language: Python - Size: 20.5 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

wjun0830/CGDETR
Official PyTorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"
Language: Python - Size: 23.3 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 105 - Forks: 11

JHKim-snu/PGA
[IROS 2024] PGA: Personalizing Grasping Agents with Single Human-Robot Interaction
Language: Python - Size: 34.8 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 4 - Forks: 0

gaurav104/WSS-CMER
Code for the paper: "Weakly supervised segmentation with cross-modality equivariant constraints", available at https://arxiv.org/pdf/2104.02488.pdf
Language: Python - Size: 656 KB - Last synced at: 11 months ago - Pushed at: over 2 years ago - Stars: 20 - Forks: 3

DFKI-Earth-And-Space-Applications/MVCC_IGARSS
Public repository of our IGARSS 2023 submission
Language: Python - Size: 132 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 1 - Forks: 1

GuanRunwei/Achelous
Achelous: A Fast Unified Water-surface Panoptic Perception Framework based on Fusion of Monocular Camera and 4D mmWave Radar
Language: Python - Size: 67.3 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 135 - Forks: 7

kyegomez/NeVA
The open source implementation of "NeVA: NeMo Vision and Language Assistant"
Language: Python - Size: 253 KB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 18 - Forks: 1

peixinlei/M2HSE
PyTorch code for the paper "Complementarity is the king: A multi-modal and multi-grained hierarchical semantic enhancement network for cross-modal retrieval"
Language: Python - Size: 82 KB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

tudelft-iv/UniBEV
[IVS'24] UniBEV: the official implementation of UniBEV
Language: Python - Size: 12.3 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 10 - Forks: 1

sayakpaul/Multimodal-Entailment-Baseline
This repository shows how to implement a basic model for multimodal entailment.
Language: Jupyter Notebook - Size: 3.17 MB - Last synced at: 7 days ago - Pushed at: almost 4 years ago - Stars: 10 - Forks: 4

yihedeng9/STIC
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
Language: Python - Size: 5.98 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

outta-ai/2023_OUTTA_AIBootcamp_final_project
Language: Python - Size: 1.87 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

likyoo/Multimodal-Remote-Sensing-Toolkit
A Python tool to perform deep learning experiments on multimodal remote sensing data.
Language: Python - Size: 1.01 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 74 - Forks: 12

RAIVNLab/sugar-crepe
[NeurIPS 2023] A faithful benchmark for vision-language compositionality
Language: Python - Size: 3.33 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 54 - Forks: 7

graphprojects/CM-GCL
Source code of NeurIPS 2022 paper “Co-Modality Graph Contrastive Learning for Imbalanced Node Classification”
Language: Python - Size: 3.58 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 16 - Forks: 2

sandipan211/ZSD-SC-Resolver
Resolving semantic confusions for improved zero-shot detection (BMVC 2022)
Language: Python - Size: 77 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 20 - Forks: 4

WangJingyao07/ST-F2M
🌈 Official Code for "Spatio-Temporal Fuzzy-oriented Multi-modal Meta-learning for Fine-grained Emotion Recognition"
Language: Python - Size: 12.7 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

machineHan/Paper-review-HMTL
multi-modal sentiment analysis method
Size: 83 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Gtothemoon/Contrastive-VisionVAE-Follower
Contrastive-VisionVAE-Follower is a model for the multi-modal task called Vision-and-Language Navigation (VLN).
Language: C++ - Size: 14.3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

mattroz/miniCLIP
Implementation of CLIP model with a reduced capacity. For self-educational purposes only.
Language: Python - Size: 1.63 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

talipucar/DomainTranslation
PyTorch implementation of "Multi-domain translation between single-cell imaging and sequencing data using autoencoders" (https://www.nature.com/articles/s41467-020-20249-2) with custom models.
Language: Python - Size: 2.17 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0

WillDreamer/Aurora
[NeurIPS2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model
Language: Python - Size: 118 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 48 - Forks: 3

MunzerDw/Gen3DQA
My paper (BMVC23) on 3D visual question answering at the lab of Prof. Dr. Niessner at Technical University of Munich.
Language: Python - Size: 16.5 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

abhrac/xmodal-vit
Official implementation of "Cross-Modal Fusion Distillation for Fine-Grained Sketch-Based Image Retrieval", BMVC 2022.
Language: Python - Size: 330 KB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 16 - Forks: 1

moabarar/nemar
[CVPR2020] Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation
Language: Python - Size: 161 MB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 153 - Forks: 25

MMintLab/VIRDO
GitHub repository for "Visio-tactile Implicit Representations of Deformable Objects" (ICRA 2022)
Language: Jupyter Notebook - Size: 733 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 11 - Forks: 1

LooperXX/ManagerTower
Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
Language: Python - Size: 6.71 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 1

peymanbateni/multimodal-emotion-analysis-in-conversations
Multi-modal analysis of sentiment and emotion in multi-speaker conversations.
Language: Jupyter Notebook - Size: 36.2 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 24 - Forks: 6

josedolz/HyperDenseNet_pytorch
PyTorch version of the HyperDenseNet deep neural network for multi-modal image segmentation
Language: Python - Size: 396 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 74 - Forks: 12

MIFA-Lab/InstructionGPT-4 (fork of waltonfuture/InstructionGPT-4)
Implementation for the paper "InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4" (https://arxiv.org/abs/2308.12067)
Size: 1.78 MB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

fpsluozi/tofindwaldo
Official Repo for "To Find Waldo You Need Contextual Cues: Debiasing Who’s Waldo", ACL 2022 (Short)
Size: 205 KB - Last synced at: 8 months ago - Pushed at: about 2 years ago - Stars: 6 - Forks: 0

liyichen-cly/MMEA
MMEA: Entity Alignment for Multi-Modal Knowledge Graphs
Language: Python - Size: 319 KB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 31 - Forks: 4

hyeonsieun/Text-to-Image_Generation
Language: Jupyter Notebook - Size: 7.49 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

GuanRunwei/VehicleFinder-CTIM
Language: Python - Size: 7.13 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

HackerHyper/ACMVH
Adaptive Confidence Multi-View Hashing
Language: Python - Size: 19.5 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 19 - Forks: 0

yookyungkho/Multimodal-Entailment-pytorch
PyTorch implementation of Multimodal Entailment baseline
Language: Jupyter Notebook - Size: 801 KB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 0

Hleephilip/MLVU-project
Modality Translation through Conditional Encoder-Decoder (2023-1 Machine Learning for Visual Understanding Team project)
Language: Python - Size: 1.64 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 0

Bekyilma/VA_RecSys
Learning Latent Semantic Representations of Paintings for Personalized Recommendation
Language: PHP - Size: 10.7 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 0

joannahong/Visagesyntalk
The video demo of ECCV2022 paper titled "Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection"
Size: 44.5 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0
