Topic: "multimodal-deep-learning"
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
Language: Jupyter Notebook - Size: 79.3 MB - Last synced at: 1 day ago - Pushed at: 5 months ago - Stars: 10,478 - Forks: 1,021

Yutong-Zhou-cv/Awesome-Text-to-Image
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
Size: 69.2 MB - Last synced at: about 15 hours ago - Pushed at: about 16 hours ago - Stars: 2,325 - Forks: 200

KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
Language: Python - Size: 1.06 MB - Last synced at: 12 days ago - Pushed at: 6 months ago - Stars: 1,928 - Forks: 333

AI4Finance-Foundation/FinRobot
FinRobot: An Open-Source AI Agent Platform for Financial Analysis using LLMs 🚀 🚀 🚀
Language: Jupyter Notebook - Size: 7.4 MB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 1,858 - Forks: 286

kyegomez/BitNet
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch
Language: Python - Size: 36.5 MB - Last synced at: 10 days ago - Pushed at: 20 days ago - Stars: 1,772 - Forks: 161

AlibabaResearch/AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Language: C++ - Size: 104 MB - Last synced at: 13 days ago - Pushed at: 15 days ago - Stars: 1,685 - Forks: 191

jrzaurin/pytorch-widedeep
A flexible package for multimodal-deep-learning to combine tabular data with text and images using Wide and Deep models in Pytorch
Language: Python - Size: 99.6 MB - Last synced at: 2 days ago - Pushed at: about 2 months ago - Stars: 1,340 - Forks: 195

DWCTOD/CVPR2024-Papers-with-Code-Demo
Collects the latest CVPR (Conference on Computer Vision and Pattern Recognition) results, including papers, code, and demo videos. Recommendations are welcome!
Size: 137 KB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 1,335 - Forks: 150

yuewang-cuhk/awesome-vision-language-pretraining-papers
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
Size: 104 KB - Last synced at: 6 days ago - Pushed at: over 2 years ago - Stars: 1,152 - Forks: 104

TheShadow29/awesome-grounding
awesome grounding: A curated list of research papers in visual grounding
Size: 172 KB - Last synced at: 5 days ago - Pushed at: about 2 years ago - Stars: 1,069 - Forks: 100

declare-lab/multimodal-deep-learning
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
Language: OpenEdge ABL - Size: 181 MB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 801 - Forks: 157

richard-peng-xia/awesome-multimodal-in-medical-imaging
A collection of resources on applications of multi-modal learning in medical imaging.
Size: 234 KB - Last synced at: 1 day ago - Pushed at: 22 days ago - Stars: 722 - Forks: 67

omriav/blended-latent-diffusion
Official implementation for "Blended Latent Diffusion" [SIGGRAPH 2023]
Language: Jupyter Notebook - Size: 9.84 MB - Last synced at: 27 days ago - Pushed at: 11 months ago - Stars: 594 - Forks: 37

MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Language: Python - Size: 186 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 413 - Forks: 34

jianghaojun/Awesome-Parameter-Efficient-Transfer-Learning
A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.
Size: 165 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 372 - Forks: 23

westlake-repl/Recommendation-Systems-without-Explicit-ID-Features-A-Literature-Review
Paper List of Pre-trained Foundation Recommender Models
Size: 444 KB - Last synced at: 3 days ago - Pushed at: 8 months ago - Stars: 349 - Forks: 27

theislab/scarches
Reference mapping for single-cell genomics
Language: Jupyter Notebook - Size: 825 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 347 - Forks: 52

remyxai/VQASynth
Compose multimodal datasets 🎹
Language: Python - Size: 16.3 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 344 - Forks: 14

fcakyon/content-moderation-deep-learning
Deep learning based content moderation from text, audio, video & image input modalities.
Size: 188 KB - Last synced at: 12 days ago - Pushed at: 4 months ago - Stars: 343 - Forks: 20

soujanyaporia/MUStARD
Multimodal Sarcasm Detection Dataset
Language: OpenEdge ABL - Size: 75.4 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 335 - Forks: 62

Yutong-Zhou-cv/Awesome-Multimodality
A Survey on multimodal learning research.
Size: 1.76 MB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 324 - Forks: 22

sail-sg/CLoT
CVPR'24, Official Codebase of our Paper: "Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation".
Language: Python - Size: 6.46 MB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 310 - Forks: 16

ilaria-manco/multimodal-ml-music
List of academic resources on Multimodal ML for Music
Language: TeX - Size: 268 KB - Last synced at: 9 days ago - Pushed at: about 2 years ago - Stars: 293 - Forks: 11

phellonchen/awesome-Vision-and-Language-Pre-training
Recent Advances in Vision and Language Pre-training (VLP)
Size: 81.1 KB - Last synced at: 1 day ago - Pushed at: almost 2 years ago - Stars: 292 - Forks: 16

DWCTOD/ECCV2022-Papers-with-Code-Demo
Collects the latest ECCV results, including papers, code, and demo videos. Recommendations are welcome!
Size: 170 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 286 - Forks: 23

declare-lab/awesome-emotion-recognition-in-conversations
A comprehensive reading list for Emotion Recognition in Conversations
Size: 273 KB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 269 - Forks: 45

MILVLG/prophet
Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
Language: Python - Size: 1.09 MB - Last synced at: 11 months ago - Pushed at: almost 2 years ago - Stars: 259 - Forks: 27

david-yoon/multimodal-speech-emotion
TensorFlow implementation of "Multimodal Speech Emotion Recognition using Audio and Text," IEEE SLT-18
Language: Jupyter Notebook - Size: 238 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 239 - Forks: 70

drprojects/DeepViewAgg
[CVPR'22 Best Paper Finalist] Official PyTorch implementation of the method presented in "Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation"
Language: Python - Size: 302 MB - Last synced at: 18 days ago - Pushed at: 9 months ago - Stars: 228 - Forks: 25

kyegomez/NaViT
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
Language: Python - Size: 210 KB - Last synced at: 15 days ago - Pushed at: 20 days ago - Stars: 226 - Forks: 10

geoaigroup/awesome-vision-language-models-for-earth-observation
A curated list of awesome vision and language resources for earth observation.
Size: 470 KB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 219 - Forks: 17

yuanze-lin/Learnable_Regions
[CVPR 2024] Official code for "Text-Driven Image Editing via Learnable Regions"
Language: Python - Size: 11.5 MB - Last synced at: 20 days ago - Pushed at: 7 months ago - Stars: 219 - Forks: 21

kyegomez/Med-PaLM
Towards Generalist Biomedical AI
Language: Python - Size: 850 KB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 219 - Forks: 35

mahmoodlab/MCAT
Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images - ICCV 2021
Language: Jupyter Notebook - Size: 540 MB - Last synced at: 15 days ago - Pushed at: about 3 years ago - Stars: 195 - Forks: 39

declare-lab/Multimodal-Infomax
This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis, accepted at EMNLP 2021.
Language: Python - Size: 145 KB - Last synced at: 16 days ago - Pushed at: about 2 years ago - Stars: 179 - Forks: 34

friedrichor/Awesome-Multimodal-Papers
A curated list of awesome Multimodal studies.
Language: HTML - Size: 63.2 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 172 - Forks: 16

thuiar/MMSA-FET
A Tool for extracting multimodal features from videos.
Language: Python - Size: 24.4 MB - Last synced at: 12 days ago - Pushed at: about 2 years ago - Stars: 165 - Forks: 22

vijayvee/video-captioning
This repository contains the code for a video captioning system inspired by Sequence to Sequence -- Video to Text. This system takes as input a video and generates a caption in English describing the video.
Language: Python - Size: 3.39 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 162 - Forks: 65

DavidHuji/CapDec
CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)
Language: Python - Size: 35.7 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 158 - Forks: 17

steve-zeyu-zhang/MotionAnything
🔥 Motion Anything: Any to Motion Generation
Size: 183 MB - Last synced at: 12 days ago - Pushed at: 13 days ago - Stars: 156 - Forks: 2

YuanGongND/cav-mae
Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
Language: Python - Size: 12.9 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 151 - Forks: 13

LeapLabTHU/Pseudo-Q
[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
Language: Python - Size: 22.9 MB - Last synced at: 20 days ago - Pushed at: 9 months ago - Stars: 148 - Forks: 10

florencejt/fusilli
A Python package housing a collection of deep-learning multi-modal data fusion method pipelines! From data loading, to training, to evaluation - fusilli's got you covered 🌸
Language: Python - Size: 987 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 146 - Forks: 12

kyegomez/PALI3
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
Language: Python - Size: 2.61 MB - Last synced at: 11 days ago - Pushed at: 20 days ago - Stars: 145 - Forks: 4

kyegomez/the-compiler
Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!
Language: Python - Size: 10.4 MB - Last synced at: 14 days ago - Pushed at: over 1 year ago - Stars: 145 - Forks: 17

westlake-repl/IDvs.MoRec
End-to-end Training for Multimodal Recommendation Systems
Language: Python - Size: 57.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 139 - Forks: 18

cap-ntu/Video-to-Retail-Platform
An intelligent multimodal-learning based system for video, product and ads analysis. Based on the system, people can build a lot of downstream applications such as product recommendation, video retrieval, etc.
Language: Python - Size: 65.7 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 138 - Forks: 43

AnkurDeria/MFT
Pytorch implementation of Multimodal Fusion Transformer for Remote Sensing Image Classification.
Language: Jupyter Notebook - Size: 2.13 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 130 - Forks: 8

om-ai-lab/VL-CheckList
Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations. [EMNLP 2022]
Language: Python - Size: 26.6 MB - Last synced at: 4 days ago - Pushed at: 7 months ago - Stars: 129 - Forks: 5

zhu-xlab/DOFA
Code for Neural Plasticity-Inspired Foundation Model for Observing the Earth Crossing Modalities
Language: Jupyter Notebook - Size: 993 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 126 - Forks: 11

IDEA-Research/ChatRex
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
Language: Python - Size: 8.8 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 124 - Forks: 3

kyegomez/swarms-pytorch
Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊
Language: Python - Size: 58.2 MB - Last synced at: 20 days ago - Pushed at: 3 months ago - Stars: 121 - Forks: 10

shamanez/Self-Supervised-Embedding-Fusion-Transformer
The code for our IEEE ACCESS (2020) paper Multimodal Emotion Recognition with Transformer-Based Self Supervised Feature Fusion.
Language: Python - Size: 4.65 MB - Last synced at: 4 days ago - Pushed at: over 3 years ago - Stars: 119 - Forks: 22

cambridgeltl/visual-spatial-reasoning
[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.
Language: Python - Size: 10.7 MB - Last synced at: 20 days ago - Pushed at: about 2 years ago - Stars: 118 - Forks: 9

haamoon/mmtm
Implementation of CVPR 2020 paper "MMTM: Multimodal Transfer Module for CNN Fusion"
Language: Python - Size: 47.9 KB - Last synced at: 4 months ago - Pushed at: almost 5 years ago - Stars: 112 - Forks: 21

yuanze-lin/REVIVE
[NeurIPS 2022] Official code for REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering
Language: Python - Size: 3.39 MB - Last synced at: 14 days ago - Pushed at: 17 days ago - Stars: 100 - Forks: 10

DirtyHarryLYL/DJ-RN
Part of the HAKE project (HAKE-3D). Code for our CVPR2020 paper "Detailed 2D-3D Joint Representation for Human-Object Interaction".
Language: Python - Size: 5.3 MB - Last synced at: 23 days ago - Pushed at: over 2 years ago - Stars: 100 - Forks: 14

jianghaojun/Awesome-3D-Vision-and-Language
A collection of 3D vision and language (e.g., 3D Visual Grounding, 3D Question Answering and 3D Dense Caption) papers and datasets.
Size: 33.2 KB - Last synced at: 11 days ago - Pushed at: about 2 years ago - Stars: 97 - Forks: 5

kyegomez/PALI
Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"
Language: Python - Size: 624 KB - Last synced at: 21 days ago - Pushed at: about 1 year ago - Stars: 89 - Forks: 8

JerryX1110/awesome-rvos
Referring Video Object Segmentation / Multi-Object Tracking Repo
Language: Python - Size: 79.1 KB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 87 - Forks: 4

ch3cook-fdu/Vote2Cap-DETR
[CVPR 2023] Vote2Cap-DETR and [T-PAMI 2024] Vote2Cap-DETR++; A set-to-set perspective towards 3D Dense Captioning; State-of-the-Art 3D Dense Captioning methods
Language: Python - Size: 308 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 82 - Forks: 5

Vision-CAIR/3DCoMPaT-v2
3DCoMPaT++: An improved large-scale 3D vision dataset for compositional recognition
Language: Python - Size: 133 MB - Last synced at: 8 days ago - Pushed at: 10 months ago - Stars: 82 - Forks: 6

referit3d/referit3d
Code accompanying our ECCV-2020 paper on 3D Neural Listeners.
Language: C++ - Size: 15.8 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 81 - Forks: 13

declare-lab/LLM-PuzzleTest
This repository is maintained to release datasets and models for multimodal puzzle reasoning.
Language: Python - Size: 131 MB - Last synced at: 16 days ago - Pushed at: about 2 months ago - Stars: 78 - Forks: 7

idearibosome/embracenet
Robust multimodal integration method implemented in PyTorch and TensorFlow
Language: Python - Size: 107 KB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 78 - Forks: 25

ilaria-manco/muscaps
Source code for "MusCaps: Generating Captions for Music Audio" (IJCNN 2021)
Language: Jupyter Notebook - Size: 91.9 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 77 - Forks: 7

willxxy/awesome-mmps
Corpus of resources for multimodal machine learning with physiological signals (mmps).
Size: 1.03 MB - Last synced at: 1 day ago - Pushed at: 19 days ago - Stars: 73 - Forks: 2

kyegomez/Kosmos2.5
My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"
Language: Python - Size: 231 KB - Last synced at: 18 days ago - Pushed at: 20 days ago - Stars: 73 - Forks: 6

declare-lab/BBFN
This repository contains the implementation of the paper -- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis
Language: Python - Size: 1.25 MB - Last synced at: 10 days ago - Pushed at: about 2 years ago - Stars: 71 - Forks: 14

akashe/Multimodal-action-recognition
Code for selecting an action based on multimodal inputs. In this case, the inputs are voice and text.
Language: Python - Size: 64.7 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 69 - Forks: 11

aehrc/cvt2distilgpt2
Improving Chest X-Ray Report Generation by Leveraging Warm-Starting
Language: Python - Size: 93.5 MB - Last synced at: 12 days ago - Pushed at: 11 months ago - Stars: 67 - Forks: 7

vvvb-github/AVSegFormer
[AAAI 2024] AVSegFormer: Audio-Visual Segmentation with Transformer
Language: Python - Size: 486 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 62 - Forks: 5

rizavelioglu/hateful_memes-hate_detectron
Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge. https://arxiv.org/abs/2012.12975
Language: Jupyter Notebook - Size: 11.2 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 59 - Forks: 18

anita-hu/MSAF
Official implementation of the paper "MSAF: Multimodal Split Attention Fusion"
Language: Python - Size: 101 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 59 - Forks: 9

imatge-upc/wav2pix (fork of miqueltubau/Wav2Pix)
Speech-conditioned face generation using Generative Adversarial Networks (ICASSP 2019)
Language: Python - Size: 202 MB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 55 - Forks: 24

Sreyan88/MMER
Code for the InterSpeech 2023 paper: MMER: Multimodal Multi-task learning for Speech Emotion Recognition
Language: Python - Size: 1.59 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 54 - Forks: 14

ForestsKing/Awesome-Multimodal-Time-Series
A curated list of papers, code, data, and other resources focused on multimodal time series analysis.
Size: 7.81 KB - Last synced at: 12 days ago - Pushed at: 24 days ago - Stars: 53 - Forks: 3

sutdcv/SUTD-TrafficQA
[CVPR2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
Language: JavaScript - Size: 6 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 53 - Forks: 2

thuiar/MIntRec
MIntRec: A New Dataset for Multimodal Intent Recognition (ACM MM 2022)
Language: Python - Size: 1.49 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 53 - Forks: 8

marslanm/Multimodality-Representation-Learning
This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which are cited and discussed in the accompanying survey: https://dl.acm.org/doi/abs/10.1145/3617833
Size: 63.3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 51 - Forks: 7

ai4ce/MARS
[CVPR2024] Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset
Language: Python - Size: 370 MB - Last synced at: 12 days ago - Pushed at: 10 months ago - Stars: 50 - Forks: 1

firojalam/multimodal_social_media
multimodal social media content (text, image) classification
Language: Python - Size: 3.54 MB - Last synced at: 22 days ago - Pushed at: almost 3 years ago - Stars: 50 - Forks: 14

HLTCHKUST/VG-GPLMs
The code repository for EMNLP 2021 paper "Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization".
Language: Python - Size: 9.32 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 49 - Forks: 8

naver/artemis
Official code release for ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity (published at ICLR 2022)
Language: Python - Size: 1.26 MB - Last synced at: 9 days ago - Pushed at: about 2 years ago - Stars: 48 - Forks: 4

lmb-freiburg/Multimodal-Future-Prediction
The official repository for the CVPR 2019 paper "Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction"
Language: Python - Size: 21.6 MB - Last synced at: 12 months ago - Pushed at: over 4 years ago - Stars: 47 - Forks: 8

ManifoldRG/NEKO
In Progress Implementation of GATO style Generalist Multimodal model capable of image, text, RL and Robotics tasks
Language: Python - Size: 515 KB - Last synced at: 6 months ago - Pushed at: 10 months ago - Stars: 46 - Forks: 10

aimotive/aimotive_dataset
aiMotive public dataset
Size: 23.4 KB - Last synced at: 9 months ago - Pushed at: over 1 year ago - Stars: 44 - Forks: 2

penghu-cs/MRL
Learning Cross-Modal Retrieval with Noisy Labels (CVPR 2021, PyTorch Code)
Language: Python - Size: 23.9 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 44 - Forks: 10

YeonwooSung/LIMoE-pytorch
PyTorch implementation of LIMoE
Language: Python - Size: 4.3 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 43 - Forks: 1

affjljoo3581/Job-Recommend-Competition
🥇 First-place solution for the KNOW-based job recommendation algorithm competition 🥇
Language: Python - Size: 1.74 MB - Last synced at: 30 days ago - Pushed at: about 3 years ago - Stars: 43 - Forks: 4

husseinmozannar/multimodal-deep-learning-for-disaster-response
Damage Identification in Social Media Posts using Multimodal Deep Learning: code and dataset
Language: Python - Size: 62.5 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 43 - Forks: 16

sk-aravind/3D-Bounding-Boxes-From-Monocular-Images
A two-stage multi-modal loss model along with rigid body transformations to regress 3D bounding boxes
Language: Python - Size: 9.46 MB - Last synced at: almost 2 years ago - Pushed at: about 6 years ago - Stars: 43 - Forks: 18

A2Zadeh/Social-IQ
[CVPR 2019 Oral] Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence
Language: Python - Size: 2.71 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 41 - Forks: 5

Yuco-Z/Awesome-Multi-Modal-Dialog
[Paperlist] Awesome paper list of multimodal dialog, including methods, datasets and metrics
Size: 169 KB - Last synced at: about 1 hour ago - Pushed at: 3 months ago - Stars: 39 - Forks: 4

RaptorMai/MLLM-CompBench
[NeurIPS'25] MLLM-CompBench evaluates the comparative reasoning of MLLMs with 40K image pairs and questions across 8 dimensions of relative comparison: visual attribute, existence, state, emotion, temporality, spatiality, quantity, and quality. CompBench covers diverse visual domains, including animals, fashion, sports, and scenes
Language: Jupyter Notebook - Size: 10.7 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 38 - Forks: 2

42jaylonw/shifu
Lightweight Isaac Gym Environment Builder
Language: Python - Size: 31.6 MB - Last synced at: 21 days ago - Pushed at: over 2 years ago - Stars: 37 - Forks: 2

choyingw/CFCNet
NeurIPS 2019: Deep RGB-D Canonical Correlation Analysis For Sparse Depth Completion
Language: Python - Size: 31.8 MB - Last synced at: 14 days ago - Pushed at: about 4 years ago - Stars: 37 - Forks: 4

yanganYNU/AFFGCN
Attention Feature Fusion based on Spatial-Temporal Graph Convolutional Network (AFFGCN)
Language: Python - Size: 144 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 36 - Forks: 1

VisualWebBench/VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
Language: Python - Size: 3.17 MB - Last synced at: 10 months ago - Pushed at: 11 months ago - Stars: 36 - Forks: 1

soloist97/densecap-pytorch
A simplified pytorch version of densecap
Language: Jupyter Notebook - Size: 5.1 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 34 - Forks: 8
