An open API service providing repository metadata for many open source software ecosystems.

Topic: "multimodal-deep-learning"

salesforce/LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Language: Jupyter Notebook - Size: 79.3 MB - Last synced at: 1 day ago - Pushed at: 5 months ago - Stars: 10,478 - Forks: 1,021

Yutong-Zhou-cv/Awesome-Text-to-Image

(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

Size: 69.2 MB - Last synced at: about 15 hours ago - Pushed at: about 16 hours ago - Stars: 2,325 - Forks: 200

KimMeen/Time-LLM

[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"

Language: Python - Size: 1.06 MB - Last synced at: 12 days ago - Pushed at: 6 months ago - Stars: 1,928 - Forks: 333

AI4Finance-Foundation/FinRobot

FinRobot: An Open-Source AI Agent Platform for Financial Analysis using LLMs 🚀 🚀 🚀

Language: Jupyter Notebook - Size: 7.4 MB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 1,858 - Forks: 286

kyegomez/BitNet

Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch

Language: Python - Size: 36.5 MB - Last synced at: 10 days ago - Pushed at: 20 days ago - Stars: 1,772 - Forks: 161

AlibabaResearch/AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

Language: C++ - Size: 104 MB - Last synced at: 13 days ago - Pushed at: 15 days ago - Stars: 1,685 - Forks: 191

jrzaurin/pytorch-widedeep

A flexible package for multimodal-deep-learning to combine tabular data with text and images using Wide and Deep models in Pytorch

Language: Python - Size: 99.6 MB - Last synced at: 2 days ago - Pushed at: about 2 months ago - Stars: 1,340 - Forks: 195

DWCTOD/CVPR2024-Papers-with-Code-Demo

Collects the latest CVPR (Conference on Computer Vision and Pattern Recognition) results, including papers, code, and demo videos; recommendations are welcome!

Size: 137 KB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 1,335 - Forks: 150

yuewang-cuhk/awesome-vision-language-pretraining-papers

Recent Advances in Vision and Language PreTrained Models (VL-PTMs)

Size: 104 KB - Last synced at: 6 days ago - Pushed at: over 2 years ago - Stars: 1,152 - Forks: 104

TheShadow29/awesome-grounding

awesome grounding: A curated list of research papers in visual grounding

Size: 172 KB - Last synced at: 5 days ago - Pushed at: about 2 years ago - Stars: 1,069 - Forks: 100

declare-lab/multimodal-deep-learning

This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.

Language: OpenEdge ABL - Size: 181 MB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 801 - Forks: 157

richard-peng-xia/awesome-multimodal-in-medical-imaging

A collection of resources on applications of multi-modal learning in medical imaging.

Size: 234 KB - Last synced at: 1 day ago - Pushed at: 22 days ago - Stars: 722 - Forks: 67

omriav/blended-latent-diffusion

Official implementation for "Blended Latent Diffusion" [SIGGRAPH 2023]

Language: Jupyter Notebook - Size: 9.84 MB - Last synced at: 27 days ago - Pushed at: 11 months ago - Stars: 594 - Forks: 37

MMMU-Benchmark/MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

Language: Python - Size: 186 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 413 - Forks: 34

jianghaojun/Awesome-Parameter-Efficient-Transfer-Learning

A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.

Size: 165 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 372 - Forks: 23

westlake-repl/Recommendation-Systems-without-Explicit-ID-Features-A-Literature-Review

Paper List of Pre-trained Foundation Recommender Models

Size: 444 KB - Last synced at: 3 days ago - Pushed at: 8 months ago - Stars: 349 - Forks: 27

theislab/scarches

Reference mapping for single-cell genomics

Language: Jupyter Notebook - Size: 825 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 347 - Forks: 52

remyxai/VQASynth

Compose multimodal datasets 🎹

Language: Python - Size: 16.3 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 344 - Forks: 14

fcakyon/content-moderation-deep-learning

Deep learning based content moderation from text, audio, video & image input modalities.

Size: 188 KB - Last synced at: 12 days ago - Pushed at: 4 months ago - Stars: 343 - Forks: 20

soujanyaporia/MUStARD

Multimodal Sarcasm Detection Dataset

Language: OpenEdge ABL - Size: 75.4 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 335 - Forks: 62

Yutong-Zhou-cv/Awesome-Multimodality

A Survey on multimodal learning research.

Size: 1.76 MB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 324 - Forks: 22

sail-sg/CLoT

CVPR'24, Official Codebase of our Paper: "Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation".

Language: Python - Size: 6.46 MB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 310 - Forks: 16

ilaria-manco/multimodal-ml-music

List of academic resources on Multimodal ML for Music

Language: TeX - Size: 268 KB - Last synced at: 9 days ago - Pushed at: about 2 years ago - Stars: 293 - Forks: 11

phellonchen/awesome-Vision-and-Language-Pre-training

Recent Advances in Vision and Language Pre-training (VLP)

Size: 81.1 KB - Last synced at: 1 day ago - Pushed at: almost 2 years ago - Stars: 292 - Forks: 16

DWCTOD/ECCV2022-Papers-with-Code-Demo

Collects the latest ECCV results, including papers, code, and demo videos; recommendations are welcome!

Size: 170 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 286 - Forks: 23

declare-lab/awesome-emotion-recognition-in-conversations

A comprehensive reading list for Emotion Recognition in Conversations

Size: 273 KB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 269 - Forks: 45

MILVLG/prophet

Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".

Language: Python - Size: 1.09 MB - Last synced at: 11 months ago - Pushed at: almost 2 years ago - Stars: 259 - Forks: 27

david-yoon/multimodal-speech-emotion

TensorFlow implementation of "Multimodal Speech Emotion Recognition using Audio and Text," IEEE SLT-18

Language: Jupyter Notebook - Size: 238 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 239 - Forks: 70

drprojects/DeepViewAgg

[CVPR'22 Best Paper Finalist] Official PyTorch implementation of the method presented in "Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation"

Language: Python - Size: 302 MB - Last synced at: 18 days ago - Pushed at: 9 months ago - Stars: 228 - Forks: 25

kyegomez/NaViT

My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"

Language: Python - Size: 210 KB - Last synced at: 15 days ago - Pushed at: 20 days ago - Stars: 226 - Forks: 10

geoaigroup/awesome-vision-language-models-for-earth-observation

A curated list of awesome vision and language resources for earth observation.

Size: 470 KB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 219 - Forks: 17

yuanze-lin/Learnable_Regions

[CVPR 2024] Official code for "Text-Driven Image Editing via Learnable Regions"

Language: Python - Size: 11.5 MB - Last synced at: 20 days ago - Pushed at: 7 months ago - Stars: 219 - Forks: 21

kyegomez/Med-PaLM

Towards Generalist Biomedical AI

Language: Python - Size: 850 KB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 219 - Forks: 35

mahmoodlab/MCAT

Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images - ICCV 2021

Language: Jupyter Notebook - Size: 540 MB - Last synced at: 15 days ago - Pushed at: about 3 years ago - Stars: 195 - Forks: 39

declare-lab/Multimodal-Infomax

This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis, accepted at EMNLP 2021.

Language: Python - Size: 145 KB - Last synced at: 16 days ago - Pushed at: about 2 years ago - Stars: 179 - Forks: 34

friedrichor/Awesome-Multimodal-Papers

A curated list of awesome Multimodal studies.

Language: HTML - Size: 63.2 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 172 - Forks: 16

thuiar/MMSA-FET

A Tool for extracting multimodal features from videos.

Language: Python - Size: 24.4 MB - Last synced at: 12 days ago - Pushed at: about 2 years ago - Stars: 165 - Forks: 22

vijayvee/video-captioning

This repository contains the code for a video captioning system inspired by Sequence to Sequence -- Video to Text. This system takes as input a video and generates a caption in English describing the video.

Language: Python - Size: 3.39 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 162 - Forks: 65

DavidHuji/CapDec

CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)

Language: Python - Size: 35.7 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 158 - Forks: 17

steve-zeyu-zhang/MotionAnything

🔥 Motion Anything: Any to Motion Generation

Size: 183 MB - Last synced at: 12 days ago - Pushed at: 13 days ago - Stars: 156 - Forks: 2

YuanGongND/cav-mae

Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".

Language: Python - Size: 12.9 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 151 - Forks: 13

LeapLabTHU/Pseudo-Q

[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding

Language: Python - Size: 22.9 MB - Last synced at: 20 days ago - Pushed at: 9 months ago - Stars: 148 - Forks: 10

florencejt/fusilli

A Python package housing a collection of deep-learning multi-modal data fusion method pipelines! From data loading, to training, to evaluation - fusilli's got you covered 🌸

Language: Python - Size: 987 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 146 - Forks: 12

kyegomez/PALI3

Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"

Language: Python - Size: 2.61 MB - Last synced at: 11 days ago - Pushed at: 20 days ago - Stars: 145 - Forks: 4

kyegomez/the-compiler

Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!

Language: Python - Size: 10.4 MB - Last synced at: 14 days ago - Pushed at: over 1 year ago - Stars: 145 - Forks: 17

westlake-repl/IDvs.MoRec

End-to-end Training for Multimodal Recommendation Systems

Language: Python - Size: 57.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 139 - Forks: 18

cap-ntu/Video-to-Retail-Platform

An intelligent multimodal-learning based system for video, product and ads analysis. Based on the system, people can build a lot of downstream applications such as product recommendation, video retrieval, etc.

Language: Python - Size: 65.7 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 138 - Forks: 43

AnkurDeria/MFT

Pytorch implementation of Multimodal Fusion Transformer for Remote Sensing Image Classification.

Language: Jupyter Notebook - Size: 2.13 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 130 - Forks: 8

om-ai-lab/VL-CheckList

Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations. [EMNLP 2022]

Language: Python - Size: 26.6 MB - Last synced at: 4 days ago - Pushed at: 7 months ago - Stars: 129 - Forks: 5

zhu-xlab/DOFA

Code for Neural Plasticity-Inspired Foundation Model for Observing the Earth Crossing Modalities

Language: Jupyter Notebook - Size: 993 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 126 - Forks: 11

IDEA-Research/ChatRex

Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding

Language: Python - Size: 8.8 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 124 - Forks: 3

kyegomez/swarms-pytorch

Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊

Language: Python - Size: 58.2 MB - Last synced at: 20 days ago - Pushed at: 3 months ago - Stars: 121 - Forks: 10

shamanez/Self-Supervised-Embedding-Fusion-Transformer

The code for our IEEE ACCESS (2020) paper Multimodal Emotion Recognition with Transformer-Based Self Supervised Feature Fusion.

Language: Python - Size: 4.65 MB - Last synced at: 4 days ago - Pushed at: over 3 years ago - Stars: 119 - Forks: 22

cambridgeltl/visual-spatial-reasoning

[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.

Language: Python - Size: 10.7 MB - Last synced at: 20 days ago - Pushed at: about 2 years ago - Stars: 118 - Forks: 9

haamoon/mmtm

Implementation of CVPR 2020 paper "MMTM: Multimodal Transfer Module for CNN Fusion"

Language: Python - Size: 47.9 KB - Last synced at: 4 months ago - Pushed at: almost 5 years ago - Stars: 112 - Forks: 21

yuanze-lin/REVIVE

[NeurIPS 2022] Official code for REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering

Language: Python - Size: 3.39 MB - Last synced at: 14 days ago - Pushed at: 17 days ago - Stars: 100 - Forks: 10

DirtyHarryLYL/DJ-RN

Part of the HAKE project (HAKE-3D). Code for our CVPR 2020 paper "Detailed 2D-3D Joint Representation for Human-Object Interaction".

Language: Python - Size: 5.3 MB - Last synced at: 23 days ago - Pushed at: over 2 years ago - Stars: 100 - Forks: 14

jianghaojun/Awesome-3D-Vision-and-Language

A collection of 3D vision and language (e.g., 3D Visual Grounding, 3D Question Answering and 3D Dense Caption) papers and datasets.

Size: 33.2 KB - Last synced at: 11 days ago - Pushed at: about 2 years ago - Stars: 97 - Forks: 5

kyegomez/PALI

Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"

Language: Python - Size: 624 KB - Last synced at: 21 days ago - Pushed at: about 1 year ago - Stars: 89 - Forks: 8

JerryX1110/awesome-rvos

Referring Video Object Segmentation / Multi-Object Tracking Repo

Language: Python - Size: 79.1 KB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 87 - Forks: 4

ch3cook-fdu/Vote2Cap-DETR

[CVPR 2023] Vote2Cap-DETR and [T-PAMI 2024] Vote2Cap-DETR++; A set-to-set perspective towards 3D Dense Captioning; State-of-the-Art 3D Dense Captioning methods

Language: Python - Size: 308 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 82 - Forks: 5

Vision-CAIR/3DCoMPaT-v2

3DCoMPaT++: An improved large-scale 3D vision dataset for compositional recognition

Language: Python - Size: 133 MB - Last synced at: 8 days ago - Pushed at: 10 months ago - Stars: 82 - Forks: 6

referit3d/referit3d

Code accompanying our ECCV-2020 paper on 3D Neural Listeners.

Language: C++ - Size: 15.8 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 81 - Forks: 13

declare-lab/LLM-PuzzleTest

This repository is maintained to release the dataset and models for multimodal puzzle reasoning.

Language: Python - Size: 131 MB - Last synced at: 16 days ago - Pushed at: about 2 months ago - Stars: 78 - Forks: 7

idearibosome/embracenet

Robust multimodal integration method implemented in PyTorch and TensorFlow

Language: Python - Size: 107 KB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 78 - Forks: 25

ilaria-manco/muscaps

Source code for "MusCaps: Generating Captions for Music Audio" (IJCNN 2021)

Language: Jupyter Notebook - Size: 91.9 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 77 - Forks: 7

willxxy/awesome-mmps

Corpus of resources for multimodal machine learning with physiological signals (mmps).

Size: 1.03 MB - Last synced at: 1 day ago - Pushed at: 19 days ago - Stars: 73 - Forks: 2

kyegomez/Kosmos2.5

My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"

Language: Python - Size: 231 KB - Last synced at: 18 days ago - Pushed at: 20 days ago - Stars: 73 - Forks: 6

declare-lab/BBFN

This repository contains the implementation of the paper -- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis

Language: Python - Size: 1.25 MB - Last synced at: 10 days ago - Pushed at: about 2 years ago - Stars: 71 - Forks: 14

akashe/Multimodal-action-recognition

Code for selecting an action based on multimodal inputs. In this case, the inputs are voice and text.

Language: Python - Size: 64.7 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 69 - Forks: 11

aehrc/cvt2distilgpt2

Improving Chest X-Ray Report Generation by Leveraging Warm-Starting

Language: Python - Size: 93.5 MB - Last synced at: 12 days ago - Pushed at: 11 months ago - Stars: 67 - Forks: 7

vvvb-github/AVSegFormer

[AAAI 2024] AVSegFormer: Audio-Visual Segmentation with Transformer

Language: Python - Size: 486 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 62 - Forks: 5

rizavelioglu/hateful_memes-hate_detectron

Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge. https://arxiv.org/abs/2012.12975

Language: Jupyter Notebook - Size: 11.2 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 59 - Forks: 18

anita-hu/MSAF

Official implementation of the paper "MSAF: Multimodal Split Attention Fusion"

Language: Python - Size: 101 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 59 - Forks: 9

imatge-upc/wav2pix (fork of miqueltubau/Wav2Pix)

Speech-conditioned face generation using Generative Adversarial Networks (ICASSP 2019)

Language: Python - Size: 202 MB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 55 - Forks: 24

Sreyan88/MMER

Code for the InterSpeech 2023 paper: MMER: Multimodal Multi-task learning for Speech Emotion Recognition

Language: Python - Size: 1.59 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 54 - Forks: 14

ForestsKing/Awesome-Multimodal-Time-Series

A curated list of papers, code, data, and other resources focused on multimodal time series analysis.

Size: 7.81 KB - Last synced at: 12 days ago - Pushed at: 24 days ago - Stars: 53 - Forks: 3

sutdcv/SUTD-TrafficQA

[CVPR2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events

Language: JavaScript - Size: 6 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 53 - Forks: 2

thuiar/MIntRec

MIntRec: A New Dataset for Multimodal Intent Recognition (ACM MM 2022)

Language: Python - Size: 1.49 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 53 - Forks: 8

marslanm/Multimodality-Representation-Learning

This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have been cited and discussed in the recently accepted survey (https://dl.acm.org/doi/abs/10.1145/3617833).

Size: 63.3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 51 - Forks: 7

ai4ce/MARS

[CVPR2024] Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset

Language: Python - Size: 370 MB - Last synced at: 12 days ago - Pushed at: 10 months ago - Stars: 50 - Forks: 1

firojalam/multimodal_social_media

multimodal social media content (text, image) classification

Language: Python - Size: 3.54 MB - Last synced at: 22 days ago - Pushed at: almost 3 years ago - Stars: 50 - Forks: 14

HLTCHKUST/VG-GPLMs

The code repository for EMNLP 2021 paper "Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization".

Language: Python - Size: 9.32 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 49 - Forks: 8

naver/artemis

Official code release for ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity (published at ICLR 2022)

Language: Python - Size: 1.26 MB - Last synced at: 9 days ago - Pushed at: about 2 years ago - Stars: 48 - Forks: 4

lmb-freiburg/Multimodal-Future-Prediction

The official repository for the CVPR 2019 paper "Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction"

Language: Python - Size: 21.6 MB - Last synced at: 12 months ago - Pushed at: over 4 years ago - Stars: 47 - Forks: 8

ManifoldRG/NEKO

In-progress implementation of a GATO-style generalist multimodal model capable of image, text, RL, and robotics tasks

Language: Python - Size: 515 KB - Last synced at: 6 months ago - Pushed at: 10 months ago - Stars: 46 - Forks: 10

aimotive/aimotive_dataset

aiMotive public dataset

Size: 23.4 KB - Last synced at: 9 months ago - Pushed at: over 1 year ago - Stars: 44 - Forks: 2

penghu-cs/MRL

Learning Cross-Modal Retrieval with Noisy Labels (CVPR 2021, PyTorch Code)

Language: Python - Size: 23.9 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 44 - Forks: 10

YeonwooSung/LIMoE-pytorch

PyTorch implementation of LIMoE

Language: Python - Size: 4.3 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 43 - Forks: 1

affjljoo3581/Job-Recommend-Competition

🥇 1st-place solution for the KNOW-based job recommendation algorithm competition 🥇

Language: Python - Size: 1.74 MB - Last synced at: 30 days ago - Pushed at: about 3 years ago - Stars: 43 - Forks: 4

husseinmozannar/multimodal-deep-learning-for-disaster-response

Damage Identification in Social Media Posts using Multimodal Deep Learning: code and dataset

Language: Python - Size: 62.5 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 43 - Forks: 16

sk-aravind/3D-Bounding-Boxes-From-Monocular-Images

A two stage multi-modal loss model along with rigid body transformations to regress 3D bounding boxes

Language: Python - Size: 9.46 MB - Last synced at: almost 2 years ago - Pushed at: about 6 years ago - Stars: 43 - Forks: 18

A2Zadeh/Social-IQ

[CVPR 2019 Oral] Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence

Language: Python - Size: 2.71 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 41 - Forks: 5

Yuco-Z/Awesome-Multi-Modal-Dialog

[Paperlist] Awesome paper list of multimodal dialog, including methods, datasets and metrics

Size: 169 KB - Last synced at: about 1 hour ago - Pushed at: 3 months ago - Stars: 39 - Forks: 4

RaptorMai/MLLM-CompBench

[NeurIPS'25] MLLM-CompBench evaluates the comparative reasoning of MLLMs with 40K image pairs and questions across 8 dimensions of relative comparison: visual attribute, existence, state, emotion, temporality, spatiality, quantity, and quality. CompBench covers diverse visual domains, including animals, fashion, sports, and scenes

Language: Jupyter Notebook - Size: 10.7 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 38 - Forks: 2

42jaylonw/shifu

Lightweight Isaac Gym Environment Builder

Language: Python - Size: 31.6 MB - Last synced at: 21 days ago - Pushed at: over 2 years ago - Stars: 37 - Forks: 2

choyingw/CFCNet

NeurIPS 2019: Deep RGB-D Canonical Correlation Analysis For Sparse Depth Completion

Language: Python - Size: 31.8 MB - Last synced at: 14 days ago - Pushed at: about 4 years ago - Stars: 37 - Forks: 4

yanganYNU/AFFGCN

Attention Feature Fusion based on spatial-temporal Graph Convolutional Network (AFFGCN)

Language: Python - Size: 144 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 36 - Forks: 1

VisualWebBench/VisualWebBench

Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"

Language: Python - Size: 3.17 MB - Last synced at: 10 months ago - Pushed at: 11 months ago - Stars: 36 - Forks: 1

soloist97/densecap-pytorch

A simplified pytorch version of densecap

Language: Jupyter Notebook - Size: 5.1 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 34 - Forks: 8

Related Topics
deep-learning (105), multimodal (86), pytorch (64), computer-vision (56), machine-learning (47), multimodal-learning (38), natural-language-processing (26), nlp (24), multimodality (22), vision-and-language (21), tensorflow (20), large-language-models (19), python (18), transformer (16), attention-mechanism (16), transformers (16), generative-ai (14), multimodal-sentiment-analysis (14), artificial-intelligence (14), gpt4 (13), llm (13), multimodal-large-language-models (13), deep-neural-networks (13), self-supervised-learning (12), classification (12), convolutional-neural-networks (11), dataset (11), emotion-recognition (11), visual-question-answering (11), attention (10), multimodal-datasets (10), neural-network (10), ai (9), image-processing (9), clip (9), awesome-list (8), time-series (8), vision-language-transformer (8), image (8), object-detection (8), language-model (8), sentiment-analysis (8), bert (8), vision-transformer (8), multimodal-data (7), image-captioning (7), vision-language (7), image-classification (7), multimodal-fusion (7), vision-language-model (7), diffusion-models (7), multimodal-representation (7), representation-learning (7), pytorch-lightning (7), cnn (7), reinforcement-learning (6), deeplearning (6), 3d (6), huggingface-transformers (6), foundation-models (6), text-to-image (6), graph-neural-networks (6), keras (6), lstm (6), remote-sensing (6), neural-networks (6), vision-language-pretraining (6), audio-processing (5), transformer-models (5), speech-recognition (5), text (5), recommender-system (5), variational-autoencoder (5), anomaly-detection (5), image-generation (5), data-fusion (5), generative-adversarial-network (5), transfer-learning (5), generative-model (5), attention-is-all-you-need (5), python3 (5), audio (5), vqa (5), visual-grounding (5), large-multimodal-models (5), point-cloud (5), gan (5), embeddings (5), nlp-machine-learning (5), contrastive-learning (5), question-answering (5), multimodal-interactions (5), paper (5), memes (5), semantic-segmentation (5), visual-dialog (4), multi-modal (4), domain-adaptation (4), knowledge-graph (4), cross-modal-retrieval (4)