An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: multimodal-deep-learning

kyegomez/the-compiler

Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!

Language: Python - Size: 10.4 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 145 - Forks: 17

ashutosh1919/data2vec-pytorch

Ready to run PyTorch implementation of Data2Vec 2.0: Highly efficient self-supervised representation learning for vision, speech and text.

Language: Python - Size: 116 KB - Last synced at: 4 days ago - Pushed at: about 2 years ago - Stars: 14 - Forks: 2

nlp-unibo/multimodal-am-fallacy

Multimodal Fallacy Classification in Political Debates: Dataset and Experiments.

Language: Python - Size: 11.2 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 4 - Forks: 0

AlfredoBaione/Music_to_figurative_art

A project for generating artistic images semantically related to music inputs.

Language: Python - Size: 5.65 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

MohamedTharwat21/MemexQA

MemexQA is a project designed to tackle the challenge of real-life multimodal question answering by leveraging both visual and textual data from personal photo albums.

Language: Python - Size: 5.12 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

kyegomez/Kosmos2.5

My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"

Language: Python - Size: 231 KB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 73 - Forks: 6

Avir-AI/handimage_mamba

[IKT 2024] A Multi-Task Framework Using Mamba for Identity, Age, and Gender Classification from Hand Images

Language: Python - Size: 73.2 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

codezakh/DataEnvGym

A testbed for agents and environments that can automatically improve models through data generation.

Language: Python - Size: 9.16 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 19 - Forks: 5

kyegomez/swarms-pytorch

Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊

Language: Python - Size: 58.2 MB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 121 - Forks: 10

DWCTOD/ECCV2022-Papers-with-Code-Demo

A collection of the latest ECCV results, including papers, code, and demo videos. Recommendations are welcome!

Size: 170 KB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 286 - Forks: 23

chikap421/videosam

This repository accompanies the paper "VideoSAM: A Large Vision Foundation Model for High-Speed Video Segmentation"

Language: Jupyter Notebook - Size: 160 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 5 - Forks: 1

jianghaojun/Awesome-3D-Vision-and-Language

A collection of 3D vision and language (e.g., 3D Visual Grounding, 3D Question Answering and 3D Dense Caption) papers and datasets.

Size: 33.2 KB - Last synced at: 5 days ago - Pushed at: about 2 years ago - Stars: 97 - Forks: 5

discover-Austin/multimodal-emotion-recognition

A deep learning system for real-time emotion recognition from both text and images using transformers.

Language: Python - Size: 0 Bytes - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

deepmancer/deepmancer

"When in doubt, use brute force." - Ken Thompson

Size: 431 KB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

declare-lab/multimodal-deep-learning

This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.

Language: OpenEdge ABL - Size: 181 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 801 - Forks: 157

Computational-social-science/Skew-pair_Fusion

We propose a holistic framework that formalizes a dual interpretable mechanism, comprising universal skew-layer alignment and bootstrapping sparsity, to enhance fusion gain in hybrid neural networks.

Language: Jupyter Notebook - Size: 6.78 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

kyegomez/PALI3

Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"

Language: Python - Size: 2.61 MB - Last synced at: 30 days ago - Pushed at: about 1 month ago - Stars: 145 - Forks: 4

omeregev/click2mask

[AAAI 2025] Official Implementation for "Click2Mask: Local Editing with Dynamic Mask Generation" Paper.

Language: Python - Size: 61.9 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 14 - Forks: 2

vvvb-github/AVSegFormer

[AAAI 2024] AVSegFormer: Audio-Visual Segmentation with Transformer

Language: Python - Size: 486 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 62 - Forks: 5

drprojects/DeepViewAgg

[CVPR'22 Best Paper Finalist] Official PyTorch implementation of the method presented in "Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation"

Language: Python - Size: 302 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 228 - Forks: 25

naver/artemis

Official code release for ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity (published at ICLR 2022)

Language: Python - Size: 1.26 MB - Last synced at: 28 days ago - Pushed at: over 2 years ago - Stars: 48 - Forks: 4

first-coding/Multimodal-Assistant

Language: Python - Size: 1.29 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

JanTeichertKluge/DMLSim

This library provides packages on DoubleML / Causal Machine Learning and Neural Networks in Python for Simulation and Case Studies.

Language: Python - Size: 145 KB - Last synced at: 29 days ago - Pushed at: almost 2 years ago - Stars: 6 - Forks: 0

jahez07/Multimodal-Fusion-Strategy-to-Classify-Malware

This work focuses on proposing a novel approach towards classifying malware binaries by extracting visual features from malware executables.

Language: Jupyter Notebook - Size: 257 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 4 - Forks: 0

sisinflab/Ducho

Python framework to extract multimodal features for multimodal recommendation in a highly-customizable way.

Language: Python - Size: 3.62 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 22 - Forks: 5

kritiksoman/Multimodal

Listen. Write. Speak. Read. Think.

Language: Jupyter Notebook - Size: 15.2 MB - Last synced at: 10 days ago - Pushed at: about 3 years ago - Stars: 11 - Forks: 0

yuhui-zh15/drml

Official Code Release for "Diagnosing and Rectifying Vision Models using Language" (ICLR 2023)

Language: Jupyter Notebook - Size: 19.2 MB - Last synced at: 28 days ago - Pushed at: almost 2 years ago - Stars: 33 - Forks: 0

Yuco-Z/Awesome-Multi-Modal-Dialog

[Paperlist] Awesome paper list of multimodal dialog, including methods, datasets and metrics

Size: 169 KB - Last synced at: 6 days ago - Pushed at: 4 months ago - Stars: 39 - Forks: 4

AI4Patents/IMPACT

IMPACT: A Large-scale Integrated Multimodal Patent Analysis and Creation Dataset for Design Patents (NeurIPS 2024)

Language: Jupyter Notebook - Size: 23.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 7 - Forks: 1

westlake-repl/IDvs.MoRec

End-to-end Training for Multimodal Recommendation Systems

Language: Python - Size: 57.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 139 - Forks: 18

fork123aniket/Agentic-RAG-Story-Generation-with-Multimodal-GenAI

Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling

Language: Python - Size: 94.7 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

IDEA-Research/ChatRex

Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding

Language: Python - Size: 8.8 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 124 - Forks: 3

JunweiLiang/FVTA_MemexQA

Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19

Language: Python - Size: 723 KB - Last synced at: 23 days ago - Pushed at: almost 6 years ago - Stars: 32 - Forks: 15

Pol-Buitrago/SynthAVSR

This repository contains the development of SynthAVSR, the first Audiovisual Speech Recognition (AVSR) system tailored for the Spanish and Catalan languages. Based on the AV-HuBERT (Audio-Visual Hidden Unit BERT) model, SynthAVSR leverages synthetic audiovisual data to bridge the gap in speech recognition technology for these languages.

Language: Python - Size: 290 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Adm-2005/DeMorph

Deepfake Detection Solution using Multimodal Approach.

Language: Python - Size: 10.9 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

ThomasHelfer/multimodal-supernovae

A codebase dedicated to exploring multimodal learning approaches by integrating images of host galaxies of supernovae and their corresponding light-curves and spectra.

Language: Jupyter Notebook - Size: 1.66 GB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 12 - Forks: 2

ZhaoPeiduo/BLIP2-Japanese

Modifying LAVIS' BLIP2 Q-former with models pretrained on Japanese datasets.

Language: Python - Size: 75.9 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 12 - Forks: 1

mbappeenjoyer/GIF-QA

Documentation of the approach employed to tackle the task of GIF Question Answering

Language: Jupyter Notebook - Size: 2.72 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

theislab/scarches

Reference mapping for single-cell genomics

Language: Jupyter Notebook - Size: 825 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 347 - Forks: 52

fevieira27/ImageRecognitionAI-R

R Script for AI Image and Location Recognition that can also generate an automated prompt for AI text-generation of a social media post.

Language: R - Size: 879 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

kyegomez/Pegasus

PegasusX: The Future of Multimodal Embeddings 🦄 🦄

Language: Python - Size: 37.5 MB - Last synced at: 27 days ago - Pushed at: 7 months ago - Stars: 16 - Forks: 5

ksm26/Introducing-Multimodal-Llama-3.2

This repository focuses on the cutting-edge features of Llama 3.2, including multimodal capabilities, advanced tokenization, and tool calling for building next-gen AI applications. It highlights Llama's enhanced image reasoning, multilingual support, and the Llama Stack API for seamless customization and orchestration.

Language: Jupyter Notebook - Size: 3.79 MB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

FuxiaoLiu/DocumentCLIP

[ICPRAI 2024] DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents

Language: Python - Size: 2.49 MB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 16 - Forks: 0

busraoguzoglu/Image-Similarity-Search

Using CLIP/Titan/ALIGN for Multimodal Image Search: Searching images with a keyword or with a sample image

Language: Jupyter Notebook - Size: 11.2 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

attarmau/Multimodal-Misinformation-Detection

Multimodal deep learning model for fake news classification.

Size: 7.81 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

kassy11/daicwoz_voice

Preprocessing and feature extraction for raw voice data of DAIC-WOZ

Language: Python - Size: 1.95 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

FIVEYOUNGWOO/WiFiMobNet

WiFi-Camera multimodal learning-based object detection and pose estimation.

Language: Python - Size: 560 MB - Last synced at: 3 days ago - Pushed at: 5 months ago - Stars: 2 - Forks: 1

sutdcv/SUTD-TrafficQA

[CVPR2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events

Language: JavaScript - Size: 6 MB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 53 - Forks: 2

Vasugi2003/Fusion-AI---MultiModal-Persuvasiveness-Prediction

Developed a system to predict persuasiveness using multi-modal data (text, images, audio). Utilized BERT for text embeddings, ResNet for image features, and Librosa for audio analysis. Fused data from all modalities for enhanced prediction accuracy.

Language: Jupyter Notebook - Size: 770 KB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

haamoon/mmtm

Implementation of CVPR 2020 paper "MMTM: Multimodal Transfer Module for CNN Fusion"

Language: Python - Size: 47.9 KB - Last synced at: 4 months ago - Pushed at: almost 5 years ago - Stars: 112 - Forks: 21

Eva-Kaushik/EMKGCN-MultiModal-Music-Recommender

The `MKGCN` class, coupled with the Spotify API, orchestrates a multi-modal knowledge graph convolutional network to enhance music recommendation systems by integrating user interaction data and diverse music modalities.

Language: Jupyter Notebook - Size: 11 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 1

UmarIgan/Machine-Learning

A set of jupyter notebooks

Language: Jupyter Notebook - Size: 16.7 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 23 - Forks: 8

kelechi-c/ripple_net

image retrieval/tagging with CLIP

Language: Python - Size: 416 KB - Last synced at: 20 days ago - Pushed at: 10 months ago - Stars: 13 - Forks: 1

deepur71/InstructPix2Pix

Implementation of InstructPix2Pix from scratch

Language: Python - Size: 0 Bytes - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

ahmdtaha/distributed_sigmoid_loss

Unofficial implementation for Sigmoid Loss for Language Image Pre-Training

Language: Python - Size: 62.5 KB - Last synced at: 18 days ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 0

firojalam/multimodal_social_media

multimodal social media content (text, image) classification

Language: Python - Size: 3.54 MB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 50 - Forks: 14

ilaria-manco/muscaps

Source code for "MusCaps: Generating Captions for Music Audio" (IJCNN 2021)

Language: Jupyter Notebook - Size: 91.9 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 77 - Forks: 7

zhongshsh/MoExtend

ACL 2024 (SRW), Official Codebase of our Paper: "MoExtend: Tuning New Experts for Modality and Task Extension"

Language: Python - Size: 542 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 10 - Forks: 0

meysam-safarzadeh/multimodal

This project is a multi-modal transformer based model to fuse RGB, Thermal, and depth modalities in order to predict pain intensity in 5 classes.

Language: Python - Size: 111 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

chikap421/mseg_vcuq

This repository accompanies the paper "MSEG-VCUQ: Multimodal SEGmentation with Enhanced Vision Foundation Models, Convolutional Neural Networks, and Uncertainty Quantification for High-Speed Video Phase Detection Data"

Language: MATLAB - Size: 1.48 GB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

Gaoqiandi/MultiT2

MultiT2 is an algorithm that connects disparate data from bacterial aromatic polyketides through multimodal learning. It specifically focuses on integrating protein sequences (CLFs) and chemical structures (SMILES) to predict and discover type II polyketide (T2PK) natural products.

Language: Jupyter Notebook - Size: 63 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

BorgwardtLab/DeepEST

Language: Python - Size: 396 KB - Last synced at: 28 days ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 0

eftekhar-hossain/Bengali-Aggression-Memes Fork of shawlyahsan/Bengali-Aggression-Memes

[EACL'24] A Multimodal Framework to Detect Target Aware Aggression Memes

Language: Python - Size: 535 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

unitaryai/VTC

VTC: Improving Video-Text Retrieval with User Comments

Language: Python - Size: 5.45 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 11 - Forks: 0

yanganYNU/AFFGCN

Attention Feature Fusion based on spatial-temporal Graph Convolutional Network (AFFGCN)

Language: Python - Size: 144 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 36 - Forks: 1

ManifoldRG/NEKO

In-progress implementation of a GATO-style generalist multimodal model capable of image, text, RL, and robotics tasks

Language: Python - Size: 515 KB - Last synced at: 7 months ago - Pushed at: 11 months ago - Stars: 46 - Forks: 10

yongfanbeta/awesome-multimodal-healthcare

Reading list for multimodal learning in healthcare

Size: 8.79 KB - Last synced at: 24 days ago - Pushed at: over 1 year ago - Stars: 11 - Forks: 2

anamabo/SegmentWater

Tools to create output for Paligemma to segment water in satellite images.

Language: Jupyter Notebook - Size: 27.3 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

JerryX1110/awesome-rvos

Referring Video Object Segmentation / Multi-Object Tracking Repo

Language: Python - Size: 79.1 KB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 87 - Forks: 4

phellonchen/awesome-visual-dialog

Recent Advances in Visual Dialog

Size: 36.1 KB - Last synced at: 3 days ago - Pushed at: over 2 years ago - Stars: 30 - Forks: 1

choyingw/CFCNet

NeurIPS 2019: Deep RGB-D Canonical Correlation Analysis For Sparse Depth Completion

Language: Python - Size: 31.8 MB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 37 - Forks: 4

katerynaCh/MMA-DFER

This repository provides the code for MMA-DFER, a multimodal (audiovisual) emotion recognition method. It is the official implementation of the paper MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild.

Language: Python - Size: 1.77 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 10 - Forks: 1

Highdrien/MultiModal-Model

Multimodal model that takes text, audio, and video to predict turn-taking, that is, whether the speaker in a discussion will change.

Language: Python - Size: 92 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

hubtru/Impala

Expandable Isotropic Multimodal Patch Learning Neural Architecture for Nano-modal (9) time-series and image data.

Language: Jupyter Notebook - Size: 1.08 GB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

Yuan-ManX/ai-multimodal-timeline

Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Agent, Audio, Image, Video, Music and 3D content. 🔥

Size: 1.11 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 18 - Forks: 1

hubtru/Minape

Multimodal Isotropic Neural Architecture with Patch Embedding to both time series and image data for classification purposes.

Language: Jupyter Notebook - Size: 47 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 3 - Forks: 1

jon-chun/multisentimentarcs

A Novel Method to Visualize Multimodal AI Sentiment Arcs in Long-Form Narratives

Language: Python - Size: 284 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

jena-shreyas/Efficient-VidQA

Part of my work for my Bachelor's Thesis Project on Counterfactual Reasoning for Videos.

Language: Python - Size: 11.5 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

sathya-ml/multimodal-vrnn-vae

A PyTorch implementation of multimodal VRNN and VAE.

Language: Python - Size: 26.4 KB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

visinf/lnfmm

Latent Normalizing Flows for Many-to-Many Cross Domain Mappings (ICLR 2020)

Language: Python - Size: 1000 KB - Last synced at: about 2 months ago - Pushed at: about 3 years ago - Stars: 33 - Forks: 12

RunyuFan/FusionMixer-TGRS-2022

Code for TGRS 2022 paper "Multilevel Spatial-Channel Feature Fusion Network for Urban Village Classification by Fusing Satellite and Streetview Images"

Language: Python - Size: 51.8 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 3 - Forks: 2

declare-lab/MSA-Robustness

NAACL 2022 paper on Analyzing Modality Robustness in Multimodal Sentiment Analysis

Language: Python - Size: 3.43 MB - Last synced at: 29 days ago - Pushed at: over 2 years ago - Stars: 31 - Forks: 5

ch3cook-fdu/Vote2Cap-DETR

[CVPR 2023] Vote2Cap-DETR and [T-PAMI 2024] Vote2Cap-DETR++; A set-to-set perspective towards 3D Dense Captioning; State-of-the-Art 3D Dense Captioning methods

Language: Python - Size: 308 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 82 - Forks: 5

declare-lab/M2H2-dataset

This repository contains the dataset and baselines explained in the paper: M2H2: A Multimodal Multiparty Hindi Dataset For Humor Recognition in Conversations

Language: Python - Size: 2.21 GB - Last synced at: 29 days ago - Pushed at: about 2 years ago - Stars: 18 - Forks: 12

Rehan-Ahmad/MultimodalDiarization

Multimodal speaker diarization using pre-trained audio-visual synchronization model

Language: Python - Size: 38.1 KB - Last synced at: 9 months ago - Pushed at: about 5 years ago - Stars: 9 - Forks: 6

monajalal/Kenyan-Food

Code and link to the dataset for the Kenyan Food detection paper, accepted at the MADiMA 2019 Workshop, part of the ACM MM 2019 conference.

Language: Python - Size: 5.08 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 9 - Forks: 6

hubtru/Siren

SIREN: Scalable, Isotropic Recursive Column Multimodal Neural Architecture with Device State Recognition Use-Case

Language: Jupyter Notebook - Size: 189 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

mazleon/Hateful_Meme_Challenge

Hateful Memes dataset contains real hate speech. The Real Hateful Memes dataset consists of more than 10,000 newly created examples by Facebook AI.

Language: Jupyter Notebook - Size: 1.38 MB - Last synced at: 9 days ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 0

kyegomez/Med-PaLM

Towards Generalist Biomedical AI

Language: Python - Size: 850 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 219 - Forks: 35

HySonLab/Protein_Pretrain

Multimodal Pretraining for Unsupervised Protein Representation Learning

Language: Python - Size: 241 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 12 - Forks: 2

ibnaleem/mikael

a Discord chatbot trained on Mistral and LLaVA language models

Language: Python - Size: 3.53 MB - Last synced at: 19 days ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 1

Anne-Andresen/Multi-Modal-cuda-C-GAN

Raw C/CUDA implementation of a 3D GAN

Language: Cuda - Size: 156 KB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 1

aimotive/aimotive_dataset

aiMotive public dataset

Size: 23.4 KB - Last synced at: 9 months ago - Pushed at: over 1 year ago - Stars: 44 - Forks: 2

nngocson2002/ViVQA

The Multimodal Model for Vietnamese Visual Question Answering (ViVQA)

Language: Python - Size: 1.02 MB - Last synced at: 10 months ago - Pushed at: 11 months ago - Stars: 7 - Forks: 0

jianghaojun/Awesome-Parameter-Efficient-Transfer-Learning

A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.

Size: 165 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 372 - Forks: 23

pabloggarc/TFG

Image classification and text assignment using convolutional neural networks and multimodal transformers

Language: Jupyter Notebook - Size: 275 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Saumya-svm/Multimodal-SSL-MusicRep

A small project to explore multimodal representation learning.

Language: Python - Size: 84.5 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

dermatologist/kedro-tf-text

Kedro pipelines for preprocessing text and tabular data for multi-modal ML in TensorFlow.

Language: Python - Size: 142 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

kyegomez/MMCA-MGQA

Experiments around using Multi-Modal Causal Attention with Multi-Grouped Query Attention

Language: Python - Size: 210 KB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 0

kyegomez/Odin

SOTA Classification at scale for UAVs, Drones, and much more

Language: Python - Size: 211 KB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

Related Keywords
multimodal-deep-learning (413), deep-learning (106), multimodal (86), pytorch (64), computer-vision (56), machine-learning (48), multimodal-learning (38), natural-language-processing (26), nlp (24), multimodality (22), vision-and-language (21), tensorflow (20), python (20), large-language-models (19), transformer (16), transformers (16), attention-mechanism (16), generative-ai (14), multimodal-sentiment-analysis (14), llm (14), artificial-intelligence (14), self-supervised-learning (13), deep-neural-networks (13), multimodal-large-language-models (13), classification (13), gpt4 (13), emotion-recognition (12), dataset (11), convolutional-neural-networks (11), visual-question-answering (11), neural-network (10), multimodal-datasets (10), attention (10), image-processing (9), time-series (9), ai (9), language-model (9), clip (9), sentiment-analysis (8), vision-transformer (8), object-detection (8), awesome-list (8), bert (8), image-classification (8), image (8), vision-language-transformer (8), diffusion-models (7), image-captioning (7), multimodal-representation (7), multimodal-fusion (7), pytorch-lightning (7), cnn (7), vision-language (7), representation-learning (7), vision-language-model (7), multimodal-data (7), graph-neural-networks (6), vision-language-pretraining (6), foundation-models (6), reinforcement-learning (6), keras (6), huggingface-transformers (6), lstm (6), neural-networks (6), deeplearning (6), remote-sensing (6), 3d (6), text-to-image (6), generative-model (5), anomaly-detection (5), recommender-system (5), text (5), attention-is-all-you-need (5), transfer-learning (5), data-fusion (5), question-answering (5), audio-processing (5), transformer-models (5), large-multimodal-models (5), gan (5), audio (5), variational-autoencoder (5), nlp-machine-learning (5), visual-grounding (5), memes (5), contrastive-learning (5), multimodal-interactions (5), embeddings (5), image-generation (5), speech-recognition (5), generative-adversarial-network (5), vqa (5), semantic-segmentation (5), point-cloud (5), paper (5), python3 (5), hateful-memes-challenge (4), multimodal-emotion-recognition (4), vision-and-language-pre-training (4), knowledge-graph (4)