GitHub topics: multimodal-deep-learning
kyegomez/the-compiler
Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!
Language: Python - Size: 10.4 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 145 - Forks: 17

ashutosh1919/data2vec-pytorch
Ready to run PyTorch implementation of Data2Vec 2.0: Highly efficient self-supervised representation learning for vision, speech and text.
Language: Python - Size: 116 KB - Last synced at: 4 days ago - Pushed at: about 2 years ago - Stars: 14 - Forks: 2

nlp-unibo/multimodal-am-fallacy
Multimodal Fallacy Classification in Political Debates: Dataset and Experiments.
Language: Python - Size: 11.2 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 4 - Forks: 0

AlfredoBaione/Music_to_figurative_art
A project for generating artistic images semantically related to music inputs.
Language: Python - Size: 5.65 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

MohamedTharwat21/MemexQA
MemexQA is a project designed to tackle the challenge of real-life multimodal question answering by leveraging both visual and textual data from personal photo albums.
Language: Python - Size: 5.12 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

kyegomez/Kosmos2.5
My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"
Language: Python - Size: 231 KB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 73 - Forks: 6

Avir-AI/handimage_mamba
[IKT 2024] A Multi-Task Framework Using Mamba for Identity, Age, and Gender Classification from Hand Images
Language: Python - Size: 73.2 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

codezakh/DataEnvGym
A testbed for agents and environments that can automatically improve models through data generation.
Language: Python - Size: 9.16 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 19 - Forks: 5

kyegomez/swarms-pytorch
Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊
Language: Python - Size: 58.2 MB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 121 - Forks: 10

DWCTOD/ECCV2022-Papers-with-Code-Demo
A collection of the latest ECCV results, including papers, code, and demo videos. Recommendations are welcome!
Size: 170 KB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 286 - Forks: 23

chikap421/videosam
This repository accompanies the paper "VideoSAM: A Large Vision Foundation Model for High-Speed Video Segmentation"
Language: Jupyter Notebook - Size: 160 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 5 - Forks: 1

jianghaojun/Awesome-3D-Vision-and-Language
A collection of 3D vision and language (e.g., 3D Visual Grounding, 3D Question Answering and 3D Dense Caption) papers and datasets.
Size: 33.2 KB - Last synced at: 5 days ago - Pushed at: about 2 years ago - Stars: 97 - Forks: 5

discover-Austin/multimodal-emotion-recognition
A deep learning system for real-time emotion recognition from both text and images using transformers.
Language: Python - Size: 0 Bytes - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

deepmancer/deepmancer
"When in doubt, use brute force." - Ken Thompson
Size: 431 KB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

declare-lab/multimodal-deep-learning
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
Language: OpenEdge ABL - Size: 181 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 801 - Forks: 157

Computational-social-science/Skew-pair_Fusion
We propose a holistic framework that formalizes a dual interpretable mechanism, comprising universal skew-layer alignment and bootstrapping sparsity, to enhance fusion gain in hybrid neural networks.
Language: Jupyter Notebook - Size: 6.78 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

kyegomez/PALI3
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
Language: Python - Size: 2.61 MB - Last synced at: 30 days ago - Pushed at: about 1 month ago - Stars: 145 - Forks: 4

omeregev/click2mask
[AAAI 2025] Official Implementation for "Click2Mask: Local Editing with Dynamic Mask Generation" Paper.
Language: Python - Size: 61.9 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 14 - Forks: 2

vvvb-github/AVSegFormer
[AAAI 2024] AVSegFormer: Audio-Visual Segmentation with Transformer
Language: Python - Size: 486 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 62 - Forks: 5

drprojects/DeepViewAgg
[CVPR'22 Best Paper Finalist] Official PyTorch implementation of the method presented in "Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation"
Language: Python - Size: 302 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 228 - Forks: 25

naver/artemis
Official code release for ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity (published at ICLR 2022)
Language: Python - Size: 1.26 MB - Last synced at: 28 days ago - Pushed at: over 2 years ago - Stars: 48 - Forks: 4

first-coding/Multimodal-Assistant
Language: Python - Size: 1.29 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

JanTeichertKluge/DMLSim
This library provides packages on DoubleML / Causal Machine Learning and Neural Networks in Python for Simulation and Case Studies.
Language: Python - Size: 145 KB - Last synced at: 29 days ago - Pushed at: almost 2 years ago - Stars: 6 - Forks: 0

jahez07/Multimodal-Fusion-Strategy-to-Classify-Malware
This work focuses on proposing a novel approach towards classifying malware binaries by extracting visual features from malware executables.
Language: Jupyter Notebook - Size: 257 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 4 - Forks: 0

sisinflab/Ducho
Python framework to extract multimodal features for multimodal recommendation in a highly-customizable way.
Language: Python - Size: 3.62 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 22 - Forks: 5

kritiksoman/Multimodal
Listen. Write. Speak. Read. Think.
Language: Jupyter Notebook - Size: 15.2 MB - Last synced at: 10 days ago - Pushed at: about 3 years ago - Stars: 11 - Forks: 0

yuhui-zh15/drml
Official Code Release for "Diagnosing and Rectifying Vision Models using Language" (ICLR 2023)
Language: Jupyter Notebook - Size: 19.2 MB - Last synced at: 28 days ago - Pushed at: almost 2 years ago - Stars: 33 - Forks: 0

Yuco-Z/Awesome-Multi-Modal-Dialog
[Paperlist] Awesome paper list of multimodal dialog, including methods, datasets and metrics
Size: 169 KB - Last synced at: 6 days ago - Pushed at: 4 months ago - Stars: 39 - Forks: 4

AI4Patents/IMPACT
IMPACT: A Large-scale Integrated Multimodal Patent Analysis and Creation Dataset for Design Patents (NeurIPS 2024)
Language: Jupyter Notebook - Size: 23.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 7 - Forks: 1

westlake-repl/IDvs.MoRec
End-to-end Training for Multimodal Recommendation Systems
Language: Python - Size: 57.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 139 - Forks: 18

fork123aniket/Agentic-RAG-Story-Generation-with-Multimodal-GenAI
Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling
Language: Python - Size: 94.7 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

IDEA-Research/ChatRex
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
Language: Python - Size: 8.8 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 124 - Forks: 3

JunweiLiang/FVTA_MemexQA
Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19
Language: Python - Size: 723 KB - Last synced at: 23 days ago - Pushed at: almost 6 years ago - Stars: 32 - Forks: 15

Pol-Buitrago/SynthAVSR
This repository contains the development of SynthAVSR, the first Audiovisual Speech Recognition (AVSR) system tailored for the Spanish and Catalan languages. Based on the AV-HuBERT (Audio-Visual Hidden Unit BERT) model, SynthAVSR leverages synthetic audiovisual data to bridge the gap in speech recognition technology for these languages.
Language: Python - Size: 290 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Adm-2005/DeMorph
Deepfake Detection Solution using Multimodal Approach.
Language: Python - Size: 10.9 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

ThomasHelfer/multimodal-supernovae
A codebase dedicated to exploring multimodal learning approaches by integrating images of host galaxies of supernovae and their corresponding light-curves and spectra.
Language: Jupyter Notebook - Size: 1.66 GB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 12 - Forks: 2

ZhaoPeiduo/BLIP2-Japanese
Modifying LAVIS' BLIP2 Q-former with models pretrained on Japanese datasets.
Language: Python - Size: 75.9 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 12 - Forks: 1

mbappeenjoyer/GIF-QA
Documentation of the approach employed to tackle the task of GIF Question Answering
Language: Jupyter Notebook - Size: 2.72 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

theislab/scarches
Reference mapping for single-cell genomics
Language: Jupyter Notebook - Size: 825 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 347 - Forks: 52

fevieira27/ImageRecognitionAI-R
R Script for AI Image and Location Recognition that can also generate an automated prompt for AI text-generation of a social media post.
Language: R - Size: 879 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

kyegomez/Pegasus
PegasusX: The Future of Multimodal Embeddings 🦄 🦄
Language: Python - Size: 37.5 MB - Last synced at: 27 days ago - Pushed at: 7 months ago - Stars: 16 - Forks: 5

ksm26/Introducing-Multimodal-Llama-3.2
This repository focuses on the cutting-edge features of Llama 3.2, including multimodal capabilities, advanced tokenization, and tool calling for building next-gen AI applications. It highlights Llama's enhanced image reasoning, multilingual support, and the Llama Stack API for seamless customization and orchestration.
Language: Jupyter Notebook - Size: 3.79 MB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

FuxiaoLiu/DocumentCLIP
[ICPRAI 2024] DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents
Language: Python - Size: 2.49 MB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 16 - Forks: 0

busraoguzoglu/Image-Similarity-Search
Using CLIP/Titan/ALIGN for Multimodal Image Search: Searching images with a keyword or with a sample image
Language: Jupyter Notebook - Size: 11.2 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

attarmau/Multimodal-Misinformation-Detection
Multimodal deep learning model for fake news classification.
Size: 7.81 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

kassy11/daicwoz_voice
Preprocessing and feature extraction for raw voice data of DAIC-WOZ
Language: Python - Size: 1.95 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

FIVEYOUNGWOO/WiFiMobNet
WiFi-Camera multimodal learning-based object detection and pose estimation.
Language: Python - Size: 560 MB - Last synced at: 3 days ago - Pushed at: 5 months ago - Stars: 2 - Forks: 1

sutdcv/SUTD-TrafficQA
[CVPR2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
Language: JavaScript - Size: 6 MB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 53 - Forks: 2

Vasugi2003/Fusion-AI---MultiModal-Persuvasiveness-Prediction
Developed a system to predict persuasiveness using multi-modal data (text, images, audio). Utilized BERT for text embeddings, ResNet for image features, and Librosa for audio analysis. Fused data from all modalities for enhanced prediction accuracy.
Language: Jupyter Notebook - Size: 770 KB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

haamoon/mmtm
Implementation of CVPR 2020 paper "MMTM: Multimodal Transfer Module for CNN Fusion"
Language: Python - Size: 47.9 KB - Last synced at: 4 months ago - Pushed at: almost 5 years ago - Stars: 112 - Forks: 21

Eva-Kaushik/EMKGCN-MultiModal-Music-Recommender
The `MKGCN` class, coupled with the Spotify API, orchestrates a multi-modal knowledge graph convolutional network to enhance music recommendation systems by integrating user interaction data and diverse music modalities.
Language: Jupyter Notebook - Size: 11 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 1

UmarIgan/Machine-Learning
A set of jupyter notebooks
Language: Jupyter Notebook - Size: 16.7 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 23 - Forks: 8

kelechi-c/ripple_net
image retrieval/tagging with CLIP
Language: Python - Size: 416 KB - Last synced at: 20 days ago - Pushed at: 10 months ago - Stars: 13 - Forks: 1

deepur71/InstructPix2Pix
Implementation of InstructPix2Pix from scratch
Language: Python - Size: 0 Bytes - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

ahmdtaha/distributed_sigmoid_loss
Unofficial implementation for Sigmoid Loss for Language Image Pre-Training
Language: Python - Size: 62.5 KB - Last synced at: 18 days ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 0

firojalam/multimodal_social_media
multimodal social media content (text, image) classification
Language: Python - Size: 3.54 MB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 50 - Forks: 14

ilaria-manco/muscaps
Source code for "MusCaps: Generating Captions for Music Audio" (IJCNN 2021)
Language: Jupyter Notebook - Size: 91.9 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 77 - Forks: 7

zhongshsh/MoExtend
ACL 2024 (SRW), Official Codebase of our Paper: "MoExtend: Tuning New Experts for Modality and Task Extension"
Language: Python - Size: 542 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 10 - Forks: 0

meysam-safarzadeh/multimodal
This project is a multi-modal transformer based model to fuse RGB, Thermal, and depth modalities in order to predict pain intensity in 5 classes.
Language: Python - Size: 111 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

chikap421/mseg_vcuq
This repository accompanies the paper "MSEG-VCUQ: Multimodal SEGmentation with Enhanced Vision Foundation Models, Convolutional Neural Networks, and Uncertainty Quantification for High-Speed Video Phase Detection Data"
Language: MATLAB - Size: 1.48 GB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

Gaoqiandi/MultiT2
MultiT2 is an algorithm that connects disparate data from bacterial aromatic polyketides through multimodal learning. It specifically focuses on integrating protein sequences (CLFs) and chemical structures (SMILES) to predict and discover type II polyketide (T2PK) natural products.
Language: Jupyter Notebook - Size: 63 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

BorgwardtLab/DeepEST
Language: Python - Size: 396 KB - Last synced at: 28 days ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 0

eftekhar-hossain/Bengali-Aggression-Memes Fork of shawlyahsan/Bengali-Aggression-Memes
[EACL'24] A Multimodal Framework to Detect Target Aware Aggression Memes
Language: Python - Size: 535 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

unitaryai/VTC
VTC: Improving Video-Text Retrieval with User Comments
Language: Python - Size: 5.45 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 11 - Forks: 0

yanganYNU/AFFGCN
Attention Feature Fusion based on a spatial-temporal Graph Convolutional Network (AFFGCN)
Language: Python - Size: 144 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 36 - Forks: 1

ManifoldRG/NEKO
In-progress implementation of a GATO-style generalist multimodal model capable of image, text, RL, and robotics tasks
Language: Python - Size: 515 KB - Last synced at: 7 months ago - Pushed at: 11 months ago - Stars: 46 - Forks: 10

yongfanbeta/awesome-multimodal-healthcare
Reading list for multimodal learning in healthcare
Size: 8.79 KB - Last synced at: 24 days ago - Pushed at: over 1 year ago - Stars: 11 - Forks: 2

anamabo/SegmentWater
Tools to create output for PaliGemma to segment water in satellite images.
Language: Jupyter Notebook - Size: 27.3 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

JerryX1110/awesome-rvos
Referring Video Object Segmentation / Multi-Object Tracking Repo
Language: Python - Size: 79.1 KB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 87 - Forks: 4

phellonchen/awesome-visual-dialog
Recent Advances in Visual Dialog
Size: 36.1 KB - Last synced at: 3 days ago - Pushed at: over 2 years ago - Stars: 30 - Forks: 1

choyingw/CFCNet
NeurIPS 2019: Deep RGB-D Canonical Correlation Analysis For Sparse Depth Completion
Language: Python - Size: 31.8 MB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 37 - Forks: 4

katerynaCh/MMA-DFER
This repository provides the code for MMA-DFER, a multimodal (audiovisual) emotion recognition method. It is the official implementation of the paper "MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild".
Language: Python - Size: 1.77 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 10 - Forks: 1

Highdrien/MultiModal-Model
A multimodal model that takes text, audio, and video to predict turn-taking, that is, whether the speaker in a discussion will change.
Language: Python - Size: 92 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

hubtru/Impala
Expandable Isotropic Multimodal Patch Learning Neural Architecture for Nano-modal (9) time-series and image data.
Language: Jupyter Notebook - Size: 1.08 GB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

Yuan-ManX/ai-multimodal-timeline
Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Agent, Audio, Image, Video, Music and 3D content. 🔥
Size: 1.11 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 18 - Forks: 1

hubtru/Minape
Multimodal Isotropic Neural Architecture with Patch Embedding to both time series and image data for classification purposes.
Language: Jupyter Notebook - Size: 47 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 3 - Forks: 1

jon-chun/multisentimentarcs
A Novel Method to Visualize Multimodal AI Sentiment Arcs in Long-Form Narratives
Language: Python - Size: 284 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

jena-shreyas/Efficient-VidQA
Part of my work for my Bachelor's Thesis Project on Counterfactual Reasoning for Videos.
Language: Python - Size: 11.5 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

sathya-ml/multimodal-vrnn-vae
A PyTorch implementation of multimodal VRNN and VAE.
Language: Python - Size: 26.4 KB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

visinf/lnfmm
Latent Normalizing Flows for Many-to-Many Cross Domain Mappings (ICLR 2020)
Language: Python - Size: 1000 KB - Last synced at: about 2 months ago - Pushed at: about 3 years ago - Stars: 33 - Forks: 12

RunyuFan/FusionMixer-TGRS-2022
Code for TGRS 2022 paper "Multilevel Spatial-Channel Feature Fusion Network for Urban Village Classification by Fusing Satellite and Streetview Images"
Language: Python - Size: 51.8 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 3 - Forks: 2

declare-lab/MSA-Robustness
NAACL 2022 paper on Analyzing Modality Robustness in Multimodal Sentiment Analysis
Language: Python - Size: 3.43 MB - Last synced at: 29 days ago - Pushed at: over 2 years ago - Stars: 31 - Forks: 5

ch3cook-fdu/Vote2Cap-DETR
[CVPR 2023] Vote2Cap-DETR and [T-PAMI 2024] Vote2Cap-DETR++; A set-to-set perspective towards 3D Dense Captioning; State-of-the-Art 3D Dense Captioning methods
Language: Python - Size: 308 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 82 - Forks: 5

declare-lab/M2H2-dataset
This repository contains the dataset and baselines explained in the paper: M2H2: A Multimodal Multiparty Hindi Dataset For Humor Recognition in Conversations
Language: Python - Size: 2.21 GB - Last synced at: 29 days ago - Pushed at: about 2 years ago - Stars: 18 - Forks: 12

Rehan-Ahmad/MultimodalDiarization
Multimodal speaker diarization using pre-trained audio-visual synchronization model
Language: Python - Size: 38.1 KB - Last synced at: 9 months ago - Pushed at: about 5 years ago - Stars: 9 - Forks: 6

monajalal/Kenyan-Food
Code and a link to the dataset for the Kenyan food detection paper accepted at the MADiMA 2019 workshop, part of the ACM MM 2019 conference.
Language: Python - Size: 5.08 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 9 - Forks: 6

hubtru/Siren
SIREN: Scalable, Isotropic Recursive Column Multimodal Neural Architecture with Device State Recognition Use-Case
Language: Jupyter Notebook - Size: 189 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

mazleon/Hateful_Meme_Challenge
Hateful Memes dataset contains real hate speech. The Real Hateful Memes dataset consists of more than 10,000 newly created examples by Facebook AI.
Language: Jupyter Notebook - Size: 1.38 MB - Last synced at: 9 days ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 0

kyegomez/Med-PaLM
Towards Generalist Biomedical AI
Language: Python - Size: 850 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 219 - Forks: 35

HySonLab/Protein_Pretrain
Multimodal Pretraining for Unsupervised Protein Representation Learning
Language: Python - Size: 241 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 12 - Forks: 2

ibnaleem/mikael
a Discord chatbot trained on Mistral and LLaVA language models
Language: Python - Size: 3.53 MB - Last synced at: 19 days ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 1

Anne-Andresen/Multi-Modal-cuda-C-GAN
Raw C/CUDA implementation of a 3D GAN
Language: Cuda - Size: 156 KB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 1

aimotive/aimotive_dataset
aiMotive public dataset
Size: 23.4 KB - Last synced at: 9 months ago - Pushed at: over 1 year ago - Stars: 44 - Forks: 2

nngocson2002/ViVQA
The Multimodal Model for Vietnamese Visual Question Answering (ViVQA)
Language: Python - Size: 1.02 MB - Last synced at: 10 months ago - Pushed at: 11 months ago - Stars: 7 - Forks: 0

jianghaojun/Awesome-Parameter-Efficient-Transfer-Learning
A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.
Size: 165 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 372 - Forks: 23

pabloggarc/TFG
Image classification and text assignment using convolutional neural networks and multimodal transformers
Language: Jupyter Notebook - Size: 275 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Saumya-svm/Multimodal-SSL-MusicRep
A small project to explore multimodal representation learning.
Language: Python - Size: 84.5 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

dermatologist/kedro-tf-text
Kedro pipelines for preprocessing text and tabular data for multi-modal ML in TensorFlow.
Language: Python - Size: 142 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

kyegomez/MMCA-MGQA
Experiments around using Multi-Modal Causal Attention with Multi-Grouped Query Attention
Language: Python - Size: 210 KB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 0

kyegomez/Odin
SOTA Classification at scale for UAVs, Drones, and much more
Language: Python - Size: 211 KB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0
