An open API service providing repository metadata for many open source software ecosystems.

Topic: "multimodality"

lucidrains/big-sleep

A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN. Technique was originally created by https://twitter.com/advadnoun

Language: Python - Size: 6.89 MB - Last synced at: 6 days ago - Pushed at: over 3 years ago - Stars: 2,570 - Forks: 306

BAAI-Agents/Cradle

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle helps agents ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.

Language: Python - Size: 433 MB - Last synced at: 13 days ago - Pushed at: 6 months ago - Stars: 2,088 - Forks: 185

hymie122/RAG-Survey

Collecting awesome papers on RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".

Size: 6.49 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 1,601 - Forks: 110

PreferredAI/cornac

A Comparative Framework for Multimodal Recommender Systems

Language: Python - Size: 24.3 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 949 - Forks: 152

ArrowLuo/CLIP4Clip

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Language: Python - Size: 1.61 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 929 - Forks: 126

AIDC-AI/Ovis

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

Language: Python - Size: 5.56 MB - Last synced at: 9 days ago - Pushed at: about 2 months ago - Stars: 904 - Forks: 56

fnzhan/Generative-AI

[TPAMI 2023] Multimodal Image Synthesis and Editing: The Generative AI Era

Language: TeX - Size: 121 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 753 - Forks: 57

aimclub/FEDOT

Automated modeling and machine learning framework FEDOT

Language: Python - Size: 225 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 667 - Forks: 87

VITA-MLLM/Woodpecker

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models

Language: Python - Size: 21.2 MB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 636 - Forks: 30

jshilong/GPT4RoI

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

Language: Python - Size: 15.1 MB - Last synced at: 1 day ago - Pushed at: 11 months ago - Stars: 528 - Forks: 28

microsoft/LLM2CLIP

LLM2CLIP makes the SOTA pretrained CLIP model more SOTA than ever.

Language: Python - Size: 2.93 MB - Last synced at: 5 days ago - Pushed at: about 2 months ago - Stars: 513 - Forks: 24

YingqingHe/Awesome-LLMs-meet-Multimodal-Generation

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

Language: HTML - Size: 12.7 MB - Last synced at: 10 days ago - Pushed at: about 2 months ago - Stars: 472 - Forks: 26

zengyan-97/X-VLM

X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)

Language: Python - Size: 13.5 MB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 462 - Forks: 51

afiaka87/clip-guided-diffusion

A CLI tool/Python module for generating images from text using guided diffusion and CLIP from OpenAI.

Language: Python - Size: 51.2 MB - Last synced at: 9 days ago - Pushed at: over 3 years ago - Stars: 462 - Forks: 60

MMMU-Benchmark/MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

Language: Python - Size: 186 MB - Last synced at: about 16 hours ago - Pushed at: about 17 hours ago - Stars: 431 - Forks: 35

HazyResearch/fonduer

A knowledge base construction engine for richly formatted data

Language: Python - Size: 11.5 MB - Last synced at: 3 days ago - Pushed at: almost 4 years ago - Stars: 410 - Forks: 77

lium-lst/nmtpytorch 📦

Sequence-to-Sequence Framework in PyTorch

Language: Jupyter Notebook - Size: 7.49 MB - Last synced at: 13 days ago - Pushed at: over 2 years ago - Stars: 391 - Forks: 51

kyegomez/Med-PaLM

Towards Generalist Biomedical AI

Language: Python - Size: 850 KB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 381 - Forks: 53

kyegomez/CM3Leon

An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multi modal AI that uses just a decoder to generate both text and images

Language: Python - Size: 754 KB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 360 - Forks: 18

OmicsML/dance

DANCE: a deep learning library and benchmark platform for single-cell analysis

Language: Python - Size: 17.3 MB - Last synced at: 44 minutes ago - Pushed at: about 2 hours ago - Stars: 359 - Forks: 38

microsoft/UniVL

An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"

Language: Python - Size: 219 KB - Last synced at: 5 days ago - Pushed at: 10 months ago - Stars: 354 - Forks: 56

soujanyaporia/multimodal-sentiment-analysis

Attention-based multimodal fusion for sentiment analysis

Language: Python - Size: 87.3 MB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 351 - Forks: 74

Yutong-Zhou-cv/Awesome-Multimodality

A Survey on multimodal learning research.

Size: 1.76 MB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 324 - Forks: 22

kyegomez/NaViT

My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"

Language: Python - Size: 210 KB - Last synced at: 4 days ago - Pushed at: about 2 months ago - Stars: 231 - Forks: 11

Liang-ZX/VectorNet

PyTorch implementation of CVPR2020 paper “VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation”

Language: Jupyter Notebook - Size: 174 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 200 - Forks: 43

srvk/how2-dataset

This repository contains code and metadata of How2 dataset

Language: Python - Size: 24.4 MB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 172 - Forks: 18

FoundationVision/GenerateU

[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection

Language: Python - Size: 14.4 MB - Last synced at: 2 days ago - Pushed at: about 2 months ago - Stars: 168 - Forks: 7

BiomedSciAI/fuse-med-ml

A Python framework accelerating ML-based discovery in the medical field by encouraging code reuse. Batteries included :)

Language: Python - Size: 104 MB - Last synced at: 4 days ago - Pushed at: 7 days ago - Stars: 146 - Forks: 36

kyegomez/PALI3

Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"

Language: Python - Size: 2.61 MB - Last synced at: 3 days ago - Pushed at: about 2 months ago - Stars: 146 - Forks: 4

florencejt/fusilli

A Python package housing a collection of deep-learning multi-modal data fusion method pipelines! From data loading, to training, to evaluation - fusilli's got you covered 🌸

Language: Python - Size: 987 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 146 - Forks: 12

kyegomez/swarms-pytorch

Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊

Language: Python - Size: 58.2 MB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 122 - Forks: 10

senwu/emmental

A deep learning framework for building multimodal multi-task learning systems.

Language: Python - Size: 891 KB - Last synced at: 7 days ago - Pushed at: almost 2 years ago - Stars: 110 - Forks: 18

kyegomez/PALI

Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"

Language: Python - Size: 624 KB - Last synced at: 15 days ago - Pushed at: about 1 year ago - Stars: 89 - Forks: 8

lucidrains/mirasol-pytorch

Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google Deepmind, in PyTorch

Language: Python - Size: 1.01 MB - Last synced at: 21 days ago - Pushed at: over 1 year ago - Stars: 88 - Forks: 2

MMStar-Benchmark/MMStar

This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"

Language: Python - Size: 3.41 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 84 - Forks: 1

ForestsKing/Awesome-Multimodal-Time-Series

A curated list of papers, code, data, and other resources focused on multimodal time series analysis.

Size: 9.77 KB - Last synced at: 13 days ago - Pushed at: 20 days ago - Stars: 71 - Forks: 4

akashe/Multimodal-action-recognition

Code for selecting an action based on multimodal inputs. In this case, the inputs are voice and text.

Language: Python - Size: 64.7 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 69 - Forks: 11

ForestsKing/ChatTime

PyTorch implementation of "ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data" (AAAI 2025 [oral])

Language: Jupyter Notebook - Size: 2.07 MB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 65 - Forks: 10

songqiang321/Awesome-AI-Papers

This repository is used to collect papers and code in the field of AI.

Size: 4.08 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 60 - Forks: 6

mims-harvard/Clinical-knowledge-embeddings

Unified Clinical Vocabulary Embeddings for Advancing Precision Medicine

Language: Jupyter Notebook - Size: 16.4 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 53 - Forks: 5

firojalam/multimodal_social_media

Multimodal social media content (text, image) classification

Language: Python - Size: 3.54 MB - Last synced at: about 2 months ago - Pushed at: almost 3 years ago - Stars: 50 - Forks: 14

amazon-science/gluonmm

A library of transformer models for computer vision and multi-modality research

Language: Python - Size: 65.4 KB - Last synced at: 17 days ago - Pushed at: over 3 years ago - Stars: 49 - Forks: 2

firojalam/harmful-memes-detection-resources

Resources (conference/journal publications, references to dataset) for harmful memes detection.

Language: TeX - Size: 3.85 MB - Last synced at: about 17 hours ago - Pushed at: about 3 years ago - Stars: 47 - Forks: 5

YeonwooSung/LIMoE-pytorch

PyTorch implementation of LIMoE

Language: Python - Size: 4.3 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 43 - Forks: 1

kyegomez/EXA-1 (fork of pliang279/awesome-multimodal-ml)

An EXA-Scale repository of Multi-Modality AI resources from papers and models, to foundational libraries!

Language: Jupyter Notebook - Size: 1.15 GB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 42 - Forks: 2

Luka0612/ChineseVLBert

Multimodal BERT for the Chinese language domain

Size: 1.95 KB - Last synced at: almost 2 years ago - Pushed at: about 5 years ago - Stars: 42 - Forks: 5

kunzhan/MVGL

TCyb 2018: Graph learning for multiview clustering

Language: Matlab - Size: 217 KB - Last synced at: 3 months ago - Pushed at: over 6 years ago - Stars: 39 - Forks: 12

UKPLab/5pils

Code associated with the EMNLP 2024 Main paper: "Image, tell me your story!" Predicting the original meta-context of visual misinformation.

Language: Python - Size: 3.38 MB - Last synced at: 30 days ago - Pushed at: about 1 month ago - Stars: 38 - Forks: 4

chalk-lab/MCMCTempering.jl

Implementations of parallel tempering algorithms to augment samplers with tempering capabilities

Language: Julia - Size: 565 KB - Last synced at: 2 days ago - Pushed at: 2 months ago - Stars: 37 - Forks: 5

piomin/spring-ai-showcase

Sample Spring AI Application with several use cases

Language: Java - Size: 3.94 MB - Last synced at: 1 day ago - Pushed at: 7 days ago - Stars: 32 - Forks: 16

trislett/TFCE_mediation

Fast regression and mediation analysis of vertex or voxel MRI data with TFCE

Language: Python - Size: 110 MB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 30 - Forks: 9

MileBench/MileBench

This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"

Language: Python - Size: 3.52 MB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 29 - Forks: 1

xability/maidr-legacy

[DEPRECATED prototype] Multimodal Access and Interactive Data Representation

Language: HTML - Size: 9.5 MB - Last synced at: 3 days ago - Pushed at: 19 days ago - Stars: 28 - Forks: 5

TheChymera/behaviopy

Behavioral data analysis and plotting in Python.

Language: Python - Size: 144 KB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 26 - Forks: 14

awslabs/guidance-for-multi-omics-and-multi-modal-data-integration-and-analysis-on-aws

This guidance creates a scalable environment in AWS to prepare genomic, clinical, mutation, expression and imaging data for large-scale analysis and perform interactive queries against a data lake. The solution also demonstrates the use of Amazon Omics for multi-modal analysis.

Language: Jupyter Notebook - Size: 179 KB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 24 - Forks: 8

xf-zhao/Matcha-agent

Official implementation of Matcha-agent, https://arxiv.org/abs/2303.08268

Language: Python - Size: 22.8 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 22 - Forks: 2

aws-samples/deploy-stable-diffusion-model-on-amazon-sagemaker-endpoint

Deploy Stable Diffusion Model on Amazon SageMaker Endpoint

Language: Jupyter Notebook - Size: 11.9 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 20 - Forks: 6

multimodal-ai-lab/DEFAME

Fact-checking system for textual and visual inputs.

Language: Python - Size: 29.7 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 19 - Forks: 2

prml615/prml

Multimodal Fully Convolutional Neural networks for Semantic Segmentation.

Language: Python - Size: 1.62 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 19 - Forks: 10

rezacsedu/Multimodal-autoencoder-for-breast-cancer

Prognostically Relevant Subtypes and Survival Prediction for Breast Cancer Based on Multimodal Genomics Data

Language: Python - Size: 23.3 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 19 - Forks: 9

SiyuanYan1/PanDerm

PanDerm: A General-Purpose Multimodal Foundation Model for Dermatology

Language: Jupyter Notebook - Size: 3 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 18 - Forks: 1

dicomtools/TriDFusion

TriDFusion (3DF) Medical Imaging Viewer

Language: MATLAB - Size: 18.1 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 18 - Forks: 2

FuxiaoLiu/DocumentCLIP

[ICPRAI 2024] DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents

Language: Python - Size: 2.49 MB - Last synced at: 9 days ago - Pushed at: about 1 year ago - Stars: 16 - Forks: 0

AmbiTyga/MemSem

A Multi-modal Framework for Sentimental Analysis of Meme

Language: Python - Size: 4.59 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 16 - Forks: 5

ldeecke/mn-torch

Mode normalization (ICLR 2019).

Language: Python - Size: 9.77 KB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 16 - Forks: 1

ahq1993/Multimodal-Deep-Q-Network-for-Social-Human-Robot-Interaction

Multimodal Deep Q-Network (MDQN) for modelling human-like social intelligence.

Language: Lua - Size: 1.74 MB - Last synced at: about 1 year ago - Pushed at: about 8 years ago - Stars: 14 - Forks: 10

MIMBCD-UI/meta

📎 About MIMBCD-UI Project

Size: 1.04 GB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 12 - Forks: 4

kyegomez/MMCA

The open source community's implementation of the all-new Multi-Modal Causal Attention from "DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention"

Language: Python - Size: 230 KB - Last synced at: 13 days ago - Pushed at: about 1 year ago - Stars: 12 - Forks: 0

Agora-X/DailyPaperClub

The repository for the exclusive Daily Paper Club hosted at Agora daily at 10pm NYC time on this Discord: https://discord.gg/Gnzh6dnzyz

Size: 14.6 KB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 0

declare-lab/Sealing

[NAACL 2024] Official Implementation of paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"

Language: Python - Size: 8.92 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 11 - Forks: 3

thiippal/AI2D-RST

A repository for the AI2D-RST corpus.

Language: Python - Size: 18.2 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 11 - Forks: 3

ChongKaKam/TAMA

Code for TAMA: See it, Think it, Sorted: Large Multimodal Models are Few-shot Time Series Anomaly Analyzers.

Language: Python - Size: 4.82 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 10 - Forks: 1

tianleimin/ACL2018-MultimodalMultitaskSentimentAnalysis

Code for ACL2018 Multimodal Language Workshop paper

Language: Python - Size: 234 KB - Last synced at: almost 2 years ago - Pushed at: almost 7 years ago - Stars: 10 - Forks: 1

OlehOnyshchak/pyWikiMM

Collects a multimodal dataset of Wikipedia articles and their images

Language: Python - Size: 7.78 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 1

kyegomez/swarmalators

PyTorch Implementation of the Swarmalators algorithm from "Exotic swarming dynamics of high-dimensional swarmalators"

Language: Python - Size: 2.16 MB - Last synced at: 13 days ago - Pushed at: 6 months ago - Stars: 8 - Forks: 0

kyegomez/Gen2

Implementation of "Text driven video generation" in PyTorch

Language: Python - Size: 222 KB - Last synced at: 13 days ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 0

fuyahuii/ConSK-GCN

The PyTorch code for the paper: "CONSK-GCN: Conversational Semantic- and Knowledge-Oriented Graph Convolutional Network for Multimodal Emotion Recognition."

Language: Python - Size: 117 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 2

kyegomez/ConvNet

Implementation of the NFNets from the paper: "ConvNets Match Vision Transformers at Scale" by Google Research

Language: Python - Size: 2.16 MB - Last synced at: 13 days ago - Pushed at: 8 months ago - Stars: 7 - Forks: 0

kyegomez/CELESTIAL-1

Omni-Modality Processing, Understanding, and Generation

Language: Python - Size: 2.49 MB - Last synced at: 13 days ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 1

cleopatra-itn/fair_multimodal_sentiment

Code and Splits for the paper "A Fair and Comprehensive Comparison of Multimodal Tweet Sentiment Analysis Methods", In Proceedings of the 2021 Workshop on Multi-Modal Pre-Training for Multimedia Understanding (MMPT ’21), August 21, 2021, Taipei, Taiwan

Language: Python - Size: 628 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 2

gullalc/multimodal_r1_papers

DeepSeek RL (GRPO)-Inspired Research for Vision & Multimodal Reasoning

Size: 37.1 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 6 - Forks: 1

mjunaidca/upwork-leads-gpt

Upwork Leads GPT is an AI-powered Job Finder tool for freelancers. It's built using OpenAI’s CustomGPT. It searches for the most relevant job postings based on provided keywords and can generate proposals.

Language: Python - Size: 207 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 6 - Forks: 0

kyegomez/MMCA-MGQA

Experiments around using Multi-Modal Causal Attention with Multi-Grouped Query Attention

Language: Python - Size: 210 KB - Last synced at: 13 days ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 0

soraxas/Occ-Traj120

A trajectories dataset with associated occupancy maps

Size: 14.5 MB - Last synced at: 8 days ago - Pushed at: almost 5 years ago - Stars: 6 - Forks: 0

helenetran3/MER-Databases-and-Emotion-Ambiguity

The most popular databases used in multimodal emotion recognition with a focus on the representation of emotion ambiguity.

Size: 252 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 5 - Forks: 1

Droliven/diverse_sampling

Official project of DiverseSampling (ACMMM2022 Paper)

Language: Python - Size: 98 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 0

Warvito/integrating-multi-modal-neuroimaging

Integrating machine learning and multi-modal neuroimaging to detect schizophrenia at the level of the individual

Language: Python - Size: 65.4 KB - Last synced at: about 2 months ago - Pushed at: over 5 years ago - Stars: 5 - Forks: 2

Spider101/Visual-Semantic-Alignments

An exploration into the possibility of generating multi-sentence image descriptions by leveraging the latent dependencies between visual concepts in an image with their textual counterparts

Language: Python - Size: 149 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 2

cleopatra-itn/image_text_claim_detection

Code and Dataset for paper "On the Role of Images for Analyzing Claims in Social Media" @2nd International Workshop on Cross-lingual Event-centric Open Analytics (CLEOPATRA) co-located with The Web Conf 2021

Language: Python - Size: 104 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 1

gangeshwark/multimodal_feature_extractors

[IN PROGRESS] Multimodal feature extraction modules for ease of doing research and reproducibility.

Language: Python - Size: 45.9 KB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 4 - Forks: 1

e1four15f/ClipSeek

A Text-to-Clip Retrieval System

Language: Python - Size: 114 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 3 - Forks: 0

shahariar-shibli/Adversarial-Attack-on-POS-Tags

Adversarial Attacks on Parts of Speech: An Empirical Study in Text-to-Image Generation

Language: Jupyter Notebook - Size: 101 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 3 - Forks: 0

peterlipan/FoF

The official implementation of our BIBM'24 paper: Focus on Focus: Focus-oriented Representation Learning and Multi-view Cross-modal Alignment for Glioma Grading

Language: Python - Size: 22.9 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 3 - Forks: 0

Clealiya/Multimodal-model

[FR|EN - Trio] 2023 - 2024 Centrale Méditerranée AI Master | Multimodal retranscription with text, audio and video

Language: Python - Size: 15.5 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

diaoenmao/Multimodal-Controller-for-Generative-Models

[CVMI 2022] Multimodal Controller for Generative Models

Language: Python - Size: 282 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 1

ArashVahabpour/SOG

Self-Organizing Generator

Language: Jupyter Notebook - Size: 82.9 MB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 0

QJYBall/MyoPS-Net

MyoPS-Net: Myocardial Pathology Segmentation with Flexible Combination of Multi-Sequence CMR images

Language: Python - Size: 614 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 2

sutdcv/multi-modal-video-reasoning

[ICCV2021 Workshop] Multi-Modal Video Reasoning and Analyzing Competition

Language: JavaScript - Size: 8.77 MB - Last synced at: 10 months ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 1

vita-epfl/AdversarialLoss-SGAN (fork of agrimgupta92/sgan)

Analysing Adversarial Loss of Social GAN

Language: Python - Size: 378 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 2 - Forks: 1

MichiganNLP/deceptiondetection

Deception Detection project website

Language: JavaScript - Size: 8.99 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 1

Related Topics
multimodal (31), deep-learning (23), multimodal-deep-learning (22), machine-learning (17), llm (15), multimodal-learning (14), artificial-intelligence (14), large-language-models (12), gpt4 (11), pytorch (10), multimodal-large-language-models (9), ai (8), natural-language-processing (8), attention-mechanism (8), nlp (7), computer-vision (7), clip (7), llms (6), attention-is-all-you-need (6), attention (6), dataset (5), segmentation (5), convolutional-neural-networks (5), mllm (4), medical-imaging (4), rag (4), image (3), alignment (3), python (3), vision-language-model (3), trajectory-prediction (3), keras (3), deep-neural-networks (3), foundation-models (3), video (3), vision (3), reinforcement-learning (3), diagrams (3), multimodal-data (3), aigc (3), visual-question-answering (3), openai (3), evaluation (3), text-to-image (3), cv (3), variational-autoencoder (3), gcn (3), large-multimodal-models (3), perception (2), generative-ai (2), video-text-retrieval (2), multimodal-fusion (2), radiomics (2), fact-checking (2), large-vision-language-models (2), workshop (2), statistics (2), neural-network (2), swarm-intelligence (2), lvlm (2), msrvtt (2), swarm-robotics (2), swarms (2), data (2), pinecone (2), image-generation (2), time-series (2), planning (2), multimodal-time-series (2), robotics (2), contrastive-learning (2), agent (2), bert (2), fake-news (2), tensorflow (2), transformers (2), medical-ai (2), text (2), visualization (2), sentiment-classification (2), sentiment-analysis (2), social-media (2), asr (2), benchmark (2), cnn (2), multimodal-sentiment-analysis (2), imaging (2), speech-recognition (2), hinge-loss (2), human-motion-prediction (2), spring-boot (2), spring-ai (2), likelihood (2), stablediffusion (2), openai-clip (2), manifold (2), data-science (2), sampling (2), stochastic (2), variational-inference (2)