An open API service providing repository metadata for many open source software ecosystems.

Topic: "multimodal-large-language-models"

X-iZhang/RRG-BioNLP-ACL2024

[BioNLP ACL'24] 🔬 Med-CXRGen, developed by Glasgow AI4BioMed Lab, brings vision-language adaptation to biomedical radiology via visual instruction tuning.

Language: Python - Size: 596 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 1 - Forks: 1

esborisova/TableEval-Study

Table Understanding and (Multimodal) LLMs: A Cross-Domain Case Study on Scientific vs. Non-Scientific Data

Language: Python - Size: 70.6 MB - Last synced at: 13 days ago - Pushed at: 23 days ago - Stars: 1 - Forks: 0

natgluons/AI-docs-analyzer-API

Automate invoice analysis and identity verification, built with an open-source multimodal LLM and OCR (DocTR/TrOCR), using FastAPI, Supabase, PgVector, and Neo4j.

Language: Python - Size: 8.79 KB - Last synced at: 2 days ago - Pushed at: 27 days ago - Stars: 1 - Forks: 0

mediacontentatlas/mediacontentatlas

Code for Media Content Atlas

Language: Python - Size: 1.45 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

CoolGuy2982/Eco

A multimodal AI app that gives you eco-friendly insights from just a picture. It infers what you want to know from the image, offers recycling advice, locations, and alternative products, helps counter greenwashing, and much more.

Language: HTML - Size: 34.4 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

ChocoWu/Any2Caption

This is the project webpage for 'Any2Caption'.

Size: 4.51 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

ogunerkutay/huggingface-llm-examples

A collection of scripts for running various large language models, checking hardware compatibility, and measuring performance metrics. It includes implementations for GPT-2, BERT, LLaMA, BLIP-2, and more, leveraging Hugging Face Transformers and PyTorch. The project is designed to experiment with different models for NLP and multimodal tasks.

Language: Python - Size: 58.6 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

ALucek/multimodal-llm-breakdown

Outlining and demonstrating how language models are able to understand image, video, and text content.

Language: Jupyter Notebook - Size: 14.7 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

aiden200/VLM_Implementation

Implementing a Video Language Model from scratch

Language: Python - Size: 4.21 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

ideavision/llm-development

🚀 A job-ready, hands-on repository for practical LLM development! Master prompt engineering, fine-tuning, retrieval-augmented generation (RAG), and more with real-world examples and best practices. Perfect for AI engineers looking to build and deploy powerful language models.

Language: Jupyter Notebook - Size: 27.4 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

sitamgithub-MSIT/well-being

Reducing neonatal and under-5 mortality rates via an AI-driven awareness platform with a Gradio app, Gemini API integration, and essential project utilities. #AIForGood

Language: Python - Size: 487 KB - Last synced at: 7 days ago - Pushed at: 8 months ago - Stars: 1 - Forks: 1

adithya-s-k/eagle

A framework streamlining training, fine-tuning, evaluation, and deployment of multimodal language models

Language: Jupyter Notebook - Size: 52.7 KB - Last synced at: 2 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

philbertmukunzi/OmniSage

OmniSage: AI-Powered Discord Bot. OmniSage is a versatile Discord bot that leverages Large Language Models (LLMs) to generate intelligent responses, join voice channels, provide text-to-speech functionality, and run an interactive, AI-powered trivia game. It's designed to be your all-knowing companion in Discord servers.

Language: Python - Size: 50.8 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 1

LoupXpro/AlphaExtract

AlphaExtract is a sophisticated PDF summarization tool that combines cutting-edge AI technology with efficient document processing. The project is built using Python and leverages Meta's LLaMA 4 MOE Maverick model along with Groq's inference engine to provide fast and accurate PDF summaries.

Language: Python - Size: 12.6 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 1

rivi89/Awesome-spatial-visual-reasoning-MLLMs

Language: Python - Size: 3.31 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

Sandia7171717171/CharmBench

CharmBench offers a challenging benchmark for large vision-language models, providing datasets and evaluation tools to enhance multimodal reasoning. Check out our latest updates and contribute to the project by starring the repo! 🌟👩‍💻

Language: Jupyter Notebook - Size: 7.23 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

saky-semicolon/Multimodal-Readmission-Prediction

Multimodal fusion model for predicting 30-day hospital readmission using structured EHR data and BERT-based clinical text embeddings from the MIMIC-III dataset.

Language: Jupyter Notebook - Size: 1.69 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

alianoroozi/ai-hub

A collection of AI experiments, including model training, ML system development, and end-to-end pipelines.

Language: Jupyter Notebook - Size: 43.4 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

JoeJoe1313/PaliGemma-Image-Segmentation

An app with FastAPI, Docker, transformers, JAX/Flax for performing image segmentation with PaliGemma 2 mix

Language: Jupyter Notebook - Size: 8.78 MB - Last synced at: 21 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

Fashad-Ahmed/exploring-google-gemini-2.5

Explored and tailored the Google Gemini 2.5 Flash model and its variants

Language: Jupyter Notebook - Size: 6.98 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

pritamqu/VCRBench

VCRBench: Exploring Long-form Causal Reasoning Capabilities of Large Video Language Models

Language: Python - Size: 1.14 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

RauhanAhmed/AlphaExtract

AlphaExtract is a sophisticated PDF summarization tool that combines cutting-edge AI technology with efficient document processing. The project is built using Python and leverages Meta's LLaMA 4 MOE Maverick model along with Groq's inference engine to provide fast and accurate PDF summaries.

Language: Python - Size: 5.58 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

PE51K/spbu-diploma

MLLM application to Chinese speech practice as my SPBU diploma project

Language: Jupyter Notebook - Size: 66.4 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

khoi03/Multimodal-ChatBot

A chatbot that can process and analyze various forms of media, including text, images, videos, and other data types.

Language: Python - Size: 2.94 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

leoli51/youtube-conspiracy-detection

Code for the paper "Evaluating AI capabilities in detecting conspiracy theories on YouTube".

Language: Jupyter Notebook - Size: 12.9 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Pavansomisetty21/Multimodal-AI-Agent-for-Video-Understanding-and-Research-using-Gemini-LLM

Implements a multimodal AI agent for video understanding and research: ask any question about a video and it will answer.

Language: Jupyter Notebook - Size: 4.21 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

abhi227070/Advanced-Dish-Detection-using-AI

DishVision AI is a multimodal food recognition app powered by Google Gemini AI and Streamlit. Upload or capture a dish image, and the AI will detect its name, ingredients, and recipe instantly! 🚀🔥

Language: Python - Size: 1.34 MB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

imane0x/PerfectFit

PerfectFit is an AI-powered shopping assistant that uses multimodal search to quickly find ideal product matches based on text or image inputs, streamlining the online shopping experience.

Language: JavaScript - Size: 12.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

UlianaDzhumok/deepseek_janus_pro_experiments

Sample project of multimodal decision and image generation with DeepSeek Janus Pro 7B with Real-ESRGAN upscaling

Language: Jupyter Notebook - Size: 2.36 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

PrachiPatel15/Multimodal-Visual-AI-Chatbot

A powerful Streamlit application that analyzes images using multiple vision models and responds to queries about visual content through conversational AI.

Language: Python - Size: 664 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

andre-pereira/ICMI2024LLMsEnjoymentDetection

This repository contains the code, dataset, and model outputs for the ICMI 2024 paper Multimodal User Enjoyment Detection in Human-Robot Conversation: The Power of Large Language Models. It includes scripts for prompting LLMs, training supervised models, and evaluating multimodal enjoyment detection.

Language: Python - Size: 152 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Cloudon-0216/VLArena (fork of hzjian123/VLArena)

VLArena: Integrating End-to-End Multimodal Models with Closed-loop Generative Simulation for Autonomous Driving.

Language: Python - Size: 364 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

pritamqu/HALVA

[ICLR 2025] Data-Augmented Phrase-Level Alignment for Mitigating Object Hallucination

Language: Python - Size: 16.1 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

adv-11/RT_MM_AI

Research and development on multimodal LLMs, and using them to build real-time interactive applications.

Language: Jupyter Notebook - Size: 12.4 MB - Last synced at: 19 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

surakku/cadence-gemma

Giving RecurrentGemma sight.

Language: Python - Size: 3.12 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

n30tri8/better-VLM-benchmark

More accurate benchmarking of VLMs

Language: Python - Size: 254 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 1

Himank-Khatri/Agentic-Financial-AI

A multi-agent system powered by the phi framework, integrating web search and financial analysis capabilities.

Language: Python - Size: 8.79 KB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

scofield7419/MUIE-REAMO

Code of the Grounded MUIE model, REAMO

Language: Python - Size: 127 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

CKeibel/FHSWF-deep-learning

Multimodal RAG and comparisons between language models. (Project for Deep Learning Module at the FHSWF)

Language: Jupyter Notebook - Size: 6.73 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 1

antonio-f/Florence-2-test

Florence-2 quick test

Language: Jupyter Notebook - Size: 3.91 MB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

KayvanShah1/VirtuTA

VirtuTA is an AI teaching assistant that delivers quick, accurate responses to student queries directly on Piazza. Powered by agentic workflows, Google Gemini, and Langchain, it automates both conceptual and logistical course queries.

Language: Jupyter Notebook - Size: 31.8 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

sitamgithub-MSIT/TechSage

Language: Python - Size: 256 KB - Last synced at: 3 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

DistilledCode/mmrl

Multi-Modal Representational Learning for Social Media Popularity Prediction

Language: Python - Size: 27.3 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

nagababumo/Building-Applications-with-Vector-Databases

Language: Jupyter Notebook - Size: 619 KB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

nicolay-r/Awesome-Image-Captioning-MLLMs

A curated list of awesome image captioning studies, aimed at annotating and reporting CT / MRI scans

Size: 5.86 KB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Related Topics
large-language-models (72), multimodal (44), llm (40), vision-language-model (30), mllm (24), deep-learning (18), large-multimodal-models (18), machine-learning (17), artificial-intelligence (15), multimodal-learning (15), vlm (15), multimodal-deep-learning (14), llms (14), benchmark (14), chatbot (12), large-vision-language-models (12), llava (12), generative-ai (12), natural-language-processing (12), reasoning (10), multimodality (9), foundation-models (9), instruction-tuning (8), video (8), visual-question-answering (8), ai (8), large-language-model (8), computer-vision (8), multimodal-data (7), video-understanding (7), vision-language (7), retrieval-augmented-generation (7), transformers (7), llama (7), dataset (6), python (6), rag (6), awesome-list (6), hallucination (6), vision-language-models (6), streamlit (5), medical-image-analysis (5), reinforcement-learning (5), chatgpt (5), vision-transformer (5), qwen (5), visual-instruction-tuning (5), agentic-ai (5), mixture-of-experts (4), video-language-model (4), hallucination-detection (4), knowledge-graph (4), multi-modality (4), llms-benchmarking (4), text-to-image-generation (4), segmentation (4), instruction-following (4), llama3 (4), huggingface-transformers (4), huggingface (4), chest-xrays (4), radiology-report-generation (4), in-context-learning (4), clip (4), video-question-answering (4), vision-and-language (4), hallucination-mitigation (4), pytorch (4), nlp (4), docker (4), long-video-understanding (4), gemini-pro (4), question-answering (4), gpt-4 (3), safety (3), fact-checking (3), chain-of-thought (3), speech-language-model (3), fine-tuning (3), speech (3), ai-agents (3), evaluation (3), code-generation (3), alignment (3), pinecone (3), neurips-2024 (3), large-vision-language-model (3), vision-language-transformer (3), gemini-api (3), gradio (3), deepseek-r1 (3), gpt (3), mllm-reasoning (3), agent (3), text-to-speech (3), python3 (3), vision-language-learning (3), reasoning-language-models (3), aigc (3), generation (3)