Topic: "multimodal-large-language-models"
X-iZhang/RRG-BioNLP-ACL2024
[BioNLP ACL'24] 🔬 Med-CXRGen, developed by Glasgow AI4BioMed Lab, brings vision-language adaptation to biomedical radiology via visual instruction tuning.
Language: Python - Size: 596 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 1 - Forks: 1

esborisova/TableEval-Study
Table Understanding and (Multimodal) LLMs: A Cross-Domain Case Study on Scientific vs. Non-Scientific Data
Language: Python - Size: 70.6 MB - Last synced at: 13 days ago - Pushed at: 23 days ago - Stars: 1 - Forks: 0

natgluons/AI-docs-analyzer-API
Automate invoice analysis and identity verification, built with an open-source multimodal LLM and OCR (DocTR/TrOCR), using FastAPI, Supabase, PgVector, and Neo4j.
Language: Python - Size: 8.79 KB - Last synced at: 2 days ago - Pushed at: 27 days ago - Stars: 1 - Forks: 0

mediacontentatlas/mediacontentatlas
Code for Media Content Atlas
Language: Python - Size: 1.45 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

CoolGuy2982/Eco
A multimodal AI app that gives you eco-friendly insights from just a picture. It infers what you want to know by looking at the picture, offers recycling advice, locations, and alternative products, helps subvert greenwashing, and much more.
Language: HTML - Size: 34.4 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

ChocoWu/Any2Caption
This is the project webpage for 'Any2Caption'.
Size: 4.51 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

ogunerkutay/huggingface-llm-examples
A collection of scripts for running various large language models, checking hardware compatibility, and measuring performance metrics. It includes implementations for GPT-2, BERT, LLaMA, BLIP-2, and more, leveraging Hugging Face Transformers and PyTorch. The project is designed to experiment with different models for NLP and multimodal tasks.
Language: Python - Size: 58.6 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

ALucek/multimodal-llm-breakdown
Outlining and demonstrating how language models are able to understand image, video, and text content.
Language: Jupyter Notebook - Size: 14.7 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

aiden200/VLM_Implementation
Implementing a Video Language Model from scratch
Language: Python - Size: 4.21 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

ideavision/llm-development
"🚀 A job-ready, hands-on repository for practical LLM development! Master prompt engineering, fine-tuning, retrieval-augmented generation (RAG), and more with real-world examples and best practices. Perfect for AI engineers looking to build and deploy powerful language models."
Language: Jupyter Notebook - Size: 27.4 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

sitamgithub-MSIT/well-being
Reducing neonatal and under-5 mortality rates via an AI-driven awareness platform with a Gradio app, Gemini API integration, and essential project utilities. #AIForGood
Language: Python - Size: 487 KB - Last synced at: 7 days ago - Pushed at: 8 months ago - Stars: 1 - Forks: 1

adithya-s-k/eagle
A framework streamlining training, fine-tuning, evaluation, and deployment of multimodal language models
Language: Jupyter Notebook - Size: 52.7 KB - Last synced at: 2 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

philbertmukunzi/OmniSage
OmniSage: AI-Powered Discord Bot. OmniSage is a versatile Discord bot that leverages Large Language Models (LLMs) to generate intelligent responses, join voice channels, provide text-to-speech functionality, and run an interactive, AI-powered trivia game. It's designed to be your all-knowing companion in Discord servers.
Language: Python - Size: 50.8 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 1

LoupXpro/AlphaExtract
AlphaExtract is a sophisticated PDF summarization tool that combines cutting-edge AI technology with efficient document processing. The project is built using Python and leverages Meta's LLaMA 4 MOE Maverick model along with Groq's inference engine to provide fast and accurate PDF summaries.
Language: Python - Size: 12.6 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 1

rivi89/Awesome-spatial-visual-reasoning-MLLMs
Language: Python - Size: 3.31 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

Sandia7171717171/CharmBench
CharmBench offers a challenging benchmark for large vision-language models, providing datasets and evaluation tools to enhance multimodal reasoning. Check out our latest updates and contribute to the project by starring the repo! 🌟👩‍💻
Language: Jupyter Notebook - Size: 7.23 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

saky-semicolon/Multimodal-Readmission-Prediction
Multimodal fusion model for predicting 30-day hospital readmission using structured EHR data and BERT-based clinical text embeddings from the MIMIC-III dataset.
Language: Jupyter Notebook - Size: 1.69 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

alianoroozi/ai-hub
A collection of AI experiments, including model training, ML system development, and end-to-end pipelines.
Language: Jupyter Notebook - Size: 43.4 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

JoeJoe1313/PaliGemma-Image-Segmentation
An app with FastAPI, Docker, transformers, JAX/Flax for performing image segmentation with PaliGemma 2 mix
Language: Jupyter Notebook - Size: 8.78 MB - Last synced at: 21 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

Fashad-Ahmed/exploring-google-gemini-2.5
Explored and tailored the Google Gemini 2.5 Flash model and its variants
Language: Jupyter Notebook - Size: 6.98 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

pritamqu/VCRBench
VCRBench: Exploring Long-form Causal Reasoning Capabilities of Large Video Language Models
Language: Python - Size: 1.14 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

RauhanAhmed/AlphaExtract
AlphaExtract is a sophisticated PDF summarization tool that combines cutting-edge AI technology with efficient document processing. The project is built using Python and leverages Meta's LLaMA 4 MOE Maverick model along with Groq's inference engine to provide fast and accurate PDF summaries.
Language: Python - Size: 5.58 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

PE51K/spbu-diploma
MLLM application to Chinese speech practice as my SPBU diploma project
Language: Jupyter Notebook - Size: 66.4 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

khoi03/Multimodal-ChatBot
A chatbot that can process and analyze various forms of media, including text, images, videos, and other data types.
Language: Python - Size: 2.94 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

leoli51/youtube-conspiracy-detection
Code for the paper "Evaluating AI capabilities in detecting conspiracy theories on YouTube".
Language: Jupyter Notebook - Size: 12.9 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Pavansomisetty21/Multimodal-AI-Agent-for-Video-Understanding-and-Research-using-Gemini-LLM
A multimodal AI agent for video understanding and research: ask any question about a video and it will answer.
Language: Jupyter Notebook - Size: 4.21 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

abhi227070/Advanced-Dish-Detection-using-AI
DishVision AI is a multimodal food recognition app powered by Google Gemini AI and Streamlit. Upload or capture a dish image, and the AI will detect its name, ingredients, and recipe instantly! 🚀🔥
Language: Python - Size: 1.34 MB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

imane0x/PerfectFit
PerfectFit is an AI-powered shopping assistant that uses multimodal search to quickly find ideal product matches based on text or image inputs, streamlining the online shopping experience.
Language: JavaScript - Size: 12.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

UlianaDzhumok/deepseek_janus_pro_experiments
Sample project of multimodal decision and image generation with DeepSeek Janus Pro 7B with Real-ESRGAN upscaling
Language: Jupyter Notebook - Size: 2.36 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

PrachiPatel15/Multimodal-Visual-AI-Chatbot
A powerful Streamlit application that analyzes images using multiple vision models and responds to queries about visual content through conversational AI.
Language: Python - Size: 664 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

andre-pereira/ICMI2024LLMsEnjoymentDetection
This repository contains the code, dataset, and model outputs for the ICMI 2024 paper "Multimodal User Enjoyment Detection in Human-Robot Conversation: The Power of Large Language Models". It includes scripts for prompting LLMs, training supervised models, and evaluating multimodal enjoyment detection.
Language: Python - Size: 152 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Cloudon-0216/VLArena (fork of hzjian123/VLArena)
VLArena: Integrating End-to-End Multimodal Models with Closed-loop Generative Simulation for Autonomous Driving.
Language: Python - Size: 364 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

pritamqu/HALVA
[ICLR 2025] Data-Augmented Phrase-Level Alignment for Mitigating Object Hallucination
Language: Python - Size: 16.1 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

adv-11/RT_MM_AI
Research and development on multimodal LLMs and their use in building real-time interactive applications.
Language: Jupyter Notebook - Size: 12.4 MB - Last synced at: 19 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

surakku/cadence-gemma
Giving RecurrentGemma sight.
Language: Python - Size: 3.12 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

n30tri8/better-VLM-benchmark
More accurate benchmarking of VLMs
Language: Python - Size: 254 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 1

Himank-Khatri/Agentic-Financial-AI
A multi-agent system powered by the phi framework, integrating web search and financial analysis capabilities.
Language: Python - Size: 8.79 KB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

scofield7419/MUIE-REAMO
Code of the Grounded MUIE model, REAMO
Language: Python - Size: 127 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

CKeibel/FHSWF-deep-learning
Multimodal RAG and comparisons between language models. (Project for Deep Learning Module at the FHSWF)
Language: Jupyter Notebook - Size: 6.73 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 1

antonio-f/Florence-2-test
Florence-2 quick test
Language: Jupyter Notebook - Size: 3.91 MB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

KayvanShah1/VirtuTA
VirtuTA is an AI teaching assistant that delivers quick, accurate responses to student queries directly on Piazza. Powered by agentic workflows, Google Gemini, and Langchain, it automates both conceptual and logistical course queries.
Language: Jupyter Notebook - Size: 31.8 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

sitamgithub-MSIT/TechSage
Language: Python - Size: 256 KB - Last synced at: 3 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

DistilledCode/mmrl
Multi-Modal Representational Learning for Social Media Popularity Prediction
Language: Python - Size: 27.3 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

nagababumo/Building-Applications-with-Vector-Databases
Language: Jupyter Notebook - Size: 619 KB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

nicolay-r/Awesome-Image-Captioning-MLLMs
A curated list of awesome image captioning studies, aimed at annotating and reporting CT / MRI scans
Size: 5.86 KB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0
