Topic: "multimodal-large-language-models"

X-iZhang/RRG-BioNLP-ACL2024

[BioNLP ACL'24] 🔬 Med-CXRGen, developed by Glasgow AI4BioMed Lab, brings vision-language adaptation to biomedical radiology via visual instruction tuning.

Language: Python - Size: 596 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 1 - Forks: 1

esborisova/TableEval-Study

Table Understanding and (Multimodal) LLMs: A Cross-Domain Case Study on Scientific vs. Non-Scientific Data

Language: Python - Size: 70.6 MB - Last synced at: 13 days ago - Pushed at: 23 days ago - Stars: 1 - Forks: 0

natgluons/AI-docs-analyzer-API

Automate invoice analysis and identity verification, built with an open-source multimodal LLM and OCR (DocTR/TrOCR), using FastAPI, Supabase, PgVector, and Neo4j.

Language: Python - Size: 8.79 KB - Last synced at: 2 days ago - Pushed at: 27 days ago - Stars: 1 - Forks: 0

mediacontentatlas/mediacontentatlas

Code for Media Content Atlas

Language: Python - Size: 1.45 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

A multimodal AI app that gives you eco-friendly insights from just a picture. It can understand what you want to know by looking at the picture, offers recycling advice, locations, and alternative products, helps subvert greenwashing, and much more.

Language: HTML - Size: 34.4 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

ChocoWu/Any2Caption

This is the project webpage for 'Any2Caption'.

Size: 4.51 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

ogunerkutay/huggingface-llm-examples

A collection of scripts for running various large language models, checking hardware compatibility, and measuring performance metrics. It includes implementations for GPT-2, BERT, LLaMA, BLIP-2, and more, leveraging Hugging Face Transformers and PyTorch. The project is designed to experiment with different models for NLP and multimodal tasks.

Language: Python - Size: 58.6 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

ALucek/multimodal-llm-breakdown

Outlining and demonstrating how language models are able to understand image, video, and text content.

Language: Jupyter Notebook - Size: 14.7 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

aiden200/VLM_Implementation

Implementing a Video Language Model from scratch

Language: Python - Size: 4.21 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

ideavision/llm-development

"🚀 A job-ready, hands-on repository for practical LLM development! Master prompt engineering, fine-tuning, retrieval-augmented generation (RAG), and more with real-world examples and best practices. Perfect for AI engineers looking to build and deploy powerful language models."

Language: Jupyter Notebook - Size: 27.4 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

sitamgithub-MSIT/well-being

Reducing neonatal and under-5 mortality rates via an AI-driven awareness platform with a Gradio app, Gemini API integration, and essential project utilities. #AIForGood

Language: Python - Size: 487 KB - Last synced at: 7 days ago - Pushed at: 8 months ago - Stars: 1 - Forks: 1

adithya-s-k/eagle

A framework streamlining training, fine-tuning, evaluation, and deployment of multimodal language models

Language: Jupyter Notebook - Size: 52.7 KB - Last synced at: 2 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

philbertmukunzi/OmniSage

OmniSage: AI-Powered Discord Bot. OmniSage is a versatile Discord bot that leverages Large Language Models (LLMs) to generate intelligent responses, join voice channels, provide text-to-speech functionality, and run an interactive, AI-powered trivia game. It's designed to be your all-knowing companion in Discord servers.

Language: Python - Size: 50.8 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 1

LoupXpro/AlphaExtract

AlphaExtract is a sophisticated PDF summarization tool that combines cutting-edge AI technology with efficient document processing. The project is built using Python and leverages Meta's LLaMA 4 MOE Maverick model along with Groq's inference engine to provide fast and accurate PDF summaries.

Language: Python - Size: 12.6 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 1

rivi89/Awesome-spatial-visual-reasoning-MLLMs

Language: Python - Size: 3.31 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

Sandia7171717171/CharmBench

CharmBench offers a challenging benchmark for large vision-language models, providing datasets and evaluation tools to enhance multimodal reasoning. Check out our latest updates and contribute to the project by starring the repo! 🌟👩‍💻

Language: Jupyter Notebook - Size: 7.23 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

saky-semicolon/Multimodal-Readmission-Prediction

Multimodal fusion model for predicting 30-day hospital readmission using structured EHR data and BERT-based clinical text embeddings from the MIMIC-III dataset.

Language: Jupyter Notebook - Size: 1.69 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

alianoroozi/ai-hub

A collection of AI experiments, including model training, ML system development, and end-to-end pipelines.

Language: Jupyter Notebook - Size: 43.4 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

JoeJoe1313/PaliGemma-Image-Segmentation

An app with FastAPI, Docker, transformers, JAX/Flax for performing image segmentation with PaliGemma 2 mix

Language: Jupyter Notebook - Size: 8.78 MB - Last synced at: 21 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

Fashad-Ahmed/exploring-google-gemini-2.5

Explored and tailored the Google Gemini 2.5 Flash model and its variants

Language: Jupyter Notebook - Size: 6.98 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

pritamqu/VCRBench

VCRBench: Exploring Long-form Causal Reasoning Capabilities of Large Video Language Models

Language: Python - Size: 1.14 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

RauhanAhmed/AlphaExtract

Language: Python - Size: 5.58 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

PE51K/spbu-diploma

MLLM application to Chinese speech practice as my SPBU diploma project

Language: Jupyter Notebook - Size: 66.4 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

khoi03/Multimodal-ChatBot

A chatbot that can process and analyze various forms of media, including text, images, videos, and other data types.

Language: Python - Size: 2.94 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

leoli51/youtube-conspiracy-detection

Code for the paper "Evaluating AI capabilities in detecting conspiracy theories on YouTube".

Language: Jupyter Notebook - Size: 12.9 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Pavansomisetty21/Multimodal-AI-Agent-for-Video-Understanding-and-Research-using-Gemini-LLM

A multimodal AI agent for video understanding and research: ask any question about a video and it will answer.

Language: Jupyter Notebook - Size: 4.21 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

abhi227070/Advanced-Dish-Detection-using-AI

DishVision AI is a multimodal food recognition app powered by Google Gemini AI and Streamlit. Upload or capture a dish image, and the AI will detect its name, ingredients, and recipe instantly! 🚀🔥

Language: Python - Size: 1.34 MB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

imane0x/PerfectFit

PerfectFit is an AI-powered shopping assistant that uses multimodal search to quickly find ideal product matches based on text or image inputs, streamlining the online shopping experience.

Language: JavaScript - Size: 12.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

UlianaDzhumok/deepseek_janus_pro_experiments

Sample project of multimodal decision and image generation with DeepSeek Janus Pro 7B with Real-ESRGAN upscaling

Language: Jupyter Notebook - Size: 2.36 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

PrachiPatel15/Multimodal-Visual-AI-Chatbot

A powerful Streamlit application that analyzes images using multiple vision models and responds to queries about visual content through conversational AI.

Language: Python - Size: 664 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

andre-pereira/ICMI2024LLMsEnjoymentDetection

This repository contains the code, dataset, and model outputs for the ICMI 2024 paper "Multimodal User Enjoyment Detection in Human-Robot Conversation: The Power of Large Language Models". It includes scripts for prompting LLMs, training supervised models, and evaluating multimodal enjoyment detection.

Language: Python - Size: 152 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0