GitHub topics: image-captioning
hsp-iit/embodied-captioning
Official repository of the preprint "Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions"
Language: Python - Size: 944 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2 - Forks: 0

HanXinzi-AI/awesome-computer-vision-resources
A collection of computer vision projects and tools.
Size: 49.8 MB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 273 - Forks: 34

MahmoudAdham6544/vision-speak
VisionSpeak: A deep learning pipeline that generates natural language captions from images using a Vision-Encoder and GPT-2 Decoder. Bridging vision and language with PyTorch and Transformers.
Language: Python - Size: 5.56 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

AkagawaTsurunaki/zerolan-core
ZerolanCore integrates many open-source, locally deployable AI models, and aims to integrate a series of AI models such as large language model (LLM), automatic speech recognition (ASR), text-to-speech (TTS), image captioning, optical character recognition (OCR), video captioning, etc.
Language: Python - Size: 102 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 3 - Forks: 0

X-PLUG/mPLUG
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)
Language: Python - Size: 1.56 MB - Last synced at: 3 days ago - Pushed at: about 2 years ago - Stars: 93 - Forks: 8

cuixing158/Awesome-CV-MasterHub
:fire: :fire: :fire: A paper list of some recent Computer Vision (CV) works
Size: 43.5 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 419 - Forks: 29

SocAIty/socaity
SDK for generative AI.
Language: Python - Size: 26.2 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1 - Forks: 0

huiteuros/generalt
FastAPI service for generating image ALT text using the BLIP model
Language: Python - Size: 0 Bytes - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

AI-14/pkatransnet
[IVC 2025] [Official code] - Enhancing radiology report generation: A prior knowledge-aware transformer network for effective alignment and fusion of multi-modal radiological data
Language: Python - Size: 4.42 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 1

iOPENCap/awesome-remote-image-captioning
A list of awesome remote sensing image captioning resources
Language: Python - Size: 198 KB - Last synced at: about 22 hours ago - Pushed at: 13 days ago - Stars: 110 - Forks: 1

PtiCalin/vault_image-description
Ollama powered image description
Language: JavaScript - Size: 64.5 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 0

OpenGVLab/InternGPT
InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)
Language: Python - Size: 41.9 MB - Last synced at: 6 days ago - Pushed at: 10 months ago - Stars: 3,214 - Forks: 231

xogie/Add_Tags-Titles-to-Images
A Python tool that auto-generates captions and keyword tags for JPG/PNG images using a local vision-language model like BakLLaVA. Captions and tags are embedded into EXIF metadata (Title + Tags) for native Windows Explorer visibility. Includes batch processing and GUI folder selection.
Language: Python - Size: 12.7 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

SkalskiP/awesome-foundation-and-multimodal-models
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
Language: Python - Size: 58.6 KB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 621 - Forks: 45

cstsunfu/dlk
A PyTorch Based Deep Learning Quick Develop Framework. One-Stop for train/predict/server/demo
Language: Python - Size: 9.42 MB - Last synced at: 10 days ago - Pushed at: 5 months ago - Stars: 24 - Forks: 0

alasdairtran/transform-and-tell
[CVPR 2020] Transform and Tell: Entity-Aware News Image Captioning
Language: Python - Size: 14.2 MB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 91 - Forks: 15

AkagawaTsurunaki/ZerolanLiveRobot
AI VTuber with LLM, ASR, TTS, OCR, CV and more technologies to live stream or play Minecraft with you.
Language: Python - Size: 2.48 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 29 - Forks: 3

claudaff/automatic-map-storytelling
An Efficient System for Automatic Map Storytelling using Generative Pre-trained Transformer (GPT) Models – A Case Study on Historical Maps
Language: Python - Size: 2 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 4 - Forks: 2

terry-r123/Awesome-Captioning
A curated list of multimodal captioning research (including image captioning, video captioning, and text captioning)
Size: 56.6 KB - Last synced at: 4 days ago - Pushed at: about 3 years ago - Stars: 109 - Forks: 10

ZhuoxuanCao/BLIP-Hugging-Face-Quickstart-Finetune-Lora
A modular, easy-to-use framework for fine-tuning BLIP-1 on custom image captioning tasks using LoRA and Hugging Face Transformers. Includes data preprocessing, training scripts, and inference demos — with custom patching on the vision backbone. Ideal for researchers, engineers, and AI enthusiasts building lightweight captioning systems.
Language: Python - Size: 178 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

Pavansomisetty21/Image-Caption-Generation-using-LLMs-GEMINI-
Generates captions for user-supplied images using prompt engineering and generative AI
Language: Jupyter Notebook - Size: 366 KB - Last synced at: about 23 hours ago - Pushed at: 10 months ago - Stars: 10 - Forks: 1

kuanghuei/SCAN
PyTorch source code for "Stacked Cross Attention for Image-Text Matching" (ECCV 2018)
Language: Python - Size: 34.2 KB - Last synced at: 5 days ago - Pushed at: about 2 years ago - Stars: 565 - Forks: 115

digitechvishal/Image-Caption-Generator-Using-AI-Azure
This project is a lightweight web application that leverages Microsoft Azure’s Computer Vision API to generate accurate captions for uploaded images. Designed using Python and Streamlit, it provides a clean and intuitive interface to interact with AI-powered image analysis.
Language: Python - Size: 337 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

dp-ops/Image_captioning
Image captioning model using ResNet34 and an attention LSTM. The project is implemented from scratch, using pretrained ImageNet weights for ResNet34 and fine-tuning the model on the Flickr8k and Flickr30k datasets. Reinforcement learning capabilities are available, but they need fixing and a better GPU.
Language: Python - Size: 60.5 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

aakcay5656/image-captioning-pytorch
The project I did in the OBSS AI Intern Competition
Language: Jupyter Notebook - Size: 9.77 KB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

PrathameshPC77/ai_image_captioning
🖼️ AI Image Caption Generator — A simple and smart web app that generates descriptive captions for any image you upload using a pre-trained Vision Transformer (ViT) and GPT-2 model. Built with Python and Streamlit, powered by Hugging Face Transformers.
Language: Python - Size: 576 KB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

msamprovalaki/Exploring-Multimodal-Large-Language-Models-for-Medical-Image-Captioning
This repository includes the code for my Master Thesis, which investigates the application of Multimodal Large Language Models (MLLMs) for medical image captioning
Language: Python - Size: 5.45 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 6 - Forks: 0

TheoCoombes/ClipCap
Using pretrained encoder and language models to generate captions from multimedia inputs.
Language: Python - Size: 92.7 MB - Last synced at: 11 days ago - Pushed at: over 2 years ago - Stars: 97 - Forks: 13

sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
Language: Python - Size: 12.6 MB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 2,846 - Forks: 726

aimagelab/meshed-memory-transformer
Meshed-Memory Transformer for Image Captioning. CVPR 2020
Language: Python - Size: 7.07 MB - Last synced at: 28 days ago - Pushed at: over 2 years ago - Stars: 538 - Forks: 134

peteanderson80/Up-Down-Captioner
Automatic image captioning model based on Caffe, using features from bottom-up attention.
Language: Jupyter Notebook - Size: 2.6 MB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 246 - Forks: 68

ejlnmusic/PaliGemma-flickr8k-finetuning
This repository provides a method to fine-tune the PaliGemma model on the Flickr8k dataset for improved image captioning. Explore the features and utilities designed for efficient training and testing. 🐙🌟
Language: Jupyter Notebook - Size: 375 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

STCTheRealNooby/Image-Captioning-with-ViT-and-BERT
This repository provides a straightforward image-captioning pipeline that combines a Vision Transformer (ViT) encoder with a BERT decoder. Use this setup to fine-tune your model on the Flickr8k dataset and generate captions for new images. 🖼️✨
Language: Jupyter Notebook - Size: 5.11 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

imaginary-cloud/CameraManager
Simple Swift class to provide all the configurations you need to create custom camera view in your app
Language: Swift - Size: 4.7 MB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 1,385 - Forks: 329

AHMEDSANA/PaliGemma-flickr8k-finetuning
This repository contains code for fine-tuning Google's PaliGemma vision-language model on the Flickr8k dataset for image captioning tasks
Language: Jupyter Notebook - Size: 401 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

AHMEDSANA/Image-Captioning-with-ViT-and-BERT
A concise image-captioning pipeline that fine-tunes a ViT encoder with a BERT decoder on Flickr8K for training, plus a standalone script to load the trained model and generate captions on new images.
Language: Jupyter Notebook - Size: 5.22 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

Markin-Wang/awesome_radiology_report_generation
Awesome radiology report generation and image captioning papers.
Size: 59.6 KB - Last synced at: 19 days ago - Pushed at: 9 months ago - Stars: 75 - Forks: 6

AnnikaLindh/Diverse_and_Specific_Image_Captioning
Unsupervised specificity-guided optimization of Image Captioning models to encourage meaningful diversity in the generated captions. Code for the paper Generating Diverse and Meaningful Captions: Unsupervised Specificity Optimization for Image Captioning (Lindh et al., 2018).
Language: Python - Size: 62.5 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 13 - Forks: 8

abhay-43/Internet-Memes-Classification-using-Multimodal-Learning-and-Image-Captioning
This project classifies internet memes using multimodal learning by combining textual and visual features. It performs offensive content detection and emotion classification leveraging the MultiOFF and Memotion-7k datasets. The model integrates ALBERT for text, VGG-11 for images, and BLIP-generated captions to improve understanding of meme sentiment.
Language: Jupyter Notebook - Size: 6.01 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

stevan-milovanovic/LiteRT-for-Android
Image Classification with LiteRT
Language: Kotlin - Size: 171 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

ttengwang/Caption-Anything
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything
Language: Python - Size: 51.9 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 1,741 - Forks: 104

salesforce/BLIP
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Language: Jupyter Notebook - Size: 6.34 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 5,265 - Forks: 688

jhc13/taggui
Tag manager and captioner for image datasets
Language: Python - Size: 22.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 994 - Forks: 46

tuanio/image2latex
Image to Latex using Encoder-Decoder architecture
Language: Jupyter Notebook - Size: 1.18 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 13 - Forks: 5

YehLi/xmodaler
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
Language: Python - Size: 12.2 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 970 - Forks: 105

OFA-Sys/OFA
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Language: Python - Size: 120 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 2,501 - Forks: 248

salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
Language: Jupyter Notebook - Size: 79.3 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 10,558 - Forks: 1,031

Abhrankan-Chakrabarti/GeminiFusion
A versatile web application that leverages advanced AI models, including Gemini Flash, DALL-E 3, and Stable Diffusion XL, to provide three main features: Chatbot Interaction, Image Captioning, and Text-to-Image Generation.
Language: Python - Size: 43 KB - Last synced at: 12 days ago - Pushed at: about 2 months ago - Stars: 4 - Forks: 2

peteanderson80/bottom-up-attention
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Language: Jupyter Notebook - Size: 13.4 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 1,450 - Forks: 378

gokayfem/ComfyUI_VLM_nodes
Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
Language: Python - Size: 359 KB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 490 - Forks: 50

NVlabs/prismer
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Language: Python - Size: 4.25 MB - Last synced at: 18 days ago - Pushed at: over 1 year ago - Stars: 1,308 - Forks: 73

milaan9/Deep_Learning_Algorithms_from_Scratch
This repository explores the variety of techniques and algorithms commonly used in deep learning and the implementation in MATLAB and PYTHON
Language: Jupyter Notebook - Size: 9.85 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 172 - Forks: 171

yashk2810/Image-Captioning
Image Captioning using InceptionV3 and beam search
Language: Jupyter Notebook - Size: 74.6 MB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 329 - Forks: 123

symphl/blind-vision-assistant
An AI-powered embedded system that captures real-time images, generates descriptive captions using Qwen, and reads them out loud to assist the visually impaired.
Language: C++ - Size: 4.88 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

tanyuqian/redco
NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference
Language: Python - Size: 11.5 MB - Last synced at: 6 days ago - Pushed at: 7 months ago - Stars: 66 - Forks: 7

microsoft/Oscar 📦
Oscar and VinVL
Language: Python - Size: 715 KB - Last synced at: 3 days ago - Pushed at: almost 2 years ago - Stars: 1,049 - Forks: 251

Dewiin/blind-spot
CUNY Tech Prep 2025 Project
Language: JavaScript - Size: 3.63 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

kdexd/virtex
[CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations
Language: Python - Size: 3.65 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 563 - Forks: 61

Narius2030/IMCP-Support-Blinders
This project focuses on image captioning by creating two primary models: DarkNetLM and DarkNetVG2. Both models leverage the CSP DarkNet53 architecture, the backbone of YOLOv8, for feature extraction from images, and combine it with Transformers or an LSTM to generate captions.
Language: Python - Size: 28.8 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

phachon/gis
gis (go image server): an image service implemented in Go, providing basic upload, download, storage, proportional cropping, and similar features
Language: Go - Size: 1.84 MB - Last synced at: about 2 months ago - Pushed at: about 7 years ago - Stars: 123 - Forks: 36

nocaps-org/nocaps-org.github.io
Website for nocaps
Language: HTML - Size: 46.7 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 1

JHansiduYapa/CNN-LSTM-Image-Caption-Generator
This repository implements an image caption generator using a pretrained ResNet101 for feature extraction and an LSTM network for generating captions from images.
Language: Jupyter Notebook - Size: 9.89 MB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Belkinmix/Streamlit-Mini-AI-App
A streamlit-powered app that showcases multiple AI-powered tools: facial emotion detection, batch image captioning, text sentiment analysis, and a chaos-filled fun zone.
Language: Python - Size: 2.13 MB - Last synced at: 26 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

bhoomikaniranjan/Depiction-of-image-features-with-audio-to-aid-visually-impaired-persons
This project transforms visual content into vivid audio narratives for visually impaired individuals. Using advanced image recognition and text-to-speech technologies, it generates detailed captions and provides audio output in English, Kannada, and Hindi, fostering inclusivity and independence.
Language: Python - Size: 8.79 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 0

kalyaninguva/Image_Captioning
This project generates textual descriptions for images using deep learning.
Language: Jupyter Notebook - Size: 962 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

clementfornes13/leyenda_project
Leyenda is a Deep Learning-based project focused on image classification, preprocessing, and automatic caption generation. It combines Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to process visual data and describe it in natural language.
Language: Jupyter Notebook - Size: 172 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Koldim2001/Image_captioning
Generating image descriptions using various neural network architectures
Language: Jupyter Notebook - Size: 34 MB - Last synced at: 29 days ago - Pushed at: about 2 years ago - Stars: 18 - Forks: 0

anuragmishracse/caption_generator
A modular library built on top of Keras and TensorFlow to generate a caption in natural language for any input image.
Language: Python - Size: 902 KB - Last synced at: about 1 month ago - Pushed at: about 7 years ago - Stars: 267 - Forks: 119

Gholamrezadar/ollama-image-captioning
Captions images using Ollama and a multimodal model like Gemma3:4b.
Language: Python - Size: 1000 Bytes - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

anavarroa/TFM-LVLMs
A model capable of describing and answering questions about remote sensing images.
Language: Python - Size: 6.75 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

shreydan/VisionGPT2
Combining ViT and GPT-2 for image captioning. Trained on MS-COCO. The model was implemented mostly from scratch.
Language: Jupyter Notebook - Size: 289 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 42 - Forks: 2

Amir-Hofo/BLIP_Image_Captioning
A local Flask application for image captioning using the BLIP model. Users can run the app on their system, upload an image, and receive a descriptive caption generated by the model.
Language: CSS - Size: 1.41 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

aimagelab/DiCO
[BMVC 2024 Oral ✨] Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
Language: Python - Size: 6.76 MB - Last synced at: 26 days ago - Pushed at: 10 months ago - Stars: 18 - Forks: 0

Pu5hk4r/PROJECT-IMAGE-CAPTION-GENERATION
A lightweight AI/ML project that generates detailed captions for uploaded images using the Florence-2 Transformer model. It integrates an interactive Gradio UI, enabling real-time image-to-text generation powered by optimized deep learning workflows.
Language: Python - Size: 1000 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

JDAI-CV/image-captioning
Implementation of 'X-Linear Attention Networks for Image Captioning' [CVPR 2020]
Language: Python - Size: 733 KB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 274 - Forks: 54

fano2458/Zhadiger-Kazakh-Language-AI
AI services project "Zhadiger" for the Kazakh language, developed using NVIDIA Triton Inference Server. Includes LLM, OCR, image captioning, NER, TTS, STT, translation, etc.
Language: Python - Size: 47.4 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

ProGamerGov/VLM-Captioning-Tools
Python scripts to use for captioning images with VLMs
Language: Python - Size: 21.5 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 39 - Forks: 0

aimagelab/show-control-and-tell
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions. CVPR 2019
Language: Python - Size: 1.71 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 283 - Forks: 61

Throughmark/throughmark
Find and Annotate Features in Images, From Objects to Concepts
Language: TypeScript - Size: 111 MB - Last synced at: about 2 hours ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

luo3300612/image-captioning-DLCT
Official pytorch implementation of paper "Dual-Level Collaborative Transformer for Image Captioning" (AAAI 2021).
Language: Jupyter Notebook - Size: 1.17 MB - Last synced at: 9 days ago - Pushed at: about 3 years ago - Stars: 200 - Forks: 29

krasserm/fairseq-image-captioning
Transformer-based image captioning extension for pytorch/fairseq
Language: Python - Size: 3.09 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 316 - Forks: 57

john-fante/john-fante
In my code portfolio, I generally try new techniques and methods in machine learning. I don't like only copying and pasting.
Size: 318 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

prince2004patel/Image-Caption-Generator
An image captioning model that generates natural language descriptions for images. Built using ResNet50 for feature extraction and an LSTM for sequence generation, trained on Flickr8k data.
Language: Jupyter Notebook - Size: 29.7 MB - Last synced at: 10 days ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

PRITHIVSAKTHIUR/Image-Captioning-Florence2
This application utilizes the powerful Florence-2 vision-language model from Microsoft to generate comprehensive captions for images. The model is capable of understanding visual content and expressing it in natural language.
Language: Python - Size: 8.79 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

ruotianluo/self-critical.pytorch
Unofficial PyTorch implementation of Self-critical Sequence Training for Image Captioning, and others.
Language: Python - Size: 600 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 1,000 - Forks: 277

jmisilo/clip-gpt-captioning
CLIPxGPT Captioner is an image captioning model based on OpenAI's CLIP and GPT-2.
Language: Python - Size: 873 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 117 - Forks: 33

santoshlite/ByteDetective
The easiest way to search for images on your desktop 🔎
Language: Rust - Size: 3.87 MB - Last synced at: 28 days ago - Pushed at: almost 2 years ago - Stars: 29 - Forks: 2

AtheerAlzhrani/BlipCaptioner
Interactive web application that generates descriptive captions for images
Language: Jupyter Notebook - Size: 132 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

veydantkatyal/image-caption-recommender
Recommends the most relevant image captions using OpenAI’s CLIP model and machine learning for intelligent content generation.
Language: Jupyter Notebook - Size: 260 KB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

aehrc/cvt2distilgpt2
Improving Chest X-Ray Report Generation by Leveraging Warm-Starting
Language: Python - Size: 93.5 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 67 - Forks: 7

aiishwarrya/VisualLanguageModel
A custom Vision-Language Model (VLM) built from scratch, using SigLip for contrastive learning and a ViT-based encoder to generate meaningful image captions and semantic descriptions.
Size: 2.49 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

GT-RIPL/Xmodal-Ctx
Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning
Language: Python - Size: 93.6 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 60 - Forks: 10

j-min/CLIP-Caption-Reward
PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)
Language: Python - Size: 2.64 MB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 241 - Forks: 26

yunjey/show-attend-and-tell
TensorFlow Implementation of "Show, Attend and Tell"
Language: Jupyter Notebook - Size: 49.1 MB - Last synced at: 27 days ago - Pushed at: almost 7 years ago - Stars: 907 - Forks: 323

german-zarate/image-captioning-app
An image captioning ML model deployed with Flask and accessed via a Flutter app
Language: Python - Size: 7.54 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

Mohammadimh76/image-caption-generator-pytorch
Image Caption Generation using Deep Learning (CNN + LSTM Architecture)
Language: Python - Size: 1.04 GB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

aehrc/cxrmate
CXRMate: Longitudinal Data and a Semantic Similarity Reward for Chest X-Ray Report Generation
Language: Python - Size: 4.03 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 15 - Forks: 3

Zuellni/Image-Tools
Various image processing scripts.
Language: Python - Size: 16.6 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

PhilemonTJ/ImageCaptioningSystem
ImageCaptioningSystem is a Python application that generates descriptive captions for images using deep learning models, providing an automated interpretation of visual content.
Language: Python - Size: 20.5 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

neural-nuts/image-caption-generator 📦
[DEPRECATED] A Neural Network based generative model for captioning images using Tensorflow
Language: Jupyter Notebook - Size: 9.64 MB - Last synced at: about 2 months ago - Pushed at: over 5 years ago - Stars: 146 - Forks: 56
