An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: image-captioning

hsp-iit/embodied-captioning

Official repository of the preprint "Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions"

Language: Python - Size: 944 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2 - Forks: 0

HanXinzi-AI/awesome-computer-vision-resources

A collection of computer vision projects and tools.

Size: 49.8 MB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 273 - Forks: 34

MahmoudAdham6544/vision-speak

VisionSpeak: A deep learning pipeline that generates natural language captions from images using a Vision-Encoder and GPT-2 Decoder. Bridging vision and language with PyTorch and Transformers.

Language: Python - Size: 5.56 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

AkagawaTsurunaki/zerolan-core

ZerolanCore integrates many open-source, locally deployable AI models, and aims to integrate a series of AI models such as large language model (LLM), automatic speech recognition (ASR), text-to-speech (TTS), image captioning, optical character recognition (OCR), video captioning, etc.

Language: Python - Size: 102 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 3 - Forks: 0

X-PLUG/mPLUG

mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)

Language: Python - Size: 1.56 MB - Last synced at: 3 days ago - Pushed at: about 2 years ago - Stars: 93 - Forks: 8

cuixing158/Awesome-CV-MasterHub

🔥 🔥 🔥 A paper list of some recent Computer Vision (CV) works

Size: 43.5 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 419 - Forks: 29

SocAIty/socaity

SDK for generative AI.

Language: Python - Size: 26.2 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1 - Forks: 0

huiteuros/generalt

FastAPI service that generates image alt text using the BLIP model

Language: Python - Size: 0 Bytes - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

AI-14/pkatransnet

[IVC 2025] [Official code] - Enhancing radiology report generation: A prior knowledge-aware transformer network for effective alignment and fusion of multi-modal radiological data

Language: Python - Size: 4.42 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 1

iOPENCap/awesome-remote-image-captioning

A list of awesome remote sensing image captioning resources

Language: Python - Size: 198 KB - Last synced at: about 22 hours ago - Pushed at: 13 days ago - Stars: 110 - Forks: 1

PtiCalin/vault_image-description

Ollama powered image description

Language: JavaScript - Size: 64.5 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 0

OpenGVLab/InternGPT

InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)

Language: Python - Size: 41.9 MB - Last synced at: 6 days ago - Pushed at: 10 months ago - Stars: 3,214 - Forks: 231

xogie/Add_Tags-Titles-to-Images

A Python tool that auto-generates captions and keyword tags for JPG/PNG images using a local vision-language model like BakLLaVA. Captions and tags are embedded into EXIF metadata (Title + Tags) for native Windows Explorer visibility. Includes batch processing and GUI folder selection.

Language: Python - Size: 12.7 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

SkalskiP/awesome-foundation-and-multimodal-models

👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]

Language: Python - Size: 58.6 KB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 621 - Forks: 45

cstsunfu/dlk

A PyTorch Based Deep Learning Quick Develop Framework. One-Stop for train/predict/server/demo

Language: Python - Size: 9.42 MB - Last synced at: 10 days ago - Pushed at: 5 months ago - Stars: 24 - Forks: 0

alasdairtran/transform-and-tell

[CVPR 2020] Transform and Tell: Entity-Aware News Image Captioning

Language: Python - Size: 14.2 MB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 91 - Forks: 15

AkagawaTsurunaki/ZerolanLiveRobot

AI VTuber with LLM, ASR, TTS, OCR, CV and more technologies to live stream or play Minecraft with you.

Language: Python - Size: 2.48 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 29 - Forks: 3

claudaff/automatic-map-storytelling

An Efficient System for Automatic Map Storytelling using Generative Pre-trained Transformer (GPT) Models – A Case Study on Historical Maps

Language: Python - Size: 2 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 4 - Forks: 2

terry-r123/Awesome-Captioning

A curated list of Multimodal Captioning related research (including image captioning, video captioning, and text captioning)

Size: 56.6 KB - Last synced at: 4 days ago - Pushed at: about 3 years ago - Stars: 109 - Forks: 10

ZhuoxuanCao/BLIP-Hugging-Face-Quickstart-Finetune-Lora

A modular, easy-to-use framework for fine-tuning BLIP-1 on custom image captioning tasks using LoRA and Hugging Face Transformers. Includes data preprocessing, training scripts, and inference demos — with custom patching on the vision backbone. Ideal for researchers, engineers, and AI enthusiasts building lightweight captioning systems.

Language: Python - Size: 178 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

Pavansomisetty21/Image-Caption-Generation-using-LLMs-GEMINI-

Generates captions for user-supplied images using prompt engineering and generative AI

Language: Jupyter Notebook - Size: 366 KB - Last synced at: about 23 hours ago - Pushed at: 10 months ago - Stars: 10 - Forks: 1

kuanghuei/SCAN

PyTorch source code for "Stacked Cross Attention for Image-Text Matching" (ECCV 2018)

Language: Python - Size: 34.2 KB - Last synced at: 5 days ago - Pushed at: about 2 years ago - Stars: 565 - Forks: 115

digitechvishal/Image-Caption-Generator-Using-AI-Azure

This project is a lightweight web application that leverages Microsoft Azure’s Computer Vision API to generate accurate captions for uploaded images. Designed using Python and Streamlit, it provides a clean and intuitive interface to interact with AI-powered image analysis.

Language: Python - Size: 337 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

dp-ops/Image_captioning

Image captioning model using ResNet34 and Attention LSTM. The project is implemented from scratch, using pretrained ImageNet weights for ResNet34 and fine-tuning the model on the Flickr8k and Flickr30k datasets. Reinforcement learning capabilities are available, but they need fixing and a better GPU.

Language: Python - Size: 60.5 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

aakcay5656/image-captioning-pytorch

The project I did in the OBSS AI Intern Competition

Language: Jupyter Notebook - Size: 9.77 KB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

PrathameshPC77/ai_image_captioning

🖼️ AI Image Caption Generator — A simple and smart web app that generates descriptive captions for any image you upload using a pre-trained Vision Transformer (ViT) and GPT-2 model. Built with Python and Streamlit, powered by Hugging Face Transformers.

Language: Python - Size: 576 KB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

msamprovalaki/Exploring-Multimodal-Large-Language-Models-for-Medical-Image-Captioning

This repository includes the code for my Master's thesis, which investigates the application of Multimodal Large Language Models (MLLMs) for medical image captioning

Language: Python - Size: 5.45 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 6 - Forks: 0

TheoCoombes/ClipCap

Using pretrained encoder and language models to generate captions from multimedia inputs.

Language: Python - Size: 92.7 MB - Last synced at: 11 days ago - Pushed at: over 2 years ago - Stars: 97 - Forks: 13

sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning

Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning

Language: Python - Size: 12.6 MB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 2,846 - Forks: 726

aimagelab/meshed-memory-transformer

Meshed-Memory Transformer for Image Captioning. CVPR 2020

Language: Python - Size: 7.07 MB - Last synced at: 28 days ago - Pushed at: over 2 years ago - Stars: 538 - Forks: 134

peteanderson80/Up-Down-Captioner

Automatic image captioning model based on Caffe, using features from bottom-up attention.

Language: Jupyter Notebook - Size: 2.6 MB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 246 - Forks: 68

ejlnmusic/PaliGemma-flickr8k-finetuning

This repository provides a method to fine-tune the PaliGemma model on the Flickr8k dataset for improved image captioning. Explore the features and utilities designed for efficient training and testing. 🐙🌟

Language: Jupyter Notebook - Size: 375 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

STCTheRealNooby/Image-Captioning-with-ViT-and-BERT

This repository provides a straightforward image-captioning pipeline that combines a Vision Transformer (ViT) encoder with a BERT decoder. Use this setup to fine-tune your model on the Flickr8k dataset and generate captions for new images. 🖼️✨

Language: Jupyter Notebook - Size: 5.11 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

imaginary-cloud/CameraManager

Simple Swift class to provide all the configurations you need to create custom camera view in your app

Language: Swift - Size: 4.7 MB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 1,385 - Forks: 329

AHMEDSANA/PaliGemma-flickr8k-finetuning

This repository contains code for fine-tuning Google's PaliGemma vision-language model on the Flickr8k dataset for image captioning tasks

Language: Jupyter Notebook - Size: 401 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

AHMEDSANA/Image-Captioning-with-ViT-and-BERT

A concise image-captioning pipeline that fine-tunes a ViT encoder with a BERT decoder on Flickr8K for training, plus a standalone script to load the trained model and generate captions on new images.

Language: Jupyter Notebook - Size: 5.22 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

Markin-Wang/awesome_radiology_report_generation

Awesome radiology report generation and image captioning papers.

Size: 59.6 KB - Last synced at: 19 days ago - Pushed at: 9 months ago - Stars: 75 - Forks: 6

AnnikaLindh/Diverse_and_Specific_Image_Captioning

Unsupervised specificity-guided optimization of Image Captioning models to encourage meaningful diversity in the generated captions. Code for the paper Generating Diverse and Meaningful Captions: Unsupervised Specificity Optimization for Image Captioning (Lindh et al., 2018).

Language: Python - Size: 62.5 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 13 - Forks: 8

abhay-43/Internet-Memes-Classification-using-Multimodal-Learning-and-Image-Captioning

This project classifies internet memes using multimodal learning by combining textual and visual features. It performs offensive content detection and emotion classification leveraging the MultiOFF and Memotion-7k datasets. The model integrates ALBERT for text, VGG-11 for images, and BLIP-generated captions to improve understanding of meme sentimen

Language: Jupyter Notebook - Size: 6.01 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

stevan-milovanovic/LiteRT-for-Android

Image Classification with LiteRT

Language: Kotlin - Size: 171 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

ttengwang/Caption-Anything

Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything

Language: Python - Size: 51.9 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 1,741 - Forks: 104

salesforce/BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Language: Jupyter Notebook - Size: 6.34 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 5,265 - Forks: 688

jhc13/taggui

Tag manager and captioner for image datasets

Language: Python - Size: 22.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 994 - Forks: 46

tuanio/image2latex

Image to Latex using Encoder-Decoder architecture

Language: Jupyter Notebook - Size: 1.18 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 13 - Forks: 5

YehLi/xmodaler

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).

Language: Python - Size: 12.2 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 970 - Forks: 105

OFA-Sys/OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Language: Python - Size: 120 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 2,501 - Forks: 248

salesforce/LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Language: Jupyter Notebook - Size: 79.3 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 10,558 - Forks: 1,031

Abhrankan-Chakrabarti/GeminiFusion

A versatile web application that leverages advanced AI models, including Gemini Flash, DALL-E 3, and Stable Diffusion XL, to provide three main features: Chatbot Interaction, Image Captioning, and Text-to-Image Generation.

Language: Python - Size: 43 KB - Last synced at: 12 days ago - Pushed at: about 2 months ago - Stars: 4 - Forks: 2

peteanderson80/bottom-up-attention

Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

Language: Jupyter Notebook - Size: 13.4 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 1,450 - Forks: 378

gokayfem/ComfyUI_VLM_nodes

Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation

Language: Python - Size: 359 KB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 490 - Forks: 50

NVlabs/prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

Language: Python - Size: 4.25 MB - Last synced at: 18 days ago - Pushed at: over 1 year ago - Stars: 1,308 - Forks: 73

milaan9/Deep_Learning_Algorithms_from_Scratch

This repository explores the variety of techniques and algorithms commonly used in deep learning and the implementation in MATLAB and PYTHON

Language: Jupyter Notebook - Size: 9.85 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 172 - Forks: 171

yashk2810/Image-Captioning

Image Captioning using InceptionV3 and beam search

Language: Jupyter Notebook - Size: 74.6 MB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 329 - Forks: 123

symphl/blind-vision-assistant

An AI-powered embedded system that captures real-time images, generates descriptive captions using Qwen, and reads them out loud to assist the visually impaired.

Language: C++ - Size: 4.88 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

tanyuqian/redco

NAACL '24 (Best Demo Paper RunnerUp) / MlSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference

Language: Python - Size: 11.5 MB - Last synced at: 6 days ago - Pushed at: 7 months ago - Stars: 66 - Forks: 7

microsoft/Oscar 📦

Oscar and VinVL

Language: Python - Size: 715 KB - Last synced at: 3 days ago - Pushed at: almost 2 years ago - Stars: 1,049 - Forks: 251

Dewiin/blind-spot

CUNY Tech Prep 2025 Project

Language: JavaScript - Size: 3.63 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

kdexd/virtex

[CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations

Language: Python - Size: 3.65 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 563 - Forks: 61

Narius2030/IMCP-Support-Blinders

This project focuses on image captioning by creating two primary models: DarkNetLM and DarkNetVG2. Both models leverage the CSP DarkNet53 architecture as the backbone of YOLOv8 for feature extraction from images. These are combined with Transformers or an LSTM to generate captions.

Language: Python - Size: 28.8 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

phachon/gis

gis (go image server): an image service implemented in Go, providing basic upload, download, storage, and proportional cropping features

Language: Go - Size: 1.84 MB - Last synced at: about 2 months ago - Pushed at: about 7 years ago - Stars: 123 - Forks: 36

nocaps-org/nocaps-org.github.io

Website for nocaps

Language: HTML - Size: 46.7 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 1

JHansiduYapa/CNN-LSTM-Image-Caption-Generator

This repository implements an image caption generator using a pretrained ResNet101 for feature extraction and an LSTM network for generating captions from images.

Language: Jupyter Notebook - Size: 9.89 MB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Belkinmix/Streamlit-Mini-AI-App

A streamlit-powered app that showcases multiple AI-powered tools: facial emotion detection, batch image captioning, text sentiment analysis, and a chaos-filled fun zone.

Language: Python - Size: 2.13 MB - Last synced at: 26 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

bhoomikaniranjan/Depiction-of-image-features-with-audio-to-aid-visually-impaired-persons

This project transforms visual content into vivid audio narratives for visually impaired individuals. Using advanced image recognition and text-to-speech technologies, it generates detailed captions and provides audio output in English, Kannada, and Hindi, fostering inclusivity and independence.

Language: Python - Size: 8.79 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 0

kalyaninguva/Image_Captioning

This project generates textual descriptions for images using deep learning.

Language: Jupyter Notebook - Size: 962 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

clementfornes13/leyenda_project

Leyenda is a Deep Learning-based project focused on image classification, preprocessing, and automatic caption generation. It combines Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to process visual data and describe it in natural language.

Language: Jupyter Notebook - Size: 172 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Koldim2001/Image_captioning

Generating image descriptions using various neural network architectures

Language: Jupyter Notebook - Size: 34 MB - Last synced at: 29 days ago - Pushed at: about 2 years ago - Stars: 18 - Forks: 0

anuragmishracse/caption_generator

A modular library built on top of Keras and TensorFlow to generate a caption in natural language for any input image.

Language: Python - Size: 902 KB - Last synced at: about 1 month ago - Pushed at: about 7 years ago - Stars: 267 - Forks: 119

Gholamrezadar/ollama-image-captioning

Captions images using Ollama and a multimodal model like Gemma3:4b.

Language: Python - Size: 1000 Bytes - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

anavarroa/TFM-LVLMs

A model capable of describing and answering questions about remote sensing images.

Language: Python - Size: 6.75 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

shreydan/VisionGPT2

Combining ViT and GPT-2 for image captioning. Trained on MS-COCO. The model was implemented mostly from scratch.

Language: Jupyter Notebook - Size: 289 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 42 - Forks: 2

Amir-Hofo/BLIP_Image_Captioning

A local Flask application for image captioning using the BLIP model. Users can run the app on their system, upload an image, and receive a descriptive caption generated by the model.

Language: CSS - Size: 1.41 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

aimagelab/DiCO

[BMVC 2024 Oral ✨] Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization

Language: Python - Size: 6.76 MB - Last synced at: 26 days ago - Pushed at: 10 months ago - Stars: 18 - Forks: 0

Pu5hk4r/PROJECT-IMAGE-CAPTION-GENERATION

A lightweight AI/ML project that generates detailed captions for uploaded images using the Florence-2 Transformer model. It integrates an interactive Gradio UI, enabling real-time image-to-text generation powered by optimized deep learning workflows.

Language: Python - Size: 1000 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

JDAI-CV/image-captioning

Implementation of 'X-Linear Attention Networks for Image Captioning' [CVPR 2020]

Language: Python - Size: 733 KB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 274 - Forks: 54

fano2458/Zhadiger-Kazakh-Language-AI

AI services project "Zhadiger" for the Kazakh language, developed using NVIDIA Triton Inference Server. Includes LLM, OCR, image captioning, NER, TTS, STT, translation, etc.

Language: Python - Size: 47.4 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

ProGamerGov/VLM-Captioning-Tools

Python scripts to use for captioning images with VLMs

Language: Python - Size: 21.5 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 39 - Forks: 0

aimagelab/show-control-and-tell

Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions. CVPR 2019

Language: Python - Size: 1.71 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 283 - Forks: 61

Throughmark/throughmark

Find and Annotate Features in Images, From Objects to Concepts

Language: TypeScript - Size: 111 MB - Last synced at: about 2 hours ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

luo3300612/image-captioning-DLCT

Official pytorch implementation of paper "Dual-Level Collaborative Transformer for Image Captioning" (AAAI 2021).

Language: Jupyter Notebook - Size: 1.17 MB - Last synced at: 9 days ago - Pushed at: about 3 years ago - Stars: 200 - Forks: 29

krasserm/fairseq-image-captioning

Transformer-based image captioning extension for pytorch/fairseq

Language: Python - Size: 3.09 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 316 - Forks: 57

john-fante/john-fante

In my code portfolio, I generally try new techniques and methods in machine learning. I don't like only copying and pasting.

Size: 318 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

prince2004patel/Image-Caption-Generator

An image captioning model that generates natural language descriptions for images. Built using ResNet50 for feature extraction and LSTM for sequence generation, trained on Flickr8k data

Language: Jupyter Notebook - Size: 29.7 MB - Last synced at: 10 days ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

PRITHIVSAKTHIUR/Image-Captioning-Florence2

This application utilizes the powerful Florence-2 vision-language model from Microsoft to generate comprehensive captions for images. The model is capable of understanding visual content and expressing it in natural language.

Language: Python - Size: 8.79 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

ruotianluo/self-critical.pytorch

Unofficial PyTorch implementation of Self-critical Sequence Training for Image Captioning, and others.

Language: Python - Size: 600 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 1,000 - Forks: 277

jmisilo/clip-gpt-captioning

CLIPxGPT Captioner is an image captioning model based on OpenAI's CLIP and GPT-2.

Language: Python - Size: 873 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 117 - Forks: 33

santoshlite/ByteDetective

The easiest way to search for images on your desktop 🔎

Language: Rust - Size: 3.87 MB - Last synced at: 28 days ago - Pushed at: almost 2 years ago - Stars: 29 - Forks: 2

AtheerAlzhrani/BlipCaptioner

Interactive web application that generates descriptive captions for images

Language: Jupyter Notebook - Size: 132 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

veydantkatyal/image-caption-recommender

Recommends the most relevant image captions using OpenAI’s CLIP model and machine learning for intelligent content generation.

Language: Jupyter Notebook - Size: 260 KB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

aehrc/cvt2distilgpt2

Improving Chest X-Ray Report Generation by Leveraging Warm-Starting

Language: Python - Size: 93.5 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 67 - Forks: 7

aiishwarrya/VisualLanguageModel

A custom Vision-Language Model (VLM) built from scratch, using SigLip for contrastive learning and a ViT-based encoder to generate meaningful image captions and semantic descriptions.

Size: 2.49 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

GT-RIPL/Xmodal-Ctx

Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning

Language: Python - Size: 93.6 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 60 - Forks: 10

j-min/CLIP-Caption-Reward

PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)

Language: Python - Size: 2.64 MB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 241 - Forks: 26

yunjey/show-attend-and-tell

TensorFlow Implementation of "Show, Attend and Tell"

Language: Jupyter Notebook - Size: 49.1 MB - Last synced at: 27 days ago - Pushed at: almost 7 years ago - Stars: 907 - Forks: 323

german-zarate/image-captioning-app

Image captioning ML model deployed with Flask and accessed via a Flutter app

Language: Python - Size: 7.54 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

Mohammadimh76/image-caption-generator-pytorch

Image Caption Generation using Deep Learning (CNN + LSTM Architecture)

Language: Python - Size: 1.04 GB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

aehrc/cxrmate

CXRMate: Longitudinal Data and a Semantic Similarity Reward for Chest X-Ray Report Generation

Language: Python - Size: 4.03 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 15 - Forks: 3

Zuellni/Image-Tools

Various image processing scripts.

Language: Python - Size: 16.6 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

PhilemonTJ/ImageCaptioningSystem

ImageCaptioningSystem is a Python application that generates descriptive captions for images using deep learning models, providing an automated interpretation of visual content.

Language: Python - Size: 20.5 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

neural-nuts/image-caption-generator 📦

[DEPRECATED] A Neural Network based generative model for captioning images using Tensorflow

Language: Jupyter Notebook - Size: 9.64 MB - Last synced at: about 2 months ago - Pushed at: over 5 years ago - Stars: 146 - Forks: 56

Related Keywords
image-captioning (827), deep-learning (250), pytorch (182), computer-vision (163), lstm (120), tensorflow (94), cnn (94), machine-learning (93), python (92), nlp (77), rnn (72), keras (60), natural-language-processing (53), convolutional-neural-networks (44), transformer (44), attention-mechanism (41), neural-networks (40), image-processing (37), lstm-neural-networks (35), recurrent-neural-networks (34), object-detection (34), image-classification (33), encoder-decoder (32), transformers (32), image-caption-generator (29), neural-network (29), deep-neural-networks (27), python3 (26), ai (25), artificial-intelligence (25), captioning-images (24), image-to-text (24), multimodal (22), flickr8k-dataset (22), attention (22), flask (21), huggingface (19), caption-generation (19), vgg16 (18), blip (18), clip (17), visual-question-answering (17), image-caption (17), inceptionv3 (16), generative-ai (16), transfer-learning (16), show-attend-and-tell (16), resnet-50 (16), image-recognition (16), llm (16), image-generation (14), bleu-score (14), beam-search (14), keras-tensorflow (14), mscoco-dataset (14), vision-transformer (14), image (13), resnet (13), mscoco (13), streamlit (13), ocr (12), fine-tuning (12), vqa (12), vision-and-language (12), multimodal-learning (12), reinforcement-learning (11), image-segmentation (11), face-recognition (11), attention-model (11), coco-dataset (11), huggingface-transformers (11), vision-language-model (11), dataset (11), tensorflow2 (10), jupyter-notebook (10), show-and-tell (10), opencv (10), vision-language (10), encoder (9), llava (9), gan (9), face-detection (9), video-captioning (9), stable-diffusion (9), nlp-machine-learning (9), docker (9), captioning (8), encoder-decoder-model (8), coco (8), text-generation (8), torch (8), gpt-2 (8), cnn-keras (8), deeplearning (8), lstm-model (8), decoder (8), data-science (8), inception-v3 (8), text-to-image (8), word-embeddings (8)