An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: image-captioning

PRITHIVSAKTHIUR/Image-Captioning-Florence2

This application utilizes the powerful Florence-2 vision-language model from Microsoft to generate comprehensive captions for images. The model is capable of understanding visual content and expressing it in natural language.

Language: Python - Size: 8.79 KB - Last synced at: about 6 hours ago - Pushed at: about 19 hours ago - Stars: 0 - Forks: 0

fano2458/Zhadiger-Kazakh-Language-AI

AI services project "Zhadiger" for the Kazakh language, developed using NVIDIA Triton Inference Server. Includes LLM, OCR, Image Captioning, NER, TTS, STT, Translator, etc.

Language: Python - Size: 47.3 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 2 - Forks: 0

cuixing158/Awesome-CV-MasterHub

🔥 🔥 🔥 A paper list of some recent Computer Vision (CV) works

Size: 14.1 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 214 - Forks: 12

sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning

Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning

Language: Python - Size: 12.6 MB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 2,830 - Forks: 722

OFA-Sys/OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Language: Python - Size: 120 MB - Last synced at: 5 days ago - Pushed at: 12 months ago - Stars: 2,491 - Forks: 249

claudaff/automatic-map-storytelling

An Efficient System for Automatic Map Storytelling using Generative Pre-trained Transformer (GPT) Models – A Case Study on Historical Maps

Language: Python - Size: 1.98 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 4 - Forks: 1

salesforce/LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Language: Jupyter Notebook - Size: 79.3 MB - Last synced at: 6 days ago - Pushed at: 5 months ago - Stars: 10,440 - Forks: 1,019

SkalskiP/awesome-foundation-and-multimodal-models

👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]

Language: Python - Size: 58.6 KB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 611 - Forks: 45

hsp-iit/embodied-captioning

Official repository of the preprint "Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions"

Size: 669 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

iOPENCap/awesome-remote-image-captioning

A list of awesome remote sensing image captioning resources

Language: Python - Size: 414 KB - Last synced at: 6 days ago - Pushed at: 9 days ago - Stars: 104 - Forks: 1

AtheerAlzhrani/BlipCaptioner

Interactive web application that generates descriptive captions for images

Language: Jupyter Notebook - Size: 132 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

HanXinzi-AI/awesome-computer-vision-resources

A collection of computer vision projects and tools.

Size: 49.8 MB - Last synced at: 8 days ago - Pushed at: 11 months ago - Stars: 238 - Forks: 33

aimagelab/DiCO

[BMVC 2024 Oral ✨] Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization

Language: Python - Size: 6.76 MB - Last synced at: 8 days ago - Pushed at: 7 months ago - Stars: 17 - Forks: 0

OpenGVLab/InternGPT

InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)

Language: Python - Size: 41.9 MB - Last synced at: 9 days ago - Pushed at: 8 months ago - Stars: 3,218 - Forks: 230

salesforce/BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Language: Jupyter Notebook - Size: 6.34 MB - Last synced at: 10 days ago - Pushed at: 9 months ago - Stars: 5,165 - Forks: 681

microsoft/Oscar 📦

Oscar and VinVL

Language: Python - Size: 715 KB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 1,048 - Forks: 252

veydantkatyal/image-caption-recommender

Recommends the most relevant image captions using OpenAI's CLIP model and machine learning for intelligent content generation.

Language: Jupyter Notebook - Size: 260 KB - Last synced at: 37 minutes ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

imaginary-cloud/CameraManager

Simple Swift class that provides all the configurations you need to create a custom camera view in your app

Language: Swift - Size: 4.7 MB - Last synced at: 9 days ago - Pushed at: 9 months ago - Stars: 1,382 - Forks: 327

aehrc/cvt2distilgpt2

Improving Chest X-Ray Report Generation by Leveraging Warm-Starting

Language: Python - Size: 93.5 MB - Last synced at: 7 days ago - Pushed at: 11 months ago - Stars: 67 - Forks: 7

ttengwang/Caption-Anything

Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything

Language: Python - Size: 51.9 MB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 1,732 - Forks: 104

jmisilo/clip-gpt-captioning

CLIPxGPT Captioner is an image captioning model based on OpenAI's CLIP and GPT-2.

Language: Python - Size: 873 KB - Last synced at: 10 days ago - Pushed at: 2 months ago - Stars: 117 - Forks: 32

anuragmishracse/caption_generator

A modular library built on top of Keras and TensorFlow to generate a caption in natural language for any input image.

Language: Python - Size: 902 KB - Last synced at: 10 days ago - Pushed at: almost 7 years ago - Stars: 265 - Forks: 119

AkagawaTsurunaki/ZerolanLiveRobot

AI VTuber with LLM, ASR, TTS, OCR, CV and more technologies to live stream or play Minecraft with you.

Language: Python - Size: 2.3 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 27 - Forks: 3

Markin-Wang/awesome_radiology_report_generation

Awesome radiology report generation and image captioning papers.

Size: 59.6 KB - Last synced at: 4 days ago - Pushed at: 6 months ago - Stars: 72 - Forks: 6

aiishwarrya/VisualLanguageModel

A custom Vision-Language Model (VLM) built from scratch, using SigLip for contrastive learning and a ViT-based encoder to generate meaningful image captions and semantic descriptions.

Size: 2.49 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

GT-RIPL/Xmodal-Ctx

Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning

Language: Python - Size: 93.6 MB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 60 - Forks: 10

j-min/CLIP-Caption-Reward

PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)

Language: Python - Size: 2.64 MB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 241 - Forks: 26

german-zarate/image-captioning-app

Image captioning ML model deployed with Flask and accessed via a Flutter app

Language: Python - Size: 7.54 MB - Last synced at: 11 days ago - Pushed at: 11 months ago - Stars: 5 - Forks: 0

Mohammadimh76/image-caption-generator-pytorch

Image Caption Generation using Deep Learning (CNN + LSTM Architecture)

Language: Python - Size: 1.04 GB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 1 - Forks: 0

aehrc/cxrmate

CXRMate: Longitudinal Data and a Semantic Similarity Reward for Chest X-Ray Report Generation

Language: Python - Size: 4.03 MB - Last synced at: 7 days ago - Pushed at: about 2 months ago - Stars: 15 - Forks: 3

TheoCoombes/ClipCap

Using pretrained encoder and language models to generate captions from multimedia inputs.

Language: Python - Size: 92.7 MB - Last synced at: 6 days ago - Pushed at: about 2 years ago - Stars: 96 - Forks: 13

krasserm/fairseq-image-captioning

Transformer-based image captioning extension for pytorch/fairseq

Language: Python - Size: 3.09 MB - Last synced at: 11 days ago - Pushed at: over 4 years ago - Stars: 315 - Forks: 57

Zuellni/Image-Tools

Various image processing scripts.

Language: Python - Size: 16.6 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

PhilemonTJ/ImageCaptioningSystem

ImageCaptioningSystem is a Python application that generates descriptive captions for images using deep learning models, providing an automated interpretation of visual content.

Language: Python - Size: 20.5 KB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

AdirthaBorgohain/art-critiq

A multimodal pipeline to generate reviews in three tones [harsh, constructive, kind] for a given artwork using fine-tuned Flan-T5 models.

Language: Jupyter Notebook - Size: 175 KB - Last synced at: 20 days ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 0

aimagelab/PMA-Net

With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning. ICCV 2023

Language: Python - Size: 5.34 MB - Last synced at: 8 days ago - Pushed at: 11 months ago - Stars: 17 - Forks: 2

dayyass/image-captioning

My solution to the Image Captioning Final Project of the Coursera "Introduction to Deep Learning" course, with the trained model deployed as a Telegram bot.

Language: Jupyter Notebook - Size: 35.1 MB - Last synced at: 6 days ago - Pushed at: almost 4 years ago - Stars: 6 - Forks: 1

aimagelab/meshed-memory-transformer

Meshed-Memory Transformer for Image Captioning. CVPR 2020

Language: Python - Size: 7.07 MB - Last synced at: 14 days ago - Pushed at: over 2 years ago - Stars: 531 - Forks: 135

Deiwulf/AI-image-auto-tagger (fork of Ketengan-Diffusion/wdv3-batch-vit-tagger)

The ultimate open-source AI tagging tool for image galleries using metadata, or .txt files for AI training. Uses the newest wd-vit-tagger-v3 model by SmilingWolf.

Language: Python - Size: 273 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

SocAIty/socaity

SDK for generative AI.

Language: Python - Size: 24.2 MB - Last synced at: 4 days ago - Pushed at: 27 days ago - Stars: 2 - Forks: 0

reshalfahsi/image-captioning-mobilenet-llama3

Image Captioning With MobileNet-LLaMA 3

Language: Jupyter Notebook - Size: 3.56 MB - Last synced at: 6 days ago - Pushed at: 10 months ago - Stars: 5 - Forks: 0

peteanderson80/Up-Down-Captioner

Automatic image captioning model based on Caffe, using features from bottom-up attention.

Language: Jupyter Notebook - Size: 2.6 MB - Last synced at: 9 days ago - Pushed at: about 2 years ago - Stars: 245 - Forks: 69

yashk2810/Image-Captioning

Image Captioning using InceptionV3 and beam search

Language: Jupyter Notebook - Size: 74.6 MB - Last synced at: 12 days ago - Pushed at: over 4 years ago - Stars: 327 - Forks: 122

leftsl/ENIMNet

Implementation code for the ENIMNet network.

Language: Python - Size: 4.75 MB - Last synced at: 24 days ago - Pushed at: 29 days ago - Stars: 1 - Forks: 0

kdexd/virtex

[CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations

Language: Python - Size: 3.65 MB - Last synced at: 14 days ago - Pushed at: over 1 year ago - Stars: 561 - Forks: 61

dp-ops/Image_captioning

Image captioning model using ResNet34 and an attention LSTM. The project is implemented from scratch, using pretrained ImageNet weights for ResNet34 and fine-tuning the model on the Flickr8k and Flickr30k datasets.

Language: Jupyter Notebook - Size: 49.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

angeligareta/image-captioning

Image Caption Generator implemented using Tensorflow and Keras in a Python Jupyter Notebook. The goal is to describe the content of an image by using a CNN and RNN.

Language: Jupyter Notebook - Size: 393 KB - Last synced at: 12 days ago - Pushed at: about 4 years ago - Stars: 31 - Forks: 12

scopeInfinity/Video2Description

Video to Text: Natural language description generator for some given video. [Video Captioning]

Language: Python - Size: 33 MB - Last synced at: 12 days ago - Pushed at: almost 3 years ago - Stars: 343 - Forks: 70

nssharmaofficial/image-caption-generator

Image captioning model with Resnet50 encoder and LSTM decoder

Language: Python - Size: 745 MB - Last synced at: 14 days ago - Pushed at: 7 months ago - Stars: 17 - Forks: 4

NVlabs/prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

Language: Python - Size: 4.25 MB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 1,310 - Forks: 73

Aldenhovel/bleu-rouge-meteor-cider-spice-eval4imagecaption

Evaluation tools for image captioning. Including BLEU, ROUGE-L, CIDEr, METEOR, SPICE scores.

Language: Python - Size: 86.8 MB - Last synced at: 7 days ago - Pushed at: about 2 years ago - Stars: 28 - Forks: 2

google/imageinwords

Data release for the ImageInWords (IIW) paper.

Language: JavaScript - Size: 21.4 MB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 209 - Forks: 9

hasnainroopawalla/Image-Captioning-Scene-Descriptor

A CNN-LSTM model to generate a sentence/caption that describes the contents/scene of an image.

Language: Jupyter Notebook - Size: 259 MB - Last synced at: 13 days ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 2

IEEE-NITK/Image_Captioning

Image Captioning is the process of generating textual description of an image. It uses both Natural Language Processing and Computer Vision to generate the captions.

Language: Jupyter Notebook - Size: 9.46 MB - Last synced at: 17 days ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 8

aimagelab/show-control-and-tell

Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions. CVPR 2019

Language: Python - Size: 1.71 MB - Last synced at: 12 days ago - Pushed at: over 2 years ago - Stars: 281 - Forks: 61

milaan9/Deep_Learning_Algorithms_from_Scratch

This repository explores the variety of techniques and algorithms commonly used in deep learning and their implementation in MATLAB and Python

Language: Jupyter Notebook - Size: 9.85 MB - Last synced at: 10 days ago - Pushed at: over 2 years ago - Stars: 173 - Forks: 171

YehLi/xmodaler

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).

Language: Python - Size: 12.2 MB - Last synced at: 6 days ago - Pushed at: about 2 years ago - Stars: 970 - Forks: 105

ProGamerGov/VLM-Captioning-Tools

Python scripts to use for captioning images with VLMs

Language: Python - Size: 21.5 KB - Last synced at: 7 days ago - Pushed at: 9 months ago - Stars: 39 - Forks: 0

chunhuizng/mllm-video-captioner

We use RL to train a SOTA MLLM captioner.

Language: Python - Size: 1.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 0

shaadclt/Fine-tune-PaliGemma-Image-Captioning

This project demonstrates how to fine-tune the PaliGemma model for image captioning. The PaliGemma model, developed by Google Research, is designed to handle images and generate corresponding captions.

Language: Jupyter Notebook - Size: 408 KB - Last synced at: 9 days ago - Pushed at: 5 months ago - Stars: 6 - Forks: 0

Pavansomisetty21/Image-Caption-Generation-using-LLMs-GEMINI-

Generates captions for user-supplied images using prompt engineering and generative AI

Language: Jupyter Notebook - Size: 366 KB - Last synced at: 20 days ago - Pushed at: 8 months ago - Stars: 7 - Forks: 1

chisngooo/Multimodal-Video-Retrieval-Engine-with-Vision-and-Text-by-NaiveNotNaice (fork of Zhennor/Multimodal-Video-Retrieval-Engine-with-Vision-and-Text)

The video search engine, created by Team NaiveNotNice for HCM AI Challenge 2024, combines OCR, ASR, CLIP, Image Captioning, and Object & Color Detection for accurate video retrieval based on text, speech, images, objects, and colors.

Size: 20.9 GB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

jiasenlu/AdaptiveAttention

Implementation of "Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning"

Language: Jupyter Notebook - Size: 3.75 MB - Last synced at: 10 days ago - Pushed at: over 7 years ago - Stars: 335 - Forks: 74

FirstLanguage/streamlit-firstlanguage

Streamlit components for FirstLanguage API

Language: Python - Size: 809 KB - Last synced at: 6 days ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 4

Bjarten/computer-vision-ND

Projects and exercises for the Udacity Computer Vision Nanodegree

Language: Jupyter Notebook - Size: 690 MB - Last synced at: 13 days ago - Pushed at: about 6 years ago - Stars: 99 - Forks: 44

sitamgithub-MSIT/paligemma2-docci-litserve

Leverage PaliGemma 2's DOCCI fine-tuned variant capabilities using LitServe.

Language: Python - Size: 468 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

rahul-vinay/ShowAttendTell

This project implements an adaptive attention mechanism for image captioning, inspired by the 'Show, Attend and Tell' paper. It combines ResNet50 and LSTM with a sentinel gate to dynamically balance focus between visual features and language context.

Language: Jupyter Notebook - Size: 6.61 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

jhc13/taggui

Tag manager and captioner for image datasets

Language: Python - Size: 22.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 891 - Forks: 41

brayevalerien/ReCap

An image (re)captioning GUI for preparing datasets for image generation models, made for easy caption editing.

Language: Python - Size: 2.45 MB - Last synced at: 14 days ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

X-PLUG/mPLUG

mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)

Language: Python - Size: 1.56 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 89 - Forks: 7

hk-kumawat/Insight-Lens

📸 An AI-powered tool for intelligent image analysis with captioning, summaries, and Q&A capabilities!

Language: Python - Size: 29.3 KB - Last synced at: 26 days ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

ChaitanyaC22/Udacity-CVND-Project2-Automated-Image-Captioning

This project aims at training a CNN-RNN model to predict captions for a given image. The main task is to implement an effective RNN decoder for a CNN encoder.

Language: HTML - Size: 223 MB - Last synced at: 23 days ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 0

bushraqurban/Captionator

AI-powered image scraper and captioning tool.

Language: Python - Size: 508 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

LavanyaAN21/Depiction-of-image-features-with-audio-to-aid-visually-impaired-person

This project leverages advanced AI models to generate captions for images and translate them into regional languages (Kannada and Hindi). Additionally, it offers text-to-speech conversion, making it accessible to a wider audience, especially those with visual impairments.

Language: Python - Size: 9.77 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

dinhanhx/vcc

Vietnamese Conceptual Caption

Language: Python - Size: 26.4 KB - Last synced at: 7 days ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

li-xirong/coco-cn

Enriching MS-COCO with Chinese sentences and tags for cross-lingual multimedia tasks

Language: OpenEdge ABL - Size: 195 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 187 - Forks: 21

sreyash1mohanty/Image-captioning

Image captioning model using Keras

Language: Jupyter Notebook - Size: 19.8 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

Roni7128/NTU-2024Fall-DLCV

CommE5052: Deep Learning for Computer Vision (Prof. Frank Wang)

Size: 1.95 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Zuellni/Qt-Caption

Language: Python - Size: 2.18 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

ammarlodhi255/image-captioning-system-to-assist-the-blind

An image captioning system that can predict and speak out a caption for an image taken by a visually impaired user.

Language: Jupyter Notebook - Size: 10.7 MB - Last synced at: 20 days ago - Pushed at: 8 months ago - Stars: 9 - Forks: 7

cstsunfu/dlk

A PyTorch Based Deep Learning Quick Develop Framework. One-Stop for train/predict/server/demo

Language: Python - Size: 9.42 MB - Last synced at: 5 days ago - Pushed at: 2 months ago - Stars: 24 - Forks: 0

MiteshPuthran/Image-Caption-Generator

The LSTM model generates captions for the input images after extracting features from pre-trained VGG-16 model. (Computer Vision, NLP, Deep Learning, Python)

Language: Jupyter Notebook - Size: 69.8 MB - Last synced at: 5 days ago - Pushed at: over 5 years ago - Stars: 86 - Forks: 32

john-fante/john-fante

In my code portfolio, I generally try new techniques and methods in machine learning. I don't like only copying and pasting.

Size: 318 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

wocns1457/CCTV-based-clothing-analysis-and-search-system

Language: Python - Size: 23.1 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

sitamgithub-MSIT/paligemma-docci

Image Captioning with PaliGemma 2 Vision Language Model.

Language: Python - Size: 1.26 MB - Last synced at: 29 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

aehrc/imageclefmedical_caption_23

MedICap: Code for the participation of team CSIRO at the ImageCLEFmedical Caption task of 2023.

Language: Jupyter Notebook - Size: 643 KB - Last synced at: 7 days ago - Pushed at: 6 months ago - Stars: 3 - Forks: 0

raj-tyagi/4CLIP-Image-Captioning

This repository presents 4CLIP, a novel approach to image captioning that enhances traditional models by dividing images into four quadrants and processing them individually. By leveraging a pretrained ViT-GPT2 model from Hugging Face, 4CLIP generates more detailed and comprehensive captions, making it suitable for fine-grained visual tasks.

Language: Python - Size: 288 KB - Last synced at: 24 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

terry-r123/Awesome-Captioning

A curated list of multimodal captioning research (including image captioning, video captioning, and text captioning)

Size: 56.6 KB - Last synced at: 9 days ago - Pushed at: almost 3 years ago - Stars: 110 - Forks: 10

peteanderson80/coco-caption

Adds SPICE metric to coco-caption evaluation server codes

Language: Jupyter Notebook - Size: 121 MB - Last synced at: 4 days ago - Pushed at: about 2 years ago - Stars: 49 - Forks: 42

markdtw/soft-attention-image-captioning

TensorFlow implementation of Show, Attend and Tell (ICML'15)

Language: Python - Size: 639 KB - Last synced at: 10 days ago - Pushed at: almost 8 years ago - Stars: 19 - Forks: 11

Nexdata-AI/300-million-pairs-of-high-quality-image-caption-dataset

Size: 1000 Bytes - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Mahmood-Anaam/arabic-visual-question-answering

Modular and extensible framework for Arabic Visual Question Answering (VQA) using state-of-the-art pretrained models for image captioning and question answering.

Language: Jupyter Notebook - Size: 3.75 MB - Last synced at: 16 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Snigdho8869/AI-Generative-Models-Notebooks-DCGAN-VAE-Autoencoder

This repository contains notebooks showcasing various generative models, including DCGAN and VAE for anime face generation, an Autoencoder for converting photos to sketches, a captioning model using an attention mechanism for an image caption generator, and more.

Language: Jupyter Notebook - Size: 21.7 MB - Last synced at: 22 days ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 1

berlin0308/NTU-2024Fall-DLCV

CommE5052: Deep Learning for Computer Vision (Prof. Frank Wang)

Language: Jupyter Notebook - Size: 41.6 MB - Last synced at: 28 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

MuhammadHadiofficial/urdu_caption_generator

This repository contains the implementation of a Transformer-based model for Urdu Image Caption Generation, presented in the study "A Transformer-based Urdu Image Caption Generation." The project aims to generate syntactically, contextually, and semantically correct captions in Urdu for given images. It addresses the challenges of working with low-

Size: 4.88 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

santoshlite/ByteDetective

The easiest way to search for images on your desktop 🔎

Language: Rust - Size: 3.87 MB - Last synced at: 10 days ago - Pushed at: almost 2 years ago - Stars: 30 - Forks: 2

kacky24/stylenet

A PyTorch implementation of "StyleNet: Generating Attractive Visual Captions with Styles"

Language: Python - Size: 13.2 MB - Last synced at: 6 days ago - Pushed at: over 4 years ago - Stars: 62 - Forks: 10

adityajn105/image-caption-bot

Implementation of the 'merge' architecture for generating image captions from the paper "What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator?" using Keras. The dataset used is Flickr8k, available on Kaggle.

Language: Jupyter Notebook - Size: 10.6 MB - Last synced at: 6 days ago - Pushed at: over 5 years ago - Stars: 16 - Forks: 14

mmahdin/CI_CNNProject_Fall2024

Image classification on CIFAR-10 with ResNet, medical image analysis on breast histopathology images using CNNs, and image captioning on the Flickr8k, Flickr30k, and MSCOCO datasets with advanced architectures like LSTM and attention mechanisms.

Language: Jupyter Notebook - Size: 333 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

gauthiii/fineTunedBLIP

Fine-tuned the BLIP model to accurately caption images of Tom and Jerry.

Language: Jupyter Notebook - Size: 8.79 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Related Keywords
image-captioning (786), deep-learning (234), pytorch (173), computer-vision (152), lstm (118), tensorflow (88), cnn (87), machine-learning (85), python (81), nlp (75), rnn (68), keras (61), natural-language-processing (48), convolutional-neural-networks (44), transformer (43), attention-mechanism (39), neural-networks (38), lstm-neural-networks (34), object-detection (34), image-processing (34), recurrent-neural-networks (34), neural-network (30), image-classification (30), encoder-decoder (30), transformers (28), image-caption-generator (28), python3 (26), deep-neural-networks (25), captioning-images (24), artificial-intelligence (22), attention (22), ai (21), image-to-text (20), flask (19), caption-generation (19), vgg16 (18), multimodal (18), flickr8k-dataset (18), clip (17), visual-question-answering (17), image-caption (17), resnet-50 (16), inceptionv3 (16), generative-ai (16), llm (15), keras-tensorflow (15), show-attend-and-tell (15), beam-search (14), blip (14), transfer-learning (14), bleu-score (14), huggingface (14), image-recognition (14), mscoco-dataset (13), resnet (13), image (13), image-generation (13), ocr (12), multimodal-learning (12), vqa (12), mscoco (12), vision-and-language (12), image-segmentation (11), vision-transformer (11), coco-dataset (11), face-recognition (11), dataset (11), streamlit (11), tensorflow2 (10), reinforcement-learning (10), show-and-tell (10), attention-model (10), vision-language (10), jupyter-notebook (10), video-captioning (9), nlp-machine-learning (9), opencv (9), stable-diffusion (9), encoder (9), huggingface-transformers (9), face-detection (9), gan (9), decoder (8), coco (8), data-science (8), docker (8), machine-translation (8), inception-v3 (8), word-embeddings (8), gpt-2 (8), cnn-keras (8), text-to-image (8), captioning (8), encoder-decoder-model (8), text-generation (7), torch (7), deeplearning (7), style-transfer (7), vision-language-model (7), multimodal-deep-learning (7)