An open API service providing repository metadata for many open source software ecosystems.

Topic: "image-text"

salesforce/ALBEF

Code for ALBEF: a new vision-language pre-training method

Language: Python - Size: 69.9 MB - Last synced at: 17 days ago - Pushed at: over 2 years ago - Stars: 1,627 - Forks: 205

Sense-GVT/DeCLIP

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm

Language: Python - Size: 970 KB - Last synced at: 21 days ago - Pushed at: over 2 years ago - Stars: 649 - Forks: 32

google/imageinwords

Data release for the ImageInWords (IIW) paper.

Language: JavaScript - Size: 21.4 MB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 209 - Forks: 9

X-PLUG/mPLUG

mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)

Language: Python - Size: 1.56 MB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 89 - Forks: 7

labyrinth7x/Deep-Cross-Modal-Projection-Learning-for-Image-Text-Matching

Deep Cross-Modal Projection Learning for Image-Text Matching

Language: Python - Size: 39.2 MB - Last synced at: 8 months ago - Pushed at: over 4 years ago - Stars: 72 - Forks: 21

glami/glami-1m

The largest multilingual image-text classification dataset. It contains fashion products.

Language: Jupyter Notebook - Size: 5.43 MB - Last synced at: 4 days ago - Pushed at: almost 2 years ago - Stars: 71 - Forks: 7

miccunifi/QualiCLIP

Quality-Aware Image-Text Alignment for Real-World Image Quality Assessment

Language: Python - Size: 5.37 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 60 - Forks: 1

TheoCoombes/crawlingathome

A client library for LAION's effort to filter CommonCrawl with CLIP, building a large scale image-text dataset.

Language: Python - Size: 138 KB - Last synced at: 21 days ago - Pushed at: about 2 years ago - Stars: 33 - Forks: 7

antonlukin/poster-editor

Wrapper for PHP's GD Library for easy image manipulation. Support for scaling multi-line text, shapes, filters and smart resize.

Language: PHP - Size: 12.2 MB - Last synced at: 13 days ago - Pushed at: 3 months ago - Stars: 20 - Forks: 2

zhangming8/ocr_algo_server

OCR text recognition algorithm service

Language: C++ - Size: 5.89 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 17 - Forks: 5

HuangRunHua/LiveTextWithImage

WWDC22: Enabling Live Text interactions with images in SwiftUI

Language: Swift - Size: 2.71 MB - Last synced at: 4 days ago - Pushed at: almost 3 years ago - Stars: 14 - Forks: 1

zabir-nabil/imagebert-keras

Keras implementation of ImageBERT from Microsoft

Size: 2.93 KB - Last synced at: 22 days ago - Pushed at: about 5 years ago - Stars: 14 - Forks: 2

TheoCoombes/crawlingathome-server

A server powering LAION's effort to filter CommonCrawl with CLIP, building a large scale image-text dataset.

Language: Python - Size: 1.32 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 9 - Forks: 4

Thisisus7/ING-VP

An Interactive Game-based Vision Planning benchmark

Language: Python - Size: 2.6 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 8 - Forks: 0

waysup/Img2Txt

Img2Txt - A Dropzone 3 action that recognizes text in images (jpg/png) using the Baidu OCR API

Language: Python - Size: 23.4 KB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 6 - Forks: 0

reshalfahsi/image-captioning-mobilenet-llama3

Image Captioning With MobileNet-LLaMA 3

Language: Jupyter Notebook - Size: 3.56 MB - Last synced at: 12 days ago - Pushed at: 10 months ago - Stars: 5 - Forks: 0

leeyunjai/image2text

caption generator using lavis and argostranslate

Language: Python - Size: 128 KB - Last synced at: 14 days ago - Pushed at: about 2 years ago - Stars: 4 - Forks: 1

dinhanhx/VisualRoBERTa

The first public Vietnamese visual linguistic foundation model(s)

Language: Python - Size: 98.6 KB - Last synced at: 13 days ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 2

dngo-io/cover-creator

Write text on images with PHP

Language: PHP - Size: 313 KB - Last synced at: about 2 months ago - Pushed at: almost 7 years ago - Stars: 3 - Forks: 4

fatemeh-mohseni-AI/most-repeated-vocabulary-IELTS

This project is a FastAPI-based web application designed to analyze Cambridge IELTS PDFs (Books 1-18) for the most and least repeated words. It can handle both regular text-based PDFs and scanned image-based PDFs by converting them to images and extracting text using OCR (Optical Character Recognition).

Language: Python - Size: 1.28 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 1

awsaf49/flickr-dataset

Download flickr8k, flickr30k image caption datasets

Size: 4.88 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

jianzhnie/MultimodalTransformers

lmmtoolkit is a toolkit for Multi-Modal Learning

Language: Python - Size: 22.5 KB - Last synced at: 17 days ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

formulae-org/package-graphic-raster-js

Raster graphics package for Fōrmulæ, in JavaScript

Language: JavaScript - Size: 82 KB - Last synced at: 22 days ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

dvlab-research/TagCLIP

Language: Python - Size: 2.43 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

CharlesYang030/MTA

MTA: A Lightweight Multilingual Text Alignment Model for Cross-language Visual Word Sense Disambiguation

Language: Jupyter Notebook - Size: 10.2 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

CharlesYang030/FCLL

FCLL: A Fine-grained Contrastive Language-Image Learning Model

Language: Jupyter Notebook - Size: 5.51 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

dinhanhx/VL-datasets

Some Python scripts to load Vietnamese visual linguistic data

Language: Python - Size: 1.95 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 1

waittim/ConVIRT-Colab Fork of edreisMD/ConVIRT-pytorch

Contrastive Learning Representations for Images and Text Pairs. Colab implementation of ConVIRT for transfer learning with insufficient data volume.

Language: Jupyter Notebook - Size: 13.6 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

Nexdata-AI/10000-Image-caption-data-of-diverse-scenes

10000-Image-caption-data-of-diverse-scenes

Size: 2.44 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Nexdata-AI/10000-Image-caption-data-of-vehicles

10000-Image-caption-data-of-vehicles

Size: 1.12 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Nexdata-AI/10100-Image-caption-data-of-human-face

10100-Image-caption-data-of-human-face

Size: 1.57 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Nexdata-AI/10000-Image-caption-data-of-gestures

10000-Image-caption-data-of-gestures

Size: 1.35 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Nexdata-AI/20011--Image-Caption-Data-Of-OCR-In-Natural-Scenes

20011--Image-Caption-Data-Of-OCR-In-Natural-Scenes

Size: 1.38 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Nexdata-AI/11000-Image-Video-caption-data-of-human-action

11000-Image-Video-caption-data-of-human-action

Size: 2.5 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

CharlesYang030/PolCLIP

PolCLIP: A Unified Image-Text Word Sense Disambiguation Model via Generating Multimodal Complementary Representations

Language: Jupyter Notebook - Size: 16.9 MB - Last synced at: 11 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

AkshayBura/Character-Recognition

Character Recognition system using CNN and Streamlit

Language: Jupyter Notebook - Size: 765 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ask0ne/ocrator

Scan text from an image and convert into speech/audio of desired language.

Language: Python - Size: 52.7 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

ppraneeth270/img2text

Language: Python - Size: 58.6 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

xiongshufeng/MTFN-RR-PyTorch-Code Fork of Wangt-CN/MTFN-RR-PyTorch-Code

The official code for the paper "Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking", ACM Multimedia 2019 Oral

Size: 4.54 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

makefile/text_extraction Fork of lluisgomez/text_extraction

Windows version of text_extraction(VS2013). This code is the implementation of the method proposed in the paper “Multi-script text extraction from natural scenes” (Gomez & Karatzas) to appear in ICDAR2013 conference.

Language: C++ - Size: 1.38 MB - Last synced at: 12 months ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

Related Topics
clip (6), generative-ai (6), caption-data (6), multimodal (5), image-captioning (5), dataset (5), image-processing (5), python (4), image-recognition (4), ocr (3), text-image (3), contrastive-learning (3), pytorch (3), multilingual (3), dataset-generation (3), computer-vision (3), visual-linguistic (2), vietnamese-nlp (2), cnn (2), deep-learning (2), dall-e (2), language-vision (2), zero-shot (2), python-3 (2), visualwsd (2), captioning-images (2), python3 (2), image-to-text (2), visual-question-answering (2), transformer (2), natural-language-processing (2), machine-learning (2), php (2), vietnamese (1), big-model (1), multi-model (1), self-supervised (1), vision-language-pretraining (1), representation-learning (1), vision-and-language (1), weakly-supervised-learning (1), segmentation (1), vehicle-detection (1), multimodal-wsd (1), formulae (1), graphic-primitives (1), graphics (1), graphics-programming (1), image-colors (1), image-coordinates (1), image-transformations (1), javascript (1), raster-graphics (1), rotating (1), stroke-imaging (1), turtle-graphics (1), xor-mode (1), imagebert (1), keras (1), php-library (1), poster-editor (1), detailed-annotations (1), detailed-descriptions (1), evaluation (1), human-annotation (1), i2t (1), image-descriptions (1), t2i (1), flickr8k-dataset (1), grouped-query-attention (1), kv-cache (1), llama3 (1), mobilenetv3 (1), nlp (1), pytorch-lightning (1), rms-norm (1), rotary-position-embedding (1), multi-modal-learning (1), text-to-video (1), image-text-retrieval (1), pretraining (1), visual-language (1), vqa (1), blip2 (1), caption (1), caption-generation (1), caption-generator (1), captions (1), image-analysis (1), img2txt (1), composer (1), intervention (1), php-class (1), php-gd (1), php-image (1), ielts (1), fine-grained (1), sense-autocomplement (1), biqa (1), blind-image-quality-assessment (1)