An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: gpt4v

X-PLUG/MobileAgent

Mobile-Agent: The Powerful Mobile Device Operation Assistant Family

Language: Python - Size: 386 MB - Last synced at: 1 day ago - Pushed at: 21 days ago - Stars: 4,385 - Forks: 445

reworkd/tarsier

Vision utilities for web interaction agents 👀

Language: Jupyter Notebook - Size: 2.94 GB - Last synced at: 1 day ago - Pushed at: 7 months ago - Stars: 1,694 - Forks: 107

langgptai/Awesome-Multimodal-Prompts

Prompts for GPT-4V and DALL-E 3 that make full use of their multimodal abilities. GPT-4V prompts, DALL-E 3 prompts.

Size: 87.3 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 255 - Forks: 19

AmberSahdev/Open-Interface

Control Any Computer Using LLMs.

Language: Python - Size: 142 MB - Last synced at: 11 days ago - Pushed at: 3 months ago - Stars: 2,225 - Forks: 218

TencentQQGYLab/AppAgent

AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.

Language: Python - Size: 2.83 MB - Last synced at: 14 days ago - Pushed at: 3 months ago - Stars: 5,881 - Forks: 651

kyegomez/HRTX

Multi-Modal Multi-Embodied Hivemind-like Iteration of RTX-2

Language: Python - Size: 2.19 MB - Last synced at: 8 days ago - Pushed at: 8 months ago - Stars: 16 - Forks: 3

reidbarber/webmarker

Mark web pages for use with vision-language models

Language: TypeScript - Size: 681 KB - Last synced at: 23 days ago - Pushed at: about 1 month ago - Stars: 40 - Forks: 3

ictnlp/LLaVA-Mini

LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.

Language: Python - Size: 54.6 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 479 - Forks: 22

neka-nat/mylangrobot

Language instructions to mycobot using GPT-4V

Language: Python - Size: 3.52 MB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 24 - Forks: 0

roboflow/gpt-checkup 📦

Monitor the performance of OpenAI's GPT O3 Mini model over time.

Language: HTML - Size: 22.2 MB - Last synced at: about 8 hours ago - Pushed at: about 1 month ago - Stars: 34 - Forks: 5

kyegomez/MambaByte

Implementation of MambaByte from "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta

Language: Python - Size: 2.16 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 116 - Forks: 6

ShareGPT4Omni/ShareGPT4V

[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions

Language: Python - Size: 644 KB - Last synced at: about 2 months ago - Pushed at: 12 months ago - Stars: 217 - Forks: 6

easonlai/webcam_chat_with_aoai_gpt4o

Discover the GPT-4o multimodal model at Microsoft Build 2024, now with text and image capabilities. My prototype enhances chats with real-time camera snapshots, powered by Flask, OpenCV, and Azure’s OpenAI Services. It’s interactive, visual, and simple to use. Give it a try!

Language: HTML - Size: 2.03 MB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 2

cameronking4/sketch2app

The ultimate sketch-to-code app, made using GPT-4o and serving 25k+ users. Choose your desired framework (React, Next, React Native, Flutter) for your app. It instantly generates code and a sandbox preview from a simple hand-drawn sketch on paper captured by webcam.

Size: 74.1 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 79 - Forks: 36

pAIrprogio/vscode-ui-sketcher

Draw your projects to life

Language: TypeScript - Size: 1.58 MB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 201 - Forks: 13

soulteary/amazing-openai-api

Convert different model APIs into the OpenAI API format out of the box.

Language: Go - Size: 463 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 149 - Forks: 13

GraphPKU/CoI

Chain of Images for Intuitively Reasoning

Language: Python - Size: 5.17 MB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 1

BUAADreamer/Chinese-LLaVA-Med

Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine

Language: Python - Size: 2.26 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 76 - Forks: 4

jamesponddotco/allalt

[READ-ONLY] Describe images and generate alt tags for visually impaired users.

Language: Go - Size: 45.9 KB - Last synced at: 5 days ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

Charmve/gpt-eyes

I GAVE GPT-4 EYES!

Language: JavaScript - Size: 13.8 MB - Last synced at: about 11 hours ago - Pushed at: over 1 year ago - Stars: 14 - Forks: 4

martintomov/gpt4v-video-voiceover

Video Voiceover with gpt-4o-mini

Language: Jupyter Notebook - Size: 5.5 MB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 33 - Forks: 8

Azure-Samples/rag-as-a-service-with-vision

This repository offers a Python framework for a retrieval-augmented generation (RAG) pipeline using text and images from MHTML documents, leveraging Azure AI and OpenAI services. It includes ingestion and enrichment flows, a RAG with Vision pipeline, and evaluation tools.

Language: Python - Size: 2.37 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 11 - Forks: 2

zzxslp/MM-Navigator

GPT-4V in Wonderland: LMMs as Smartphone Agents

Language: Python - Size: 28.4 MB - Last synced at: 7 months ago - Pushed at: 11 months ago - Stars: 128 - Forks: 2

ababiyaworku/GPT4V_Captioner

A simple and powerful GPT-4V image captioner. Processes a single image or a batch of images in the directory where you run the script.

Language: Python - Size: 101 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

ethan-yz-hao/equation-ocr-app

OCR application for converting handwritten equations into LaTeX code using OpenAI's GPT-4V API, with a LaTeX renderer for editing and checking (Next.js, TypeScript, OpenAI GPT-4V, KaTeX, Vercel)

Language: TypeScript - Size: 155 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

dceluis/vacocam_render

Vision-Assisted Camera Orientation

Language: Jupyter Notebook - Size: 546 MB - Last synced at: 27 days ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 0

metatatt/iso_bot

ISO 13485 Sniffer Bot, GPT-4V with LlamaIndex embedded in React Bot UI

Language: TypeScript - Size: 191 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

gpt4api9/gpt4api9

Sparrow GPTs-API Marketplace

Size: 281 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

Ravi-Teja-konda/TunedLlavaDelights

Explore the rich flavors of Indian desserts with TunedLlavaDelights. Using LLaVA fine-tuning, our project unveils detailed nutritional profiles, taste notes, and optimal consumption times for beloved sweets. Dive into a fusion of AI innovation and culinary tradition.

Language: Python - Size: 43.3 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

sagentic-ai/cupid

Valentine's Day Cupid Agent

Language: TypeScript - Size: 39.1 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 2

elizabethsiegle/stephensmithify-openaivision-sendgrid

Analyze a video and generate commentary about it with OpenAI's GPT-4V, text-to-speech, LangChain, Streamlit, Replit, Twilio SendGrid, and OpenCV!

Language: Python - Size: 199 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 1

Envedity/DAIA

Digital Artificial Intelligence Agent

Language: Python - Size: 3.35 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

elizabethsiegle/predict-bball-shot-sms-gpt4v

Language: JavaScript - Size: 1.63 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

logicalroot/gpt-4v-demos

🤖 GPT-4V Demos • Test the model's vision capabilities in your browser using Streamlit • Easy setup

Language: Python - Size: 1.8 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 2

admineral/GPT4-Vision-React-Starter

Early Alpha Release: Chat with Your Image - Leveraging GPT-4 Vision and Function Calls for AI-Powered Image Analysis and Description

Language: TypeScript - Size: 256 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 21 - Forks: 18

yunwoong7/GPT-4V-Examples

Explore the power of GPT-4V with our curated examples and tutorials. This repository offers code snippets, step-by-step guides, and use case demonstrations for integrating GPT-4V into various applications. Perfect for both AI novices and experts!

Language: Jupyter Notebook - Size: 3.52 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

bdekraker/WebcamGPT-Vision

Lightweight GPT-4 Vision processing over the Webcam

Language: JavaScript - Size: 34.2 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 127 - Forks: 15

danomation/discord-vision

Proof-of-concept GPT-4 Vision bot

Language: Python - Size: 6.84 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 0