Topic: "gpt4v"
TencentQQGYLab/AppAgent
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
Language: Python - Size: 2.83 MB - Last synced at: about 7 hours ago - Pushed at: about 1 month ago - Stars: 5,762 - Forks: 636

X-PLUG/MobileAgent
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
Language: Python - Size: 383 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 4,034 - Forks: 404

AmberSahdev/Open-Interface
Control Any Computer Using LLMs.
Language: Python - Size: 142 MB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 2,060 - Forks: 202

reworkd/tarsier
Vision utilities for web interaction agents 👀
Language: Jupyter Notebook - Size: 2.94 GB - Last synced at: 16 days ago - Pushed at: 5 months ago - Stars: 1,635 - Forks: 103

ictnlp/LLaVA-Mini
LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.
Language: Python - Size: 54.6 MB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 441 - Forks: 19

langgptai/Awesome-Multimodal-Prompts
Prompts for GPT-4V & DALL-E3 to fully utilize their multimodal abilities. GPT4V Prompts, DALL-E3 Prompts.
Size: 87.3 MB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 249 - Forks: 16

ShareGPT4Omni/ShareGPT4V
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
Language: Python - Size: 644 KB - Last synced at: 18 days ago - Pushed at: 10 months ago - Stars: 211 - Forks: 5

pAIrprogio/vscode-ui-sketcher
Draw your projects to life
Language: TypeScript - Size: 1.58 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 160 - Forks: 8

soulteary/amazing-openai-api
Convert different model APIs into the OpenAI API format out of the box.
Language: Go - Size: 463 KB - Last synced at: 17 days ago - Pushed at: about 1 year ago - Stars: 149 - Forks: 13

zzxslp/MM-Navigator
GPT-4V in Wonderland: LMMs as Smartphone Agents
Language: Python - Size: 28.4 MB - Last synced at: 5 months ago - Pushed at: 9 months ago - Stars: 128 - Forks: 2

bdekraker/WebcamGPT-Vision
Lightweight GPT-4 Vision processing over the Webcam
Language: JavaScript - Size: 34.2 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 127 - Forks: 15

kyegomez/MambaByte
Implementation of MambaByte from "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta
Language: Python - Size: 2.16 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 115 - Forks: 7

BUAADreamer/Chinese-LLaVA-Med
Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine
Language: Python - Size: 2.26 MB - Last synced at: 22 days ago - Pushed at: 11 months ago - Stars: 76 - Forks: 4

cameronking4/sketch2app
The ultimate sketch-to-code app, made using GPT4 vision. Choose your desired framework (React, Next, React Native, Flutter) for your app. It instantly generates code and a sandbox preview from a simple hand-drawn sketch on paper captured from a webcam.
Language: JavaScript - Size: 73.4 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 38 - Forks: 8

roboflow/gpt-checkup
Monitor the performance of OpenAI's GPT O3 Mini model over time.
Language: HTML - Size: 22.2 MB - Last synced at: about 23 hours ago - Pushed at: 10 days ago - Stars: 34 - Forks: 5

martintomov/gpt4v-video-voiceover
Video Voiceover with gpt-4o-mini
Language: Jupyter Notebook - Size: 5.5 MB - Last synced at: 3 days ago - Pushed at: 7 months ago - Stars: 33 - Forks: 8

reidbarber/webmarker
Mark web pages for use with vision-language models
Language: TypeScript - Size: 677 KB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 30 - Forks: 3

neka-nat/mylangrobot
Language instructions to mycobot using GPT-4V
Language: Python - Size: 3.52 MB - Last synced at: 13 days ago - Pushed at: over 1 year ago - Stars: 23 - Forks: 0

admineral/GPT4-Vision-React-Starter
Early Alpha Release: Chat with Your Image - Leveraging GPT-4 Vision and Function Calls for AI-Powered Image Analysis and Description
Language: TypeScript - Size: 256 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 21 - Forks: 18

kyegomez/HRTX
Multi-Modal Multi-Embodied Hivemind-like Iteration of RTX-2
Language: Python - Size: 2.19 MB - Last synced at: 9 days ago - Pushed at: 6 months ago - Stars: 16 - Forks: 3

Charmve/gpt-eyes
I GAVE GPT-4 EYES!
Language: JavaScript - Size: 13.8 MB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 14 - Forks: 4

Azure-Samples/rag-as-a-service-with-vision
This repository offers a Python framework for a retrieval-augmented generation (RAG) pipeline using text and images from MHTML documents, leveraging Azure AI and OpenAI services. It includes ingestion and enrichment flows, a RAG with Vision pipeline, and evaluation tools.
Language: Python - Size: 2.37 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 11 - Forks: 2

GraphPKU/CoI
Chain of Images for Intuitively Reasoning
Language: Python - Size: 5.17 MB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 1

easonlai/webcam_chat_with_aoai_gpt4o
Discover the GPT-4o multimodal model at Microsoft Build 2024, now with text and image capabilities. My prototype enhances chats with real-time camera snapshots, powered by Flask, OpenCV, and Azure’s OpenAI Services. It’s interactive, visual, and simple to use. Give it a try!
Language: HTML - Size: 2.03 MB - Last synced at: 22 days ago - Pushed at: 11 months ago - Stars: 7 - Forks: 2

elizabethsiegle/stephensmithify-openaivision-sendgrid
Analyze a video and generate commentary about it with OpenAI's GPT-4V, text-to-speech, LangChain, Streamlit, Replit, Twilio SendGrid, and OpenCV!
Language: Python - Size: 199 MB - Last synced at: 18 days ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 1

dceluis/vacocam_render
Vision-Assisted Camera Orientation
Language: Jupyter Notebook - Size: 546 MB - Last synced at: 11 days ago - Pushed at: 11 months ago - Stars: 4 - Forks: 0

danomation/discord-vision
poc gpt-4 vision bot
Language: Python - Size: 6.84 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 0

gpt4api9/gpt4api9
Sparrow GPTs-API marketplace
Size: 281 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

logicalroot/gpt-4v-demos
🤖 GPT-4V Demos • Test the model's vision capabilities in your browser using Streamlit • Easy setup
Language: Python - Size: 1.8 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 2

ethan-yz-hao/equation-ocr-app
OCR application for converting handwritten equations into LaTeX code using OpenAI's GPT-4V API, with a LaTeX renderer for editing and checking (Next.js, TypeScript, OpenAI GPT-4V, KaTeX, Vercel)
Language: TypeScript - Size: 155 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

Ravi-Teja-konda/TunedLlavaDelights
Explore the rich flavors of Indian desserts with TunedLlavaDelights. Using Llava fine-tuning, our project unveils detailed nutritional profiles, taste notes, and optimal consumption times for beloved sweets. Dive into a fusion of AI innovation and culinary tradition.
Language: Python - Size: 43.3 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

sagentic-ai/cupid
Valentine's Day Cupid Agent
Language: TypeScript - Size: 39.1 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 2

Envedity/DAIA
Digital Artificial Intelligence Agent
Language: Python - Size: 3.35 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

yunwoong7/GPT-4V-Examples
Explore the power of GPT-4V with our curated examples and tutorials. This repository offers code snippets, step-by-step guides, and use case demonstrations for integrating GPT-4V into various applications. Perfect for both AI novices and experts!
Language: Jupyter Notebook - Size: 3.52 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

jamesponddotco/allalt
[READ-ONLY] Describe images and generate alt tags for visually impaired users.
Language: Go - Size: 45.9 KB - Last synced at: 6 days ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

ababiyaworku/GPT4V_Captioner
A simple and powerful GPT4V image captioner. Process a single image or batch-process multiple images in the directory where you run the script.
Language: Python - Size: 101 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

metatatt/iso_bot
ISO 13485 Sniffer Bot, GPT4V with LlamaIndex embedded in React Bot UI
Language: TypeScript - Size: 191 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

elizabethsiegle/predict-bball-shot-sms-gpt4v
Language: JavaScript - Size: 1.63 MB - Last synced at: 30 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
