gpt4v | Topic | Ecosyste.ms: Repos

Topic: "gpt4v"

TencentQQGYLab/AppAgent

AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.

Language: Python - Size: 2.83 MB - Last synced at: about 7 hours ago - Pushed at: about 1 month ago - Stars: 5,762 - Forks: 636

X-PLUG/MobileAgent

Mobile-Agent: The Powerful Mobile Device Operation Assistant Family

Language: Python - Size: 383 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 4,034 - Forks: 404

AmberSahdev/Open-Interface

Control Any Computer Using LLMs.

Language: Python - Size: 142 MB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 2,060 - Forks: 202

reworkd/tarsier

Vision utilities for web interaction agents 👀

Language: Jupyter Notebook - Size: 2.94 GB - Last synced at: 16 days ago - Pushed at: 5 months ago - Stars: 1,635 - Forks: 103

ictnlp/LLaVA-Mini

LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.

Language: Python - Size: 54.6 MB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 441 - Forks: 19

langgptai/Awesome-Multimodal-Prompts

Prompts for GPT-4V and DALL-E 3 that make full use of their multimodal abilities. GPT-4V prompts, DALL-E 3 prompts.

Size: 87.3 MB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 249 - Forks: 16

ShareGPT4Omni/ShareGPT4V

[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions

Language: Python - Size: 644 KB - Last synced at: 18 days ago - Pushed at: 10 months ago - Stars: 211 - Forks: 5

pAIrprogio/vscode-ui-sketcher

Draw your projects to life

Language: TypeScript - Size: 1.58 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 160 - Forks: 8

soulteary/amazing-openai-api

Convert different model APIs into the OpenAI API format out of the box.

Language: Go - Size: 463 KB - Last synced at: 17 days ago - Pushed at: about 1 year ago - Stars: 149 - Forks: 13

zzxslp/MM-Navigator

GPT-4V in Wonderland: LMMs as Smartphone Agents

Language: Python - Size: 28.4 MB - Last synced at: 5 months ago - Pushed at: 9 months ago - Stars: 128 - Forks: 2

bdekraker/WebcamGPT-Vision

Lightweight GPT-4 Vision processing over the Webcam

Language: JavaScript - Size: 34.2 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 127 - Forks: 15

kyegomez/MambaByte

Implementation of MambaByte from "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta

Language: Python - Size: 2.16 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 115 - Forks: 7

BUAADreamer/Chinese-LLaVA-Med

Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine

Language: Python - Size: 2.26 MB - Last synced at: 22 days ago - Pushed at: 11 months ago - Stars: 76 - Forks: 4

The ultimate sketch-to-code app, made using GPT-4 Vision. Choose your desired framework (React, Next, React Native, Flutter) for your app. It instantly generates code and a sandbox preview from a simple hand-drawn sketch on paper captured by webcam.

Language: JavaScript - Size: 73.4 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 38 - Forks: 8

roboflow/gpt-checkup

Monitor the performance of OpenAI's GPT O3 Mini model over time.

Language: HTML - Size: 22.2 MB - Last synced at: about 23 hours ago - Pushed at: 10 days ago - Stars: 34 - Forks: 5

martintomov/gpt4v-video-voiceover

Video Voiceover with gpt-4o-mini

Language: Jupyter Notebook - Size: 5.5 MB - Last synced at: 3 days ago - Pushed at: 7 months ago - Stars: 33 - Forks: 8

reidbarber/webmarker

Mark web pages for use with vision-language models

Language: TypeScript - Size: 677 KB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 30 - Forks: 3

neka-nat/mylangrobot

Language instructions to mycobot using GPT-4V

Language: Python - Size: 3.52 MB - Last synced at: 13 days ago - Pushed at: over 1 year ago - Stars: 23 - Forks: 0

admineral/GPT4-Vision-React-Starter

Early Alpha Release: Chat with Your Image - Leveraging GPT-4 Vision and Function Calls for AI-Powered Image Analysis and Description

Language: TypeScript - Size: 256 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 21 - Forks: 18

kyegomez/HRTX

Multi-Modal Multi-Embodied Hivemind-like Iteration of RTX-2

Language: Python - Size: 2.19 MB - Last synced at: 9 days ago - Pushed at: 6 months ago - Stars: 16 - Forks: 3

Charmve/gpt-eyes

I GAVE GPT-4 EYES!

Language: JavaScript - Size: 13.8 MB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 14 - Forks: 4

Azure-Samples/rag-as-a-service-with-vision

This repository offers a Python framework for a retrieval-augmented generation (RAG) pipeline using text and images from MHTML documents, leveraging Azure AI and OpenAI services. It includes ingestion and enrichment flows, a RAG with Vision pipeline, and evaluation tools.

Language: Python - Size: 2.37 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 11 - Forks: 2

GraphPKU/CoI

Chain of Images for Intuitively Reasoning

Language: Python - Size: 5.17 MB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 1

easonlai/webcam_chat_with_aoai_gpt4o

Discover the GPT-4o multimodal model at Microsoft Build 2024, now with text and image capabilities. My prototype enhances chats with real-time camera snapshots, powered by Flask, OpenCV, and Azure’s OpenAI Services. It’s interactive, visual, and simple to use. Give it a try!

Language: HTML - Size: 2.03 MB - Last synced at: 22 days ago - Pushed at: 11 months ago - Stars: 7 - Forks: 2

elizabethsiegle/stephensmithify-openaivision-sendgrid

Analyze a Video and generate commentary about it with OpenAI's GPT-4V, Text-to-speech, LangChain, Streamlit, Replit, Twilio SendGrid, and OpenCV!

Language: Python - Size: 199 MB - Last synced at: 18 days ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 1

dceluis/vacocam_render

Vision-Assisted Camera Orientation

Language: Jupyter Notebook - Size: 546 MB - Last synced at: 11 days ago - Pushed at: 11 months ago - Stars: 4 - Forks: 0

danomation/discord-vision

Proof-of-concept GPT-4 Vision bot

Language: Python - Size: 6.84 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 0

gpt4api9/gpt4api9

麻雀 (Sparrow) GPTs-API marketplace

Size: 281 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

logicalroot/gpt-4v-demos

🤖 GPT-4V Demos • Test the model's vision capabilities in your browser using Streamlit • Easy setup

Language: Python - Size: 1.8 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 2

ethan-yz-hao/equation-ocr-app

OCR application for converting handwritten equations into LaTeX code using OpenAI's GPT-4V API, with a LaTeX renderer for editing and checking (Next.js, TypeScript, OpenAI GPT-4V, KaTeX, Vercel)

Language: TypeScript - Size: 155 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

Ravi-Teja-konda/TunedLlavaDelights

Explore the rich flavors of Indian desserts with TunedLlavaDelights. Using LLaVA fine-tuning, the project unveils detailed nutritional profiles, taste notes, and optimal consumption times for beloved sweets. Dive into a fusion of AI innovation and culinary tradition.

Language: Python - Size: 43.3 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0