An open API service providing repository metadata for many open source software ecosystems.

Topic: "gptq"

intel/neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Language: Python - Size: 469 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2,400 - Forks: 267

ModelCloud/GPTQModel

Production-ready LLM compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.

Language: Python - Size: 12 MB - Last synced at: 3 days ago - Pushed at: 7 days ago - Stars: 540 - Forks: 77

intel/auto-round

Advanced Quantization Algorithm for LLMs/VLMs.

Language: Python - Size: 10.6 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 457 - Forks: 37

shm007g/LLaMA-Cult-and-More

Large Language Models for All, 🦙 Cult and More. Stay in touch!

Language: HTML - Size: 566 KB - Last synced at: 5 days ago - Pushed at: almost 2 years ago - Stars: 446 - Forks: 24

bobazooba/xllm

🦖 X—LLM: Cutting Edge & Easy LLM Finetuning

Language: Python - Size: 1.81 MB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 399 - Forks: 21

1b5d/llm-api

Run any Large Language Model behind a unified API

Language: Python - Size: 53.7 KB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 169 - Forks: 27

chenhunghan/ialacol 📦

🪶 Lightweight OpenAI drop-in replacement for Kubernetes

Language: Python - Size: 250 KB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 144 - Forks: 17

abhinand5/gptq_for_langchain

A guide to using GPTQ models with LangChain

Language: Jupyter Notebook - Size: 41 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 39 - Forks: 9

ziwang-com/zero-lora

zero: zero-training LLM parameter tuning

Size: 20.8 MB - Last synced at: 5 days ago - Pushed at: almost 2 years ago - Stars: 31 - Forks: 3

hcd233/Aris-AI-Model-Server

An OpenAI-compatible API that integrates an LLM, an embedding model, and a reranker.

Language: Python - Size: 1.05 MB - Last synced at: 8 days ago - Pushed at: 28 days ago - Stars: 14 - Forks: 1

seyf1elislam/LocalLLM_OneClick_Colab

Run GGUF LLM models in the latest version of TextGen-webui

Language: Jupyter Notebook - Size: 102 KB - Last synced at: 2 days ago - Pushed at: 7 months ago - Stars: 10 - Forks: 0

matlok-ai/bampe-weights

This repository is for profiling, extracting, visualizing and reusing generative AI weights to hopefully build more accurate AI models and audit/scan weights at rest to identify knowledge domains for risk(s).

Language: Python - Size: 96.7 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 0

Aqirito/A.L.I.C.E

A.L.I.C.E (Artificial Labile Intelligence Cybernated Existence). A REST API for an AI companion, intended for building more complex systems.

Language: Python - Size: 22.2 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 7 - Forks: 0

tripathiarpan20/self-improvement-4all

Private self-improvement coaching with open-source LLMs

Language: Python - Size: 2.89 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 0

bobazooba/shurale

Conversational AI model for open-domain dialogs

Language: Python - Size: 64.5 KB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 1

amajji/LLM-Quantization-Techniques-Absmax-Zeropoint-GPTQ-GGUF

LLM quantization techniques: absmax, zero-point, GPTQ and GGUF

Language: Jupyter Notebook - Size: 182 KB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 1

ldilov/IntelliBridge

IntelliBridge is an advanced semi-autonomous chatbot that gives users access to cutting-edge natural language processing technology. Built on large language models such as GPT and LLaMA, IntelliBridge provides an intuitive user interface and an API for interacting with these models.

Language: Python - Size: 16.4 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

STiFLeR7/Edge-LLM

Qwen2.5-3B optimized with GPTQ, reducing its size from 5.75 GB to 1.93 GB and improving inference speed. Suited to efficient edge AI deployments.

Language: Python - Size: 22.5 KB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

lpalbou/model-quantizer

Effortlessly quantize, benchmark, and publish Hugging Face models with cross-platform support for CPU/GPU. Reduce model size by 75% while maintaining performance.

Language: Python - Size: 165 KB - Last synced at: 30 days ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

ElDokmak/LLMs-variety

Hands-on experiments with various LLMs

Language: Jupyter Notebook - Size: 546 KB - Last synced at: 5 days ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

SujanNeupane42/NEPSE-Chatbot-Using-Retrieval-augmented-generation-and-reranking

This project develops a NEPSE chatbot using an open-source LLM, incorporating sentence transformers, a vector database, and reranking.

Language: Jupyter Notebook - Size: 9.16 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

SujanNeupane42/LLM_Quantization

Quantizing LLMs using GPTQ

Language: Jupyter Notebook - Size: 58.6 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0