Topic: "gptq"
intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Language: Python - Size: 469 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2,400 - Forks: 267

ModelCloud/GPTQModel
Production-ready LLM compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
Language: Python - Size: 12 MB - Last synced at: 3 days ago - Pushed at: 7 days ago - Stars: 540 - Forks: 77

intel/auto-round
Advanced Quantization Algorithm for LLMs/VLMs.
Language: Python - Size: 10.6 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 457 - Forks: 37

shm007g/LLaMA-Cult-and-More
Large Language Models for All, 🦙 Cult and More. Stay in touch!
Language: HTML - Size: 566 KB - Last synced at: 5 days ago - Pushed at: almost 2 years ago - Stars: 446 - Forks: 24

bobazooba/xllm
🦖 X—LLM: Cutting Edge & Easy LLM Finetuning
Language: Python - Size: 1.81 MB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 399 - Forks: 21

1b5d/llm-api
Run any Large Language Model behind a unified API
Language: Python - Size: 53.7 KB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 169 - Forks: 27

chenhunghan/ialacol 📦
🪶 Lightweight OpenAI drop-in replacement for Kubernetes
Language: Python - Size: 250 KB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 144 - Forks: 17

abhinand5/gptq_for_langchain
A guide to using GPTQ models with LangChain
Language: Jupyter Notebook - Size: 41 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 39 - Forks: 9

ziwang-com/zero-lora
zero: zero-training LLM parameter tuning
Size: 20.8 MB - Last synced at: 5 days ago - Pushed at: almost 2 years ago - Stars: 31 - Forks: 3

hcd233/Aris-AI-Model-Server
An OpenAI-compatible API that integrates LLM, Embedding, and Reranker.
Language: Python - Size: 1.05 MB - Last synced at: 8 days ago - Pushed at: 28 days ago - Stars: 14 - Forks: 1

seyf1elislam/LocalLLM_OneClick_Colab
Run GGUF LLM models in the latest version of TextGen-webui
Language: Jupyter Notebook - Size: 102 KB - Last synced at: 2 days ago - Pushed at: 7 months ago - Stars: 10 - Forks: 0

matlok-ai/bampe-weights
This repository is for profiling, extracting, visualizing and reusing generative AI weights to hopefully build more accurate AI models and audit/scan weights at rest to identify knowledge domains for risk(s).
Language: Python - Size: 96.7 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 0

Aqirito/A.L.I.C.E
A.L.I.C.E (Artificial Labile Intelligence Cybernated Existence). A REST API for an AI companion, intended as a building block for more complex systems
Language: Python - Size: 22.2 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 7 - Forks: 0

tripathiarpan20/self-improvement-4all
Private self-improvement coaching with open-source LLMs
Language: Python - Size: 2.89 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 0

bobazooba/shurale
Conversation AI model for open domain dialogs
Language: Python - Size: 64.5 KB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 1

amajji/LLM-Quantization-Techniques-Absmax-Zeropoint-GPTQ-GGUF
LLM quantization techniques: absmax, zero-point, GPTQ and GGUF
Language: Jupyter Notebook - Size: 182 KB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 1

ldilov/IntelliBridge
IntelliBridge is an advanced semi-autonomous chatbot designed to empower users with the power of cutting-edge natural language processing technology. Harnessing the capabilities of large language models such as GPT and LLAMA, IntelliBridge provides an intuitive user interface and API for seamless interaction with these models.
Language: Python - Size: 16.4 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

STiFLeR7/Edge-LLM
Qwen2.5-3B optimized with GPTQ, reducing size from 5.75 GB to 1.93 GB and improving inference speed. Ideal for efficient edge AI deployments.
Language: Python - Size: 22.5 KB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

lpalbou/model-quantizer
Effortlessly quantize, benchmark, and publish Hugging Face models with cross-platform support for CPU/GPU. Reduce model size by 75% while maintaining performance.
Language: Python - Size: 165 KB - Last synced at: 30 days ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

ElDokmak/LLMs-variety
Hands-on experiments with some LLMs
Language: Jupyter Notebook - Size: 546 KB - Last synced at: 5 days ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

SujanNeupane42/NEPSE-Chatbot-Using-Retrieval-augmented-generation-and-reranking
This project will develop a NEPSE chatbot using an open-source LLM, incorporating sentence transformers, a vector database, and reranking.
Language: Jupyter Notebook - Size: 9.16 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

SujanNeupane42/LLM_Quantization
Quantizing LLMs using GPTQ
Language: Jupyter Notebook - Size: 58.6 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
