An open API service providing repository metadata for many open source software ecosystems.

Topic: "model-quantization"

Efficient-ML/Awesome-Model-Quantization

A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research and is continuously improved. Pull requests adding works (papers, repositories) the repo has missed are welcome.

Size: 61.5 MB - Last synced at: 10 days ago - Pushed at: 3 months ago - Stars: 2,084 - Forks: 221

horseee/Awesome-Efficient-LLM

A curated list for Efficient Large Language Models

Language: Python - Size: 62.3 MB - Last synced at: 4 days ago - Pushed at: 25 days ago - Stars: 1,657 - Forks: 134

datawhalechina/awesome-compression

A beginner-friendly introductory tutorial on model compression.

Size: 302 MB - Last synced at: 11 days ago - Pushed at: 6 months ago - Stars: 274 - Forks: 34

inferflow/inferflow

Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).

Language: C++ - Size: 1.89 MB - Last synced at: 7 days ago - Pushed at: about 1 year ago - Stars: 243 - Forks: 25

Efficient-ML/Awesome-Efficient-AIGC

A list of papers, docs, and code about efficient AIGC. This repo aims to provide information for efficient AIGC research, covering both language and vision, and is continuously improved. Pull requests adding works (papers, repositories) the repo has missed are welcome.

Size: 63.5 KB - Last synced at: 10 days ago - Pushed at: 3 months ago - Stars: 178 - Forks: 11

sayakpaul/Adventures-in-TensorFlow-Lite

This repository contains notebooks that show how to use TensorFlow Lite to quantize deep neural networks; a minimal converter sketch follows this entry.

Language: Jupyter Notebook - Size: 49.1 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 172 - Forks: 35
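For reference, a minimal post-training quantization sketch of the kind these notebooks cover, using the standard TensorFlow Lite converter API (the toy model below is a placeholder, not one from the repository):

```python
# Minimal post-training quantization sketch with TensorFlow Lite.
# The model here is a placeholder -- substitute your own trained network.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])

# Default optimizations enable dynamic-range quantization
# (weights stored as int8, activations quantized at runtime).
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)
```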

RodolfoFerro/psychopathology-fer-assistant

[WINNER! 🏆] Psychopathology FER Assistant. Because mental health matters. My project submission for the #TFWorld TF 2.0 Challenge on Devpost.

Language: Jupyter Notebook - Size: 12 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 67 - Forks: 25

htqin/BiBench

This project is the official implementation of our accepted ICML 2023 paper BiBench: Benchmarking and Analyzing Network Binarization.

Language: Python - Size: 110 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 41 - Forks: 3

htqin/QuantSR

This project is the official implementation of our accepted NeurIPS 2023 (spotlight) paper QuantSR: Accurate Low-bit Quantization for Efficient Image Super-Resolution.

Language: Python - Size: 9.75 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 31 - Forks: 2

seonglae/llama2gptq

Chat with LLaMA 2 that grounds its responses in reference documents retrieved from a vector database, using a locally available model with GPTQ 4-bit quantization (see the loading sketch below).

Language: Python - Size: 9.48 MB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 29 - Forks: 0
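A minimal sketch of loading a GPTQ 4-bit checkpoint with Hugging Face transformers (requires the optimum and auto-gptq packages); the model ID is an illustrative example, not necessarily the one llama2gptq uses:

```python
# Load a GPTQ 4-bit quantized LLaMA 2 checkpoint and generate a reply.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Llama-2-7B-Chat-GPTQ"  # example GPTQ checkpoint, not the repo's
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "What is 4-bit quantization?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```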

nbasyl/OFQ

The official implementation of the ICML 2023 paper OFQ-ViT

Language: Python - Size: 640 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 16 - Forks: 0

dcarpintero/ai-engineering

AI Engineering: Annotated notebooks covering Self-Attention, In-Context Learning, RAG, Knowledge Graphs, Fine-Tuning, Model Optimization, and more.

Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 6 - Forks: 0

NANEXLABS/Nanex-AI

Enterprise multi-agent framework for secure, borderless data collaboration with zero-trust security and federated learning; lightweight and edge-ready.

Language: Python - Size: 119 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 4 - Forks: 0

frickyinn/BiDense

PyTorch implementation of "BiDense: Binarization for Dense Prediction," A binary neural network for dense prediction tasks.

Language: Python - Size: 1.21 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 3 - Forks: 0
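As an illustration of the binarization idea behind such networks, here is a generic sign-plus-straight-through-estimator sketch in PyTorch; it is not code from the BiDense repository:

```python
# Weight binarization with a straight-through estimator (STE),
# the basic building block of binary neural networks.
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)  # quantize each weight to its sign (effectively ±1)

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # STE: pass gradients through only where |w| <= 1.
        return grad_output * (w.abs() <= 1).float()

w = torch.randn(4, requires_grad=True)
w_bin = BinarizeSTE.apply(w)
w_bin.sum().backward()
print(w_bin, w.grad)
```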

SRDdev/Model-Quantization

Quantization is a technique to reduce the computational and memory costs of running inference by representing the weights and activations with low-precision data types like 8-bit integer (int8) instead of the usual 32-bit floating point (float32). A worked sketch of this arithmetic follows this entry.

Language: Jupyter Notebook - Size: 3.16 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0
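A worked sketch of the affine (scale/zero-point) int8 quantization arithmetic described above, using NumPy; the helper functions are illustrative, not taken from the repository:

```python
# Affine quantization: map float32 values to int8 and back.
import numpy as np

def quantize(x, num_bits=8):
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    scale = (x.max() - x.min()) / (qmax - qmin)          # float step per integer level
    zero_point = int(round(qmin - x.min() / scale))      # integer that represents 0.0
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

x = np.random.randn(5).astype(np.float32)
q, scale, zp = quantize(x)
print(x, q, dequantize(q, scale, zp))  # reconstruction error is bounded by ~scale/2
```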

dwain-barnes/LLM-GGUF-Auto-Converter

Automated Jupyter notebook solution for batch converting Large Language Models to GGUF format with multiple quantization options. Built on llama.cpp with HuggingFace integration (the underlying llama.cpp steps are sketched below).

Language: Jupyter Notebook - Size: 13.7 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 2
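The two llama.cpp steps such a notebook automates, sketched with subprocess; script and binary names follow recent llama.cpp releases and may differ between versions, and all paths are placeholders:

```python
# Rough sketch: (1) convert a Hugging Face model directory to GGUF,
# (2) quantize the resulting file with llama.cpp.
import subprocess

hf_model_dir = "models/my-hf-model"            # placeholder path
gguf_f16 = "models/my-model-f16.gguf"
gguf_q4 = "models/my-model-Q4_K_M.gguf"

# Step 1: HF -> GGUF at f16 precision (script ships with llama.cpp).
subprocess.run(
    ["python", "convert_hf_to_gguf.py", hf_model_dir, "--outfile", gguf_f16],
    check=True,
)

# Step 2: quantize the GGUF file, here to the Q4_K_M scheme.
subprocess.run(["./llama-quantize", gguf_f16, gguf_q4, "Q4_K_M"], check=True)
```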

nnilayy/Spresense

Language: C++ - Size: 2.59 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

Keshavpatel2/local-llm-workbench

🧠 A comprehensive toolkit for benchmarking, optimizing, and deploying local Large Language Models. Includes performance testing tools, optimized configurations for CPU/GPU/hybrid setups, and detailed guides to maximize LLM performance on your hardware (a simple throughput sketch follows this entry).

Language: Shell - Size: 8.79 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0
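A hedged sketch of the kind of throughput measurement such a toolkit performs, using Hugging Face transformers as a stand-in (the repository itself is shell-based; the model below is a small placeholder):

```python
# Time a generation and report tokens per second.
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # small placeholder model, not one used by the repository
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Benchmark prompt:", return_tensors="pt")
start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=128)
elapsed = time.perf_counter() - start

new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/sec")
```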

xhay-p/ttPG

Torch and Transformers Playground: Learn and Code Deep Learning using PyTorch and HuggingFace Transformers.

Language: Jupyter Notebook - Size: 154 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

satyampurwar/large-language-models

Unlocking the Power of Generative AI: In-Context Learning, Instruction Fine-Tuning and Reinforcement Learning Fine-Tuning.

Language: Jupyter Notebook - Size: 170 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

Chenguiti6444/Vehicle_Detection_and_Classification_using_Deep_Learning

Fine-tuning Pretrained Deep Learning Models to Classify Low Quality Images of Land Vehicles.

Language: Jupyter Notebook - Size: 1.35 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

dslisleedh/NCNet-flax

Unofficial implementation of NCNet using Flax and JAX.

Language: Python - Size: 131 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Related Topics
model-compression (6), quantization (4), llm (3), cuda (3), large-language-models (3), llama2 (2), llama-cpp (2), bert (2), generative-ai (2), knowledge-distillation (2), langchain (2), super-resolution (2), compression (2), tensorflow (2), python (2), pruning (2), awesome (2), binarized-neural-networks (2), transformers (2), deep-learning (2), tensorflow-lite (2), classification (2), model-pruning (2), efficient-deep-learning (2), arduino (1), low-rank-adaptation (1), megacmd (1), memory-management (1), binary-network (1), peft-fine-tuning-llm (1), prompt-engineering (1), proximal-policy-optimization (1), reinforcement-learning-from-ai-feedback (1), reinforcement-learning-from-human-feedback (1), storage-management (1), ai (1), aiagent (1), edge-computing (1), enterprise-ai (1), federated-learning (1), grpc-web (1), iot-security (1), mqtt-protocol (1), multi-agent-framework (1), onnx-runtime (1), zero-trust-security (1), conda-environment (1), encoder-decoder-model (1), encoder-model (1), few-shot-prompting (1), flan-t5 (1), instruction-fine-tuning (1), kl-divergence (1), lightweight-neural-network (1), llm-compression (1), pruning-algorithms (1), baichuan2 (1), bloom (1), deepseek (1), falcon (1), gemma (1), internlm (1), llamacpp (1), llm-inference (1), m2m100 (1), minicpm (1), mistral (1), mixtral (1), mixture-of-experts (1), moe (1), multi-gpu-inference (1), phi-2 (1), qwen (1), model-acceleration (1), aigc (1), diffusion-models (1), distillation (1), generative-model (1), kd (1), neural-architecture-search (1), prune (1), tinyml (1), chatai (1), chatbot (1), chatgpt (1), gpt (1), llama-2 (1), question-answering (1), rye (1), streamlit-chat (1), efficient-llm (1), language-model (1), gguf (1), huggingface (1), jupyter-notebook (1), quantized-neural-networks (1), assistant-app (1), dash (1), dash-bootstrap-components (1), firebase-realtime-database (1)