GitHub topics: gpu-optimization
Kuenoz/pytorch_training_optimization_using_tensordict_memory_mapping
Optimizing PyTorch Model Training by Wrapping Memory Mapped Tensors on an Nvidia GPU with TensorDict.
Language: Python - Size: 11.8 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

Dongskie43/nlp-engineering-hub
📚 Enterprise NLP systems and LLM applications. Features custom language model implementations, distributed training pipelines, and efficient inference systems. 🔤
Size: 1.95 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

VoidYogendra/Face-Point
First open-source real-time face filter app using MediaPipe FaceMesh for high-performance, GPU-accelerated effects.
Language: Kotlin - Size: 66.8 MB - Last synced at: 14 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

JonSnow1807/Fused-LayerNorm-CUDA-Operator
High-performance CUDA implementation of LayerNorm for PyTorch achieving 1.46x speedup through kernel fusion. Optimized for large language models (4K-8K hidden dims) with vectorized memory access, warp-level primitives, and mixed precision support. Drop-in replacement for nn.LayerNorm with 25% memory reduction.
Language: Python - Size: 391 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

ankitrajsh/foveated-rendering-with-ai-based-eye-tracking
Dynamically reduce GPU rendering load by focusing high-res rendering where the user is looking. Enhances performance on mobile/AR devices.
Size: 3.91 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

RobThePCGuy/Performance-Mod-Guide-For-Valheim
Boost Valheim's FPS to forge a smoother Viking journey!
Language: PowerShell - Size: 14.5 MB - Last synced at: 4 days ago - Pushed at: 10 months ago - Stars: 27 - Forks: 0

gregorgatej/notebooks
Small collection of Jupyter Notebooks, covering different topics I find interesting.
Language: Jupyter Notebook - Size: 83.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Md-Emon-Hasan/Fine-Tuning
End-to-end fine-tuning of Hugging Face models using LoRA, QLoRA, quantization, and PEFT techniques. Optimized for low-memory environments, with efficient model deployment.
Language: Jupyter Notebook - Size: 5.53 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

OriYarden/pytorch_training_optimization_using_tensordict_memory_mapping
Optimizing PyTorch Model Training by Wrapping Memory Mapped Tensors on Nvidia GPUs with TensorDict.
Language: Python - Size: 11.9 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

0xf0011/cryptic-simglyph-allocator
Size: 0 Bytes - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Berto70/nbody_cuda
Parallel N-Body algorithm with CUDA. Modern Computing for Physics - 2025 - UniPD
Language: Jupyter Notebook - Size: 17.5 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

icelook349/Nvidia-Driver-Tweaker-No-Crack
This repository provides a tool for tweaking and optimizing Nvidia graphics card drivers for better performance, stability, and custom configurations. It allows users to adjust various settings for optimal GPU performance and a better gaming or rendering experience.
Size: 0 Bytes - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

fatalik2319/Nvidia-Driver-Tweaker-No-Crack
This repository provides a tool for tweaking and optimizing Nvidia graphics card drivers for better performance, stability, and custom configurations. It allows users to adjust various settings for optimal GPU performance and a better gaming or rendering experience.
Size: 6.84 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

broomelasticheart/Claymore-Dual-Miner-Multi-Crypto-Mining
Claymore Dual Miner allows simultaneous mining of multiple cryptocurrencies, optimizing your mining profits while efficiently using GPU resources. ⛏️💰
Size: 7.81 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

BjornMelin/nlp-engineering-hub
📚 Enterprise NLP systems and LLM applications. Features custom language model implementations, distributed training pipelines, and efficient inference systems. 🔤
Size: 5.86 KB - Last synced at: 6 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 1

BjornMelin/edge-ai-engineering
📱 Optimized ML for edge devices. Showcasing efficient model deployment, GPU-CPU memory transfer optimization, and real-world edge AI applications. 🤖
Size: 5.86 KB - Last synced at: 5 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

AKKI0511/Traffic-Sign-Recognition
Traffic sign recognition using deep learning. Implemented and compared custom CNN and transfer learning models (ResNet50, MobileNetV2) with comprehensive evaluation metrics. Achieved 98.8% accuracy with a focus on real-world efficiency.
Language: Jupyter Notebook - Size: 171 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

raj200501/GPUOptimizerML
The GPU Optimizer for ML Models enhances GPU performance for machine learning. It offers advanced scheduling, real-time monitoring, and efficient resource management through a user-friendly web interface and robust API, integrating big data technologies for seamless data processing and model optimization.
Language: Python - Size: 56.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

GVProf/GVProf
GVProf: A Value Profiler for GPU-based Clusters
Language: Python - Size: 229 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 43 - Forks: 9

yui0/waifu2x-glsl
Fast waifu2x converter with GPU optimization
Language: C - Size: 40.4 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 26 - Forks: 8

yui0/waifu2x-ocl
Fast waifu2x converter with GPU optimization
Language: C - Size: 2.98 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 23 - Forks: 3
