An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: gpu-optimization

Kuenoz/pytorch_training_optimization_using_tensordict_memory_mapping

Optimizing PyTorch Model Training by Wrapping Memory Mapped Tensors on an Nvidia GPU with TensorDict.

Language: Python - Size: 11.8 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

Dongskie43/nlp-engineering-hub

📚 Enterprise NLP systems and LLM applications. Features custom language model implementations, distributed training pipelines, and efficient inference systems. 🔤

Size: 1.95 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

VoidYogendra/Face-Point

First open-source real-time face filter app using MediaPipe FaceMesh for high-performance, GPU-accelerated effects.

Language: Kotlin - Size: 66.8 MB - Last synced at: 14 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

JonSnow1807/Fused-LayerNorm-CUDA-Operator

High-performance CUDA implementation of LayerNorm for PyTorch achieving 1.46x speedup through kernel fusion. Optimized for large language models (4K-8K hidden dims) with vectorized memory access, warp-level primitives, and mixed precision support. Drop-in replacement for nn.LayerNorm with 25% memory reduction.

Language: Python - Size: 391 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

ankitrajsh/foveated-rendering-with-ai-based-eye-tracking

Dynamically reduce GPU rendering load by focusing high-res rendering where the user is looking. Enhances performance on mobile/AR devices.

Size: 3.91 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

RobThePCGuy/Performance-Mod-Guide-For-Valheim

Boost Valheim's FPS to forge a smoother Viking journey!

Language: PowerShell - Size: 14.5 MB - Last synced at: 4 days ago - Pushed at: 10 months ago - Stars: 27 - Forks: 0

gregorgatej/notebooks

Small collection of Jupyter Notebooks, covering different topics I find interesting.

Language: Jupyter Notebook - Size: 83.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Md-Emon-Hasan/Fine-Tuning

End-to-end fine-tuning of Hugging Face models using LoRA, QLoRA, quantization, and PEFT techniques. Optimized for low-memory environments, with efficient model deployment.

Language: Jupyter Notebook - Size: 5.53 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

OriYarden/pytorch_training_optimization_using_tensordict_memory_mapping

Optimizing PyTorch Model Training by Wrapping Memory Mapped Tensors on Nvidia GPUs with TensorDict.

Language: Python - Size: 11.9 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

0xf0011/cryptic-simglyph-allocator

Size: 0 Bytes - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Berto70/nbody_cuda

Parallel N-Body algorithm with CUDA. Modern Computing for Physics - 2025 - UniPD

Language: Jupyter Notebook - Size: 17.5 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

icelook349/Nvidia-Driver-Tweaker-No-Crack

This repository provides a tool for tweaking and optimizing Nvidia graphics card drivers for better performance, stability, and custom configurations. It allows users to adjust various settings for optimal GPU performance and better gaming or rendering experience.

Size: 0 Bytes - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

fatalik2319/Nvidia-Driver-Tweaker-No-Crack

This repository provides a tool for tweaking and optimizing Nvidia graphics card drivers for better performance, stability, and custom configurations. It allows users to adjust various settings for optimal GPU performance and better gaming or rendering experience.

Size: 6.84 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

broomelasticheart/Claymore-Dual-Miner-Multi-Crypto-Mining

Claymore Dual Miner allows simultaneous mining of multiple cryptocurrencies, optimizing your mining profits while efficiently using GPU resources. ⛏️💰

Size: 7.81 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

BjornMelin/nlp-engineering-hub

📚 Enterprise NLP systems and LLM applications. Features custom language model implementations, distributed training pipelines, and efficient inference systems. 🔤

Size: 5.86 KB - Last synced at: 6 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 1

BjornMelin/edge-ai-engineering

📱 Optimized ML for edge devices. Showcasing efficient model deployment, GPU-CPU memory transfer optimization, and real-world edge AI applications. 🤖

Size: 5.86 KB - Last synced at: 5 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

AKKI0511/Traffic-Sign-Recognition

Traffic sign recognition using deep learning. Implemented and compared custom CNN and transfer learning models (ResNet50, MobileNetV2) with comprehensive evaluation metrics. Achieved 98.8% accuracy with a focus on real-world efficiency.

Language: Jupyter Notebook - Size: 171 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

raj200501/GPUOptimizerML

The GPU Optimizer for ML Models enhances GPU performance for machine learning. It offers advanced scheduling, real-time monitoring, and efficient resource management through a user-friendly web interface and robust API, integrating big data technologies for seamless data processing and model optimization.

Language: Python - Size: 56.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

GVProf/GVProf

GVProf: A Value Profiler for GPU-based Clusters

Language: Python - Size: 229 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 43 - Forks: 9

yui0/waifu2x-glsl

Fast waifu2x converter with GPU optimization

Language: C - Size: 40.4 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 26 - Forks: 8

yui0/waifu2x-ocl

Fast waifu2x converter with GPU optimization

Language: C - Size: 2.98 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 23 - Forks: 3

Related Keywords
gpu-optimization (21), cuda (6), python (6), machine-learning (5), pytorch (4), nlp (3), transformers (3), huggingface (3), nvidia (3), memory-mapping (3), gpu (3), deep-learning (3), artificial-intelligence (2), natural-language-processing (2), system-tuning (2), system-performance (2), performance-tuning (2), fast-waifu2x-converter (2), pc-performance (2), nvidia-tools (2), nvidia-driver (2), hardware-optimization (2), graphics-tuning (2), graphics-card (2), gpu-tuning (2), computer-optimization (2), driver-customization (2), driver-enhancement (2), driver-optimization (2), driver-tweaker (2), driver-update (2), gaming-tools (2), gpu-performance (2), macos (2), nyanko (2), waifu2x (2), resolution (2), openai (2), large-language-models (2), language-models (2), langchain (2), huggingface-transformers (2), ai (2), memory-mapped-tensors (2), optimization (2), linux (2), pytorch-tensors (2), pytorch-training (2), pytorch-training-optimization (2), tensordict (2), tensors (2), torch (2), edge-computing (1), profitability (1), multi-crypto-mining (1), mining-tools (1), mining-performance (1), mining-algorithms (1), gpu-mining (1), ethereum-mining (1), eth-mining (1), dual-mining (1), dual-cryptocurrency (1), digital-currency (1), cryptocurrency-miner (1), cryptocurrency-investing (1), cryptocurrency (1), crypto-mining (1), claymore-dual-miner (1), waifu2x-glsl (1), opencl (1), waifu2x-ocl (1), windows (1), traffic-sign-recognition (1), transfer-learning (1), big-data-integration (1), gpu-scheduling (1), model-management (1), real-time-monitoring (1), secure-api (1), binary-analysis (1), clusters (1), data-flow (1), instrumentation (1), patterns (1), profiler (1), redundancy (1), value-profiler (1), gpgpu (1), glew (1), glsl (1), embedded-systems (1), iot (1), mobile-ml (1), model-optimization (1), tflite (1), cnn-architecture (1), computer-vision (1), convolutional-neural-networks-cnn (1), data-augmentation (1)