GitHub topics: gpu-programming
Equiel-1703/ocl-polyhok
A PolyHok implementation based on OpenCL for GPU programming using Elixir.
Language: Elixir - Size: 636 KB - Last synced at: about 22 hours ago - Pushed at: about 23 hours ago - Stars: 0 - Forks: 0
goabiaryan/awesome-gpu-engineering
GPU Engineering for AI Systems
Language: HTML - Size: 900 KB - Last synced at: about 23 hours ago - Pushed at: 27 days ago - Stars: 84 - Forks: 10
razord21/Canny-Edge-Detector
Implement high-performance Canny edge detection using CPU and CUDA, enabling efficient image processing with benchmarking capabilities.
Language: C - Size: 1.38 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0
Gaius-del/python_hpc_2025
Accelerate scientific applications in supercomputing with Python using Numba and Dask for efficient parallel and distributed computing.
Language: Jupyter Notebook - Size: 1.32 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0
Atheeth24091998/Deep-learning-wound-segmentation
Reproduction and extension of WSNet: a state-of-the-art deep learning model for wound image segmentation. Combines global (whole image) and local (patch-based) context to deliver precise detection of wound boundaries from clinical images, following the latest research from WACV 2023. Includes robust experimentation with multiple model architecture
Language: Python - Size: 201 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0
nabla-ml/nabla
Machine Learning library for the emerging Mojo/Python ecosystem
Language: Python - Size: 60 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 295 - Forks: 10
DannyDoesGraphics/DARE
Danny's Awesome Rendering Engine
Language: Rust - Size: 4.52 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0
Misteri4452y/taskflow
Smart weekly planner with auto-scheduling and Google Calendar integration
Language: Python - Size: 31.3 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0
NVIDIA/cccl
CUDA Core Compute Libraries
Language: C++ - Size: 340 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2,035 - Forks: 294
exaloop/codon
A high-performance, zero-overhead, extensible Python compiler with built-in NumPy support
Language: Python - Size: 7.55 MB - Last synced at: 3 days ago - Pushed at: 10 days ago - Stars: 16,151 - Forks: 568
nwmarino/gcl
gpu-compute library
Language: C++ - Size: 155 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0
Rust-GPU/rust-cuda
Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
Language: Rust - Size: 6.11 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 4,851 - Forks: 214
software-mansion/TypeGPU
A modular and open-ended toolkit for WebGPU, with advanced type inference and the ability to write shaders in TypeScript
Language: TypeScript - Size: 261 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,708 - Forks: 36
LLNL/CARE
CHAI and RAJA provide an excellent base on which to build portable codes. CARE expands that functionality, adding new features such as loop fusion capability and a portable interface for many numerical algorithms. It provides all the basics for anyone wanting to write portable code.
Language: C++ - Size: 1.51 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 31 - Forks: 5
DiamondLightSource/fast-feedback-service
GPU based service to provide fast-feedback results
Language: C++ - Size: 1000 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 3 - Forks: 3
JuliaGPU/AMDGPU.jl
AMD GPU (ROCm) programming in Julia
Language: Julia - Size: 13.1 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 318 - Forks: 60
aryagxr/cuda
coding CUDA everyday!
Language: Cuda - Size: 143 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 71 - Forks: 3
junjason/dynsoa-adaptive-runtime
Adaptive Structure-of-Arrays Runtime for GPU/CPU Parallel Simulation, with dynamic layout migration, divergence sensing, and AoSoA/matrix transformation. Patent-backed.
Language: C++ - Size: 43 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0
taskflow/taskflow
A General-purpose Task-parallel Programming System using Modern C++
Language: C++ - Size: 142 MB - Last synced at: 7 days ago - Pushed at: 24 days ago - Stars: 11,387 - Forks: 1,332
FlosMume/cpp-cuda-starter
CUDA C/C++ starter template for Windows 11 + WSL2 (RTX 4070 SUPER tested)
Language: Shell - Size: 3.34 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0
maltsev-andrey/gpu-nbody-simulation
High-performance N-body physics simulation leveraging CUDA parallel computing. Implements O(N²) direct summation with 1.6B interactions/sec throughput. Comprehensive benchmarks demonstrate 13,050× speedup vs CPU baseline on Tesla P100 GPU.
Language: Python - Size: 4.84 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0
dino65-dev/Cuda_ML_Library
This is a Cuda applied ML Library so that anyone can use GPU Powered ML with Ease in Python.
Language: Cuda - Size: 143 KB - Last synced at: 8 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0
mikeroyal/GPU-Guide
Graphics Processing Unit (GPU) Architecture Guide
Language: Shell - Size: 815 KB - Last synced at: 4 days ago - Pushed at: almost 4 years ago - Stars: 248 - Forks: 20
arminkz/SolarSystem
Solar system visualization using my own graphics engine in Vulkan
Language: C++ - Size: 180 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0
Rust-GPU/rust-gpu
Making Rust a first-class language and ecosystem for GPU shaders
Language: Rust - Size: 397 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 2,529 - Forks: 81
calebwin/emu
The write-once-run-anywhere GPGPU library for Rust
Language: Rust - Size: 342 MB - Last synced at: 10 days ago - Pushed at: almost 3 years ago - Stars: 1,609 - Forks: 52
plasma-umass/scalene
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
Language: Python - Size: 15.3 MB - Last synced at: 13 days ago - Pushed at: 17 days ago - Stars: 13,081 - Forks: 429
ProjectPhysX/OpenCL-Wrapper
OpenCL is the most powerful programming language ever created. Yet the OpenCL C++ bindings are cumbersome and the code overhead prevents many people from getting started. I created this lightweight OpenCL-Wrapper to greatly simplify OpenCL software development with C++ while keeping functionality and performance.
Language: C++ - Size: 405 KB - Last synced at: 13 days ago - Pushed at: 14 days ago - Stars: 442 - Forks: 43
jaredhoberock/ubu
Language: C++ - Size: 1.97 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 3 - Forks: 0
Oabraham1/chronos
Chronos is a time-based GPU partitioning utility that allows multiple users or applications to share a single GPU by creating exclusive time-limited partitions with automatic expiration. Built with OpenCL, it works across platforms including macOS (Apple Silicon & Intel), Linux, and Windows.
Language: C++ - Size: 89.8 KB - Last synced at: about 14 hours ago - Pushed at: about 1 month ago - Stars: 24 - Forks: 2
romansource/shader-job
GPU computations in C# lambdas
Language: C# - Size: 4.55 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0
farukalamai/100-days-of-cuda
100 days of writing CUDA kernels!
Language: Makefile - Size: 389 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0
CruzCortes/prismatic-flare
Metal shader for rendering dynamic spectral ray effects behind macOS desktop windows. Features smooth chromatic gradient transitions using double smoothstep interpolation. Integrates with private WindowServer APIs for below-window-layer compositing.
Language: Swift - Size: 507 KB - Last synced at: 19 days ago - Pushed at: about 1 month ago - Stars: 9 - Forks: 0
maltsev-andrey/julia_set_cuda
High-performance Julia set fractal computation in pure CUDA C, achieving 2.78 billion pixels/second on Tesla P100. Demonstrates GPU kernel programming, memory optimization, and massive parallelization (16M+ threads).
Language: Cuda - Size: 1.3 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0
ProgrammerGnome/CUDA-codes
Snippet repository for learning parallel GPU programming with CUDA.
Language: C++ - Size: 4.88 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0
adamnemecek/awesome-metal
A collection of Metal and MetalKit projects and resources. Very much work in progress.
Size: 21.5 KB - Last synced at: 12 days ago - Pushed at: over 1 year ago - Stars: 220 - Forks: 19
ivantag13/dist-GPU-accelerated-tree-search Fork of Guillaume-Helbecque/GPU-accelerated-tree-search-Chapel
Distributed GPU-accelerated tree search: Investigating a B&B algorithm based on a MPI+X (X=OpenMP, MPI, CUDA, HIP, etc) implementation
Language: C - Size: 664 KB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 1 - Forks: 0
Alan-Rock-GS/GpuScript
GpuScript allows you to write C# programs that run at supercomputer speeds on a single GPU. Learn it in 30 minutes. Write & debug large and complex projects specifically designed to run on the GPU.
Size: 424 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 199 - Forks: 20
cybersecurity-dev/awesome-gpu-programming
Awesome GPU Programming
Size: 12.7 KB - Last synced at: 4 days ago - Pushed at: 28 days ago - Stars: 1 - Forks: 0
EmbarkStudios/rust-gpu
Making Rust a first-class language and ecosystem for GPU shaders
Language: Rust - Size: 248 MB - Last synced at: 28 days ago - Pushed at: about 1 year ago - Stars: 7,571 - Forks: 247
fabiocalabrese/HPC_Assignment Fork of Merlino2706/HPC_Assignment
Assignment for the HPC course 2025
Language: C - Size: 1020 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0
AmesingFlank/taichi.js
Modern GPU Compute and Rendering in Javascript
Language: TypeScript - Size: 220 MB - Last synced at: 28 days ago - Pushed at: over 1 year ago - Stars: 515 - Forks: 20
lucascogrossi/triton
Repository for learning Triton GPU programming
Language: Python - Size: 27.3 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
simar-rekhi/triton
LLM-assisted compiler pass generation with Triton & CUDA
Language: Jupyter Notebook - Size: 17.6 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0
NVIDIA/optix-dev
OptiX SDK headers, everything needed to build & run OptiX applications. SDK samples not included.
Language: C++ - Size: 186 KB - Last synced at: 9 days ago - Pushed at: 9 months ago - Stars: 35 - Forks: 2
YaccConstructor/Brahma.FSharp Fork of gsvgit/Brahma.FSharp
F# quotation to OpenCL translator and respective runtime to utilize GPGPUs in F# applications.
Language: F# - Size: 52.1 MB - Last synced at: 12 days ago - Pushed at: 8 months ago - Stars: 77 - Forks: 16
Mrezadwiprasetiawan/cpp-playground
A collection of C++ experiments and code created as part of exploration and practice
Language: C++ - Size: 21.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 1
shreyansh26/MLSys-Experiments
A collection of scripts on experimenting and implementing MLSys-related stuff
Language: Jupyter Notebook - Size: 83.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0
vista-art/fragmentcolor
Easy GPU programming for Javascript, Python, Swift, and Kotlin.
Language: Rust - Size: 63.2 MB - Last synced at: about 16 hours ago - Pushed at: 23 days ago - Stars: 6 - Forks: 0
MetaMachines/mm-ptx-py
PTX Inject and Stack PTX for Python
Language: C - Size: 13.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
steaklive/EveryRay-Rendering-Engine
Robust real-time rendering engine on DX11, DX12 with many advanced graphical features for quick prototyping
Language: C++ - Size: 3.46 GB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 713 - Forks: 31
kevinyangjx/AkuaEngine
A real-time fluid simulation engine implemented in C++, with CUDA and OpenGL.
Language: C++ - Size: 24.7 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0
andresnowak/PMPP-solutions
Solutions to the chapters of the book Programming Massively Parallel Processors, 3rd and 4th editions. (Some answers may be incorrect)
Language: Cuda - Size: 410 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
Mantissagithub/edge_detection_gpu
GPU-accelerated Canny edge detector in CUDA C++. Parallelizes Gaussian filtering, gradient computation, non-maximum suppression, and hysteresis thresholding for real-time edge detection performance
Language: Cuda - Size: 4.49 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0
nbathreya/CUDA-Signal-Processor
GPU-Accelerated Signal Processing
Language: Python - Size: 17.6 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0
abhiyanpaudel/parallel-highlife
High-performance CUDA, MPI, and Hybrid implementations demonstrating GPU computing and parallel programming.
Language: C - Size: 438 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
GpapPeaky/Basic-OpenGL
Basic OpenGL implementation for triangles, quads and textured quads
Language: C++ - Size: 42.4 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
Young-TW/hippp
Write GPU programs with RAII
Language: C++ - Size: 85.9 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0
bfGraph/STGraph
Vertex Centric approach for building GNN/TGNNs
Language: Python - Size: 13.7 MB - Last synced at: 27 days ago - Pushed at: about 1 year ago - Stars: 23 - Forks: 0
lucidrains/triton-transformer
Implementation of a Transformer, but completely in Triton
Language: Python - Size: 34.3 MB - Last synced at: 27 days ago - Pushed at: over 3 years ago - Stars: 276 - Forks: 16
mikeroyal/Vulkan-Guide
Vulkan Guide
Language: C++ - Size: 43 KB - Last synced at: 7 days ago - Pushed at: almost 4 years ago - Stars: 30 - Forks: 2
taichi-dev/taichi
Productive, portable, and performant GPU programming in Python.
Language: C++ - Size: 57.5 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 27,581 - Forks: 2,363
AIComputing101/reinforcement-learning-101
An opinionated, end-to-end tutorial project for learning Reinforcement Learning (RL) from first principles to deployment. No notebooks. Everything is an explicit, inspectable Python script you can diff, profile, containerize, and ship.
Language: Python - Size: 222 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0
abeleinin/Metal-Puzzles
Solve Puzzles. Learn Metal
Language: Jupyter Notebook - Size: 3.84 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 587 - Forks: 28
ParaGroup/WindFlow
A C++17 Data Stream Processing Parallel Library for Multicores and GPUs
Language: C++ - Size: 48.9 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 84 - Forks: 19
sudoDeVinci/skyDeVisionImager
Advanced environmental monitoring platform combining computer vision and geospatial analysis. Low-compute cloud detection, 3D terrain visualization from GeoTIFF data, multi-camera calibration, and statistical validation. Scalable architecture with Flask web interface and SQLite backend.
Language: Python - Size: 20.8 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0
LiteObject/CUDA-Image-Processing-App
Real-time GPU-accelerated image processing application using CUDA and Python. Features 11 visual filters including edge detection, blur, sepia, cartoon effects, and more - all running at 30 FPS with live webcam input.
Language: Python - Size: 62.5 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0
AIComputing101/gpu-programming-101
A comprehensive hands-on project for learning GPU programming with CUDA and HIP, covering fundamental concepts through advanced optimization techniques.
Language: C++ - Size: 877 KB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 31 - Forks: 3
geomstats/geomstats
Computations and statistics on manifolds with geometric structures.
Language: Python - Size: 225 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1,403 - Forks: 268
ProjectPhysX/PTXprofiler
A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.
Language: C++ - Size: 11.7 KB - Last synced at: 27 days ago - Pushed at: 8 months ago - Stars: 56 - Forks: 6
wmmae/wmma_extension
An extension library of WMMA API (Tensor Core API)
Language: Cuda - Size: 698 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 106 - Forks: 16
coderonion/cuda-beginner-course-rust-version
Companion code for the bilibili video course "Introduction to CUDA 12.x Parallel Programming (Rust Edition)"
Language: Rust - Size: 10.7 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 0
adriengivry/orhi
Cross-Platform Interface for Modern Graphics APIs (Vulkan, DirectX 12, Metal)
Language: C++ - Size: 1.5 MB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 75 - Forks: 3
hollance/metal-gpgpu
Collection of notes on how to use Apple's Metal API for compute tasks
Size: 1000 Bytes - Last synced at: 27 days ago - Pushed at: over 7 years ago - Stars: 107 - Forks: 4
Herdora/kandc
The profiler that gives a unified view of your entire stack - from PyTorch down to GPU
Language: Python - Size: 22.5 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 88 - Forks: 9
Mgepahmge/CuWeaver
A CUDA concurrency library designed to simplify concurrency programming, offering C++-style wrappers for selected CUDA Runtime APIs
Language: Cuda - Size: 1.48 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 6 - Forks: 0
Awrsha/Advanced-CUDA-Programming-GPU-Architecture
This repository provides a comprehensive guide to optimizing GPU kernels for performance, with a focus on NVIDIA GPUs. It covers key tools and techniques such as CUDA, PyTorch, and Triton, aimed at improving computational efficiency for deep learning and scientific computing tasks.
Language: Cuda - Size: 25.2 MB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0
xframes-project/xframes
GPU-accelerated GUI development for the desktop and the browser
Language: TypeScript - Size: 28.4 MB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 15 - Forks: 0
fastflow/fastflow
FastFlow pattern-based parallel programming framework (formerly on sourceforge)
Language: C++ - Size: 178 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 296 - Forks: 72
AmanSwar/KernelLab
Collection of high-performance CUDA implementations, ranging from naive to highly optimized versions.
Language: Cuda - Size: 6.68 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0
tgautam03/xFilters
GPU (CUDA) accelerated filters using 2D convolution for high resolution images.
Language: C++ - Size: 58.2 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 8 - Forks: 1
JuliaGPU/CuArrays.jl
A Curious Cumulation of CUDA Cuisine
Language: Julia - Size: 2.16 MB - Last synced at: 28 days ago - Pushed at: over 5 years ago - Stars: 277 - Forks: 78
raghulrajn/UNET-on-GPU-using-OpenCL
Inference engine for UNET written in C++ for CPU and GPU
Language: C++ - Size: 29.2 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0
StokastX/Nexus
An interactive GPU path tracer from scratch written in C++ using CUDA and OpenGL
Language: C++ - Size: 328 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 24 - Forks: 0
gpufit/Gpufit
GPU-accelerated Levenberg-Marquardt curve fitting in CUDA
Language: Cuda - Size: 1.14 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 332 - Forks: 99
i-Taylo/iUnlockerGL
iUnlocker GLTool is a Magisk module designed to spoof GPU information, allowing users to modify the reported GPU details to unlock graphics settings in games and for testing.
Language: Shell - Size: 145 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 33 - Forks: 0
JonSnow1807/FastMQA
CUDA implementation of Multi-Query Attention achieving 97% KV-cache memory reduction for LLM inference, enabling 32x larger batch sizes. Educational project demonstrating CUDA kernel development with PyTorch integration and Llama model benchmarks.
Language: Python - Size: 587 KB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0
CRUXIV/FPSBOOSTER
This is an FPS booster that limits the background processes of Windows. It makes the GPU more stable and optimized. FOR AMD AND NVIDIA. It is meant for GAMING AND GENERAL USE AT A SMALL FILE SIZE!
Size: 3.06 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0
andrewmilson/ministark
GPU accelerated STARK prover built on @arkworks-rs
Language: Rust - Size: 1.65 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 365 - Forks: 36
YichengDWu/MoYe.jl
Programming Gemm Kernels on NVIDIA GPUs with Tensor Cores in Julia
Language: Julia - Size: 7.4 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 42 - Forks: 0
DmitryYurov/bitonic-cuda
An implementation of bitonic search on CUDA
Language: Cuda - Size: 39.1 KB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0
LLAA178/LeetGPU-Guidebook
Clear GPU programming level by level, step by step
Size: 76.2 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0
pnikitakis/high-performance-computing
5 problem sets of parallel programming on CPU and GPU. University projects for High Performance Computing Systems (Fall 2016).
Language: Cuda - Size: 1.06 MB - Last synced at: 3 months ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 0
NLeSC-COMPAS/kmm
KMM: parallel dataflow scheduler and efficient memory management for multi-GPU platforms
Language: C++ - Size: 8.34 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 1
jeffasante/metal-raymarch-rs
A basic 3D raymarcher built with Rust and Apple's Metal API. A learning project exploring SDF rendering.
Language: Rust - Size: 1020 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0
elymsyr/auv_control_model
This repository implements an imitation learning pipeline for AUV control. It uses the "FossenNet" neural network to mimic an optimal NL-MPC policy and includes tools for data generation, training, and real-time C++ inference on GPUs.
Language: Jupyter Notebook - Size: 43.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0
MeylandMan/Mabble
A cross-platform GPU backend library
Language: C++ - Size: 966 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0
eomii/rules_ll
An Upstream Clang/LLVM-based toolchain for contemporary C++ and heterogeneous programming
Language: Starlark - Size: 3.96 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 93 - Forks: 10
benc-uk/webgl-sandbox
Interactive editor & sandbox for creating & running WebGL2 shaders
Language: JavaScript - Size: 4.71 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 0
sagartv/cudalinreg_source
A GPU-Parallelised univariate Linear Regression Library (N > 100k) written using CUDA C++ Kernels that can be installed as a Python Package.
Language: Python - Size: 26.4 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0