An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: gpu-programming

Equiel-1703/ocl-polyhok

A PolyHok implementation based on OpenCL for GPU programming using Elixir.

Language: Elixir - Size: 636 KB - Last synced at: about 22 hours ago - Pushed at: about 23 hours ago - Stars: 0 - Forks: 0

goabiaryan/awesome-gpu-engineering

GPU Engineering for AI Systems

Language: HTML - Size: 900 KB - Last synced at: about 23 hours ago - Pushed at: 27 days ago - Stars: 84 - Forks: 10

razord21/Canny-Edge-Detector

🖼️ Implement high-performance Canny edge detection using CPU and CUDA, enabling efficient image processing with benchmarking capabilities.

Language: C - Size: 1.38 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

Gaius-del/python_hpc_2025

🚀 Accelerate scientific applications in supercomputing with Python using Numba and Dask for efficient parallel and distributed computing.

Language: Jupyter Notebook - Size: 1.32 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

Atheeth24091998/Deep-learning-wound-segmentation

Reproduction and extension of WSNet: a state-of-the-art deep learning model for wound image segmentation. Combines global (whole image) and local (patch-based) context to deliver precise detection of wound boundaries from clinical images, following the latest research from WACV 2023. Includes robust experimentation with multiple model architecture

Language: Python - Size: 201 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

nabla-ml/nabla

Machine Learning library for the emerging Mojo/Python ecosystem

Language: Python - Size: 60 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 295 - Forks: 10

DannyDoesGraphics/DARE

Danny's Awesome Rendering Engine

Language: Rust - Size: 4.52 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

Misteri4452y/taskflow

Smart weekly planner with auto-scheduling and Google Calendar integration

Language: Python - Size: 31.3 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

NVIDIA/cccl

CUDA Core Compute Libraries

Language: C++ - Size: 340 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2,035 - Forks: 294

exaloop/codon

A high-performance, zero-overhead, extensible Python compiler with built-in NumPy support

Language: Python - Size: 7.55 MB - Last synced at: 3 days ago - Pushed at: 10 days ago - Stars: 16,151 - Forks: 568

nwmarino/gcl

gpu-compute library

Language: C++ - Size: 155 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

Rust-GPU/rust-cuda

Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.

Language: Rust - Size: 6.11 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 4,851 - Forks: 214

software-mansion/TypeGPU

A modular and open-ended toolkit for WebGPU, with advanced type inference and the ability to write shaders in TypeScript

Language: TypeScript - Size: 261 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,708 - Forks: 36

LLNL/CARE

CHAI and RAJA provide an excellent base on which to build portable codes. CARE expands that functionality, adding new features such as loop fusion capability and a portable interface for many numerical algorithms. It provides all the basics for anyone wanting to write portable code.

Language: C++ - Size: 1.51 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 31 - Forks: 5

DiamondLightSource/fast-feedback-service

GPU based service to provide fast-feedback results

Language: C++ - Size: 1000 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 3 - Forks: 3

JuliaGPU/AMDGPU.jl

AMD GPU (ROCm) programming in Julia

Language: Julia - Size: 13.1 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 318 - Forks: 60

aryagxr/cuda

Coding CUDA every day!

Language: Cuda - Size: 143 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 71 - Forks: 3

junjason/dynsoa-adaptive-runtime

Adaptive Structure-of-Arrays Runtime for GPU/CPU Parallel Simulation, with dynamic layout migration, divergence sensing, and AoSoA/matrix transformation. Patent-backed.

Language: C++ - Size: 43 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

taskflow/taskflow

A General-purpose Task-parallel Programming System using Modern C++

Language: C++ - Size: 142 MB - Last synced at: 7 days ago - Pushed at: 24 days ago - Stars: 11,387 - Forks: 1,332

FlosMume/cpp-cuda-starter

CUDA C/C++ starter template for Windows 11 + WSL2 (RTX 4070 SUPER tested)

Language: Shell - Size: 3.34 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

maltsev-andrey/gpu-nbody-simulation

High-performance N-body physics simulation leveraging CUDA parallel computing. Implements O(N²) direct summation with 1.6B interactions/sec throughput. Comprehensive benchmarks demonstrate 13,050× speedup vs CPU baseline on Tesla P100 GPU.

Language: Python - Size: 4.84 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

dino65-dev/Cuda_ML_Library

This is a Cuda applied ML Library so that anyone can use GPU Powered ML with Ease in Python.

Language: Cuda - Size: 143 KB - Last synced at: 8 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

mikeroyal/GPU-Guide

Graphics Processing Unit (GPU) Architecture Guide

Language: Shell - Size: 815 KB - Last synced at: 4 days ago - Pushed at: almost 4 years ago - Stars: 248 - Forks: 20

arminkz/SolarSystem

Solar system visualization using my own graphics engine in Vulkan

Language: C++ - Size: 180 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

Rust-GPU/rust-gpu

🐉 Making Rust a first-class language and ecosystem for GPU shaders 🚧

Language: Rust - Size: 397 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 2,529 - Forks: 81

calebwin/emu

The write-once-run-anywhere GPGPU library for Rust

Language: Rust - Size: 342 MB - Last synced at: 10 days ago - Pushed at: almost 3 years ago - Stars: 1,609 - Forks: 52

plasma-umass/scalene

Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals

Language: Python - Size: 15.3 MB - Last synced at: 13 days ago - Pushed at: 17 days ago - Stars: 13,081 - Forks: 429

ProjectPhysX/OpenCL-Wrapper

OpenCL is the most powerful programming language ever created. Yet the OpenCL C++ bindings are cumbersome and the code overhead prevents many people from getting started. I created this lightweight OpenCL-Wrapper to greatly simplify OpenCL software development with C++ while keeping functionality and performance.

Language: C++ - Size: 405 KB - Last synced at: 13 days ago - Pushed at: 14 days ago - Stars: 442 - Forks: 43

jaredhoberock/ubu

Language: C++ - Size: 1.97 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 3 - Forks: 0

Oabraham1/chronos

Chronos is a time-based GPU partitioning utility that allows multiple users or applications to share a single GPU by creating exclusive time-limited partitions with automatic expiration. Built with OpenCL, it works across platforms including macOS (Apple Silicon & Intel), Linux, and Windows.

Language: C++ - Size: 89.8 KB - Last synced at: about 14 hours ago - Pushed at: about 1 month ago - Stars: 24 - Forks: 2

romansource/shader-job

🚀 GPU computations in C# lambdas

Language: C# - Size: 4.55 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

farukalamai/100-days-of-cuda

100 days of writing CUDA kernels!

Language: Makefile - Size: 389 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

CruzCortes/prismatic-flare

Metal shader for rendering dynamic spectral ray effects behind macOS desktop windows. Features smooth chromatic gradient transitions using double smoothstep interpolation. Integrates with private WindowServer APIs for below-window-layer compositing.

Language: Swift - Size: 507 KB - Last synced at: 19 days ago - Pushed at: about 1 month ago - Stars: 9 - Forks: 0

maltsev-andrey/julia_set_cuda

High-performance Julia set fractal computation in pure CUDA C, achieving 2.78 billion pixels/second on Tesla P100. Demonstrates GPU kernel programming, memory optimization, and massive parallelization (16M+ threads).

Language: Cuda - Size: 1.3 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

ProgrammerGnome/CUDA-codes

Snippet repository for learning parallel GPU programming with CUDA.

Language: C++ - Size: 4.88 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

adamnemecek/awesome-metal

A collection of Metal and MetalKit projects and resources. Very much work in progress.

Size: 21.5 KB - Last synced at: 12 days ago - Pushed at: over 1 year ago - Stars: 220 - Forks: 19

ivantag13/dist-GPU-accelerated-tree-search (fork of Guillaume-Helbecque/GPU-accelerated-tree-search-Chapel)

Distributed GPU-accelerated tree search: Investigating a B&B algorithm based on a MPI+X (X=OpenMP, MPI, CUDA, HIP, etc) implementation

Language: C - Size: 664 KB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 1 - Forks: 0

Alan-Rock-GS/GpuScript

GpuScript allows you to write C# programs that run at supercomputer speeds on a single GPU. Learn it in 30 minutes. Write & debug large and complex projects specifically designed to run on the GPU.

Size: 424 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 199 - Forks: 20

cybersecurity-dev/awesome-gpu-programming

Awesome GPU Programming

Size: 12.7 KB - Last synced at: 4 days ago - Pushed at: 28 days ago - Stars: 1 - Forks: 0

EmbarkStudios/rust-gpu

🐉 Making Rust a first-class language and ecosystem for GPU shaders 🚧

Language: Rust - Size: 248 MB - Last synced at: 28 days ago - Pushed at: about 1 year ago - Stars: 7,571 - Forks: 247

fabiocalabrese/HPC_Assignment (fork of Merlino2706/HPC_Assignment)

Assignment for the HPC course 2025

Language: C - Size: 1020 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

AmesingFlank/taichi.js

Modern GPU Compute and Rendering in Javascript

Language: TypeScript - Size: 220 MB - Last synced at: 28 days ago - Pushed at: over 1 year ago - Stars: 515 - Forks: 20

lucascogrossi/triton

Repository for learning Triton GPU programming

Language: Python - Size: 27.3 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

simar-rekhi/triton

LLM-assisted compiler pass generation with Triton & CUDA

Language: Jupyter Notebook - Size: 17.6 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

NVIDIA/optix-dev

OptiX SDK headers, everything needed to build & run OptiX applications. SDK samples not included.

Language: C++ - Size: 186 KB - Last synced at: 9 days ago - Pushed at: 9 months ago - Stars: 35 - Forks: 2

YaccConstructor/Brahma.FSharp (fork of gsvgit/Brahma.FSharp)

F# quotation to OpenCL translator and respective runtime to utilize GPGPUs in F# applications.

Language: F# - Size: 52.1 MB - Last synced at: 12 days ago - Pushed at: 8 months ago - Stars: 77 - Forks: 16

Mrezadwiprasetiawan/cpp-playground

A collection of C++ experiments and code created as part of exploration and practice

Language: C++ - Size: 21.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 1

shreyansh26/MLSys-Experiments

A collection of scripts on experimenting and implementing MLSys-related stuff

Language: Jupyter Notebook - Size: 83.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

vista-art/fragmentcolor

🦀 Easy GPU programming for Javascript, Python, Swift, and Kotlin.

Language: Rust - Size: 63.2 MB - Last synced at: about 16 hours ago - Pushed at: 23 days ago - Stars: 6 - Forks: 0

MetaMachines/mm-ptx-py

PTX Inject and Stack PTX for Python

Language: C - Size: 13.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

steaklive/EveryRay-Rendering-Engine

Robust real-time rendering engine on DX11, DX12 with many advanced graphical features for quick prototyping

Language: C++ - Size: 3.46 GB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 713 - Forks: 31

kevinyangjx/AkuaEngine

A real-time fluid simulation engine implemented in C++, with CUDA and OpenGL.

Language: C++ - Size: 24.7 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

andresnowak/PMPP-solutions

Solutions to the chapters of the book Programming Massively Parallel Processors, 3rd and 4th editions. (Some answers may be incorrect)

Language: Cuda - Size: 410 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Mantissagithub/edge_detection_gpu

GPU-accelerated Canny edge detector in CUDA C++. Parallelizes Gaussian filtering, gradient computation, non-maximum suppression, and hysteresis thresholding for real-time edge detection performance

Language: Cuda - Size: 4.49 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

nbathreya/CUDA-Signal-Processor

GPU-Accelerated Signal Processing

Language: Python - Size: 17.6 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

abhiyanpaudel/parallel-highlife

High-performance CUDA, MPI, and Hybrid implementations demonstrating GPU computing and parallel programming.

Language: C - Size: 438 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

GpapPeaky/Basic-OpenGL

Basic OpenGL implementation for triangles, quads and textured quads

Language: C++ - Size: 42.4 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Young-TW/hippp

Write GPU program with RAII

Language: C++ - Size: 85.9 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

bfGraph/STGraph

🌟 Vertex Centric approach for building GNN/TGNNs

Language: Python - Size: 13.7 MB - Last synced at: 27 days ago - Pushed at: about 1 year ago - Stars: 23 - Forks: 0

lucidrains/triton-transformer

Implementation of a Transformer, but completely in Triton

Language: Python - Size: 34.3 MB - Last synced at: 27 days ago - Pushed at: over 3 years ago - Stars: 276 - Forks: 16

mikeroyal/Vulkan-Guide

Vulkan Guide

Language: C++ - Size: 43 KB - Last synced at: 7 days ago - Pushed at: almost 4 years ago - Stars: 30 - Forks: 2

taichi-dev/taichi

Productive, portable, and performant GPU programming in Python.

Language: C++ - Size: 57.5 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 27,581 - Forks: 2,363

AIComputing101/reinforcement-learning-101

An opinionated, end-to-end tutorial project for learning Reinforcement Learning (RL) from first principles to deployment. No notebooks. Everything is an explicit, inspectable Python script you can diff, profile, containerize, and ship.

Language: Python - Size: 222 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

abeleinin/Metal-Puzzles

Solve Puzzles. Learn Metal 🤘

Language: Jupyter Notebook - Size: 3.84 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 587 - Forks: 28

ParaGroup/WindFlow

A C++17 Data Stream Processing Parallel Library for Multicores and GPUs

Language: C++ - Size: 48.9 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 84 - Forks: 19

sudoDeVinci/skyDeVisionImager

Advanced environmental monitoring platform combining computer vision and geospatial analysis. Low-compute cloud detection, 3D terrain visualization from GeoTIFF data, multi-camera calibration, and statistical validation. Scalable architecture with Flask web interface and SQLite backend.

Language: Python - Size: 20.8 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

LiteObject/CUDA-Image-Processing-App

Real-time GPU-accelerated image processing application using CUDA and Python. Features 11 visual filters including edge detection, blur, sepia, cartoon effects, and more - all running at 30 FPS with live webcam input.

Language: Python - Size: 62.5 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

AIComputing101/gpu-programming-101

A comprehensive hands-on project for learning GPU programming with CUDA and HIP, covering fundamental concepts through advanced optimization techniques.

Language: C++ - Size: 877 KB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 31 - Forks: 3

geomstats/geomstats

Computations and statistics on manifolds with geometric structures.

Language: Python - Size: 225 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1,403 - Forks: 268

ProjectPhysX/PTXprofiler

A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.

Language: C++ - Size: 11.7 KB - Last synced at: 27 days ago - Pushed at: 8 months ago - Stars: 56 - Forks: 6

wmmae/wmma_extension

An extension library of WMMA API (Tensor Core API)

Language: Cuda - Size: 698 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 106 - Forks: 16

coderonion/cuda-beginner-course-rust-version

Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (Rust Edition)"

Language: Rust - Size: 10.7 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 0

adriengivry/orhi

Cross-Platform Interface for Modern Graphics APIs (Vulkan, DirectX 12, Metal)

Language: C++ - Size: 1.5 MB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 75 - Forks: 3

hollance/metal-gpgpu

Collection of notes on how to use Apple's Metal API for compute tasks

Size: 1000 Bytes - Last synced at: 27 days ago - Pushed at: over 7 years ago - Stars: 107 - Forks: 4

Herdora/kandc

The profiler that gives a unified view of your entire stack - from PyTorch down to GPU

Language: Python - Size: 22.5 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 88 - Forks: 9

Mgepahmge/CuWeaver

A CUDA concurrency library designed to simplify concurrency programming, offering C++-style wrappers for selected CUDA Runtime APIs

Language: Cuda - Size: 1.48 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 6 - Forks: 0

Awrsha/Advanced-CUDA-Programming-GPU-Architecture

This repository provides a comprehensive guide to optimizing GPU kernels for performance, with a focus on NVIDIA GPUs. It covers key tools and techniques such as CUDA, PyTorch, and Triton, aimed at improving computational efficiency for deep learning and scientific computing tasks.

Language: Cuda - Size: 25.2 MB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

xframes-project/xframes

GPU-accelerated GUI development for the desktop and the browser

Language: TypeScript - Size: 28.4 MB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 15 - Forks: 0

fastflow/fastflow

FastFlow pattern-based parallel programming framework (formerly on sourceforge)

Language: C++ - Size: 178 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 296 - Forks: 72

AmanSwar/KernelLab

collection of high-performance CUDA implementations, ranging from naive to highly optimized versions.

Language: Cuda - Size: 6.68 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

tgautam03/xFilters

GPU (CUDA) accelerated filters using 2D convolution for high resolution images.

Language: C++ - Size: 58.2 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 8 - Forks: 1

JuliaGPU/CuArrays.jl 📦

A Curious Cumulation of CUDA Cuisine

Language: Julia - Size: 2.16 MB - Last synced at: 28 days ago - Pushed at: over 5 years ago - Stars: 277 - Forks: 78

raghulrajn/UNET-on-GPU-using-OpenCL

Inference engine for UNET written in C++ for CPU and GPU

Language: C++ - Size: 29.2 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

StokastX/Nexus

An interactive GPU path tracer from scratch written in C++ using CUDA and OpenGL

Language: C++ - Size: 328 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 24 - Forks: 0

gpufit/Gpufit

GPU-accelerated Levenberg-Marquardt curve fitting in CUDA

Language: Cuda - Size: 1.14 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 332 - Forks: 99

i-Taylo/iUnlockerGL

iUnlocker GLTool is a Magisk module designed to spoof GPU information, allowing users to modify the reported GPU details to unlock graphics settings in games and for testing.

Language: Shell - Size: 145 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 33 - Forks: 0

JonSnow1807/FastMQA

CUDA implementation of Multi-Query Attention achieving 97% KV-cache memory reduction for LLM inference, enabling 32x larger batch sizes. Educational project demonstrating CUDA kernel development with PyTorch integration and Llama model benchmarks.

Language: Python - Size: 587 KB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

CRUXIV/FPSBOOSTER

This is an FPS booster that limits Windows background processes. It makes the GPU more stable and optimized. FOR AMD AND NVIDIA. It is meant for GAMING AND GENERAL USE AT A SMALL FILE SIZE!

Size: 3.06 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

andrewmilson/ministark

🏃‍♂️💨 GPU accelerated STARK prover built on @arkworks-rs

Language: Rust - Size: 1.65 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 365 - Forks: 36

YichengDWu/MoYe.jl

Programming Gemm Kernels on NVIDIA GPUs with Tensor Cores in Julia

Language: Julia - Size: 7.4 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 42 - Forks: 0

DmitryYurov/bitonic-cuda

An implementation of bitonic search on CUDA

Language: Cuda - Size: 39.1 KB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

LLAA178/LeetGPU-Guidebook

Work through GPU programming step by step

Size: 76.2 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

pnikitakis/high-performance-computing

5 problem sets of parallel programming on CPU and GPU. University projects for High Performance Computing Systems (Fall 2016).

Language: Cuda - Size: 1.06 MB - Last synced at: 3 months ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 0

NLeSC-COMPAS/kmm

KMM: parallel dataflow scheduler and efficient memory management for multi-GPU platforms

Language: C++ - Size: 8.34 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 1

jeffasante/metal-raymarch-rs

A basic 3D raymarcher built with Rust and Apple's Metal API. A learning project exploring SDF rendering.

Language: Rust - Size: 1020 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

elymsyr/auv_control_model

This repository implements an imitation learning pipeline for AUV control. It uses the "FossenNet" neural network to mimic an optimal NL-MPC policy and includes tools for data generation, training, and real-time C++ inference on GPUs.

Language: Jupyter Notebook - Size: 43.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

MeylandMan/Mabble

A cross-platform GPU backend library

Language: C++ - Size: 966 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

eomii/rules_ll

An Upstream Clang/LLVM-based toolchain for contemporary C++ and heterogeneous programming

Language: Starlark - Size: 3.96 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 93 - Forks: 10

benc-uk/webgl-sandbox

Interactive editor & sandbox for creating & running WebGL2 shaders

Language: JavaScript - Size: 4.71 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 0

sagartv/cudalinreg_source

A GPU-parallelised univariate linear regression library (N > 100k) written using CUDA C++ kernels that can be installed as a Python package.

Language: Python - Size: 26.4 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0