gpu-programming | Topic | Ecosyste.ms: Repos

Topic: "gpu-programming"

taichi-dev/taichi

Productive, portable, and performant GPU programming in Python.

Language: C++ - Size: 57.4 MB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 27,202 - Forks: 2,345

exaloop/codon

A high-performance, zero-overhead, extensible Python compiler with built-in NumPy support

Language: Python - Size: 6.56 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 15,733 - Forks: 541

plasma-umass/scalene

Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals

Language: Python - Size: 14.1 MB - Last synced at: 19 days ago - Pushed at: about 1 month ago - Stars: 12,722 - Forks: 413

taskflow/taskflow

A General-purpose Task-parallel Programming System using Modern C++

Language: C++ - Size: 138 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 10,925 - Forks: 1,276

QianMo/Game-Programmer-Study-Notes

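⚓ A collection of reading notes from my career as a game programmer; you can think of it as an enhanced blog. Topics include computer graphics, real-time rendering, programming practice, GPU programming, design patterns, software engineering, and more. Keep Reading, Keep Writing, Keep Coding.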

Size: 752 MB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 9,412 - Forks: 1,722

EmbarkStudios/rust-gpu

🐉 Making Rust a first-class language and ecosystem for GPU shaders 🚧

Language: Rust - Size: 248 MB - Last synced at: 9 days ago - Pushed at: 7 months ago - Stars: 7,499 - Forks: 248

Rust-GPU/Rust-CUDA

Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.

Language: Rust - Size: 6 MB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 4,490 - Forks: 191

uber/aresdb

A GPU-powered real-time analytics storage and query engine.

Language: Go - Size: 12.4 MB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 3,050 - Forks: 234

Rust-GPU/rust-gpu

🐉 Making Rust a first-class language and ecosystem for GPU shaders 🚧

Language: Rust - Size: 294 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,930 - Forks: 59

brucefan1983/CUDA-Programming

Sample codes for my CUDA programming book

Language: Cuda - Size: 9.13 MB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 1,712 - Forks: 347

NVIDIA/cccl

CUDA Core Compute Libraries

Language: C++ - Size: 82.9 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1,706 - Forks: 227

calebwin/emu

The write-once-run-anywhere GPGPU library for Rust

Language: Rust - Size: 342 MB - Last synced at: about 18 hours ago - Pushed at: over 2 years ago - Stars: 1,610 - Forks: 52

geomstats/geomstats

Computations and statistics on manifolds with geometric structures.

Language: Python - Size: 211 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 1,351 - Forks: 261

QianMo/GPU-Gems-Book-Source-Code

💿 Collected CD content (source code) from the books GPU Gems 1–3 (Chinese edition: 《GPU精粹》 1–3)

Language: C++ - Size: 1.01 GB - Last synced at: 30 days ago - Pushed at: about 7 years ago - Stars: 1,075 - Forks: 448

QianMo/GPU-Pro-Books-Source-Code

💿 Collected source code from the books GPU Pro 1–7 (《GPU Pro》 1–7)

Language: GLSL - Size: 2.73 GB - Last synced at: about 2 months ago - Pushed at: over 5 years ago - Stars: 680 - Forks: 348

steaklive/EveryRay-Rendering-Engine

Robust real-time rendering engine on DX11 and DX12 with many advanced graphical features for quick prototyping

Language: C++ - Size: 3.46 GB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 594 - Forks: 22

abeleinin/Metal-Puzzles

Solve Puzzles. Learn Metal 🤘

Language: Jupyter Notebook - Size: 3.84 MB - Last synced at: 5 months ago - Pushed at: 9 months ago - Stars: 505 - Forks: 22

AmesingFlank/taichi.js

Modern GPU Compute and Rendering in Javascript

Language: TypeScript - Size: 220 MB - Last synced at: 5 days ago - Pushed at: 12 months ago - Stars: 501 - Forks: 19

software-mansion/TypeGPU

TypeScript library that enhances the WebGPU API, allowing resource management in a type-safe, declarative way.

Language: TypeScript - Size: 89.4 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 493 - Forks: 10

Zielon/PBRVulkan

Vulkan Real-time Path Tracer Engine

Language: C++ - Size: 207 MB - Last synced at: 7 months ago - Pushed at: over 3 years ago - Stars: 488 - Forks: 37

OpenCL is the most powerful programming language ever created. Yet the OpenCL C++ bindings are cumbersome and the code overhead prevents many people from getting started. I created this lightweight OpenCL-Wrapper to greatly simplify OpenCL software development with C++ while keeping functionality and performance.

Language: C++ - Size: 344 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 406 - Forks: 40

andrewmilson/ministark

🏃‍♂️💨 GPU accelerated STARK prover built on @arkworks-rs

Language: Rust - Size: 1.65 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 357 - Forks: 36

Glavnokoman/vuh

Vulkan compute for people

Language: C++ - Size: 705 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 340 - Forks: 34

gpufit/Gpufit

GPU-accelerated Levenberg-Marquardt curve fitting in CUDA

Language: Cuda - Size: 1.16 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 319 - Forks: 96

JuliaGPU/AMDGPU.jl

AMD GPU (ROCm) programming in Julia

Language: Julia - Size: 11.8 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 310 - Forks: 58

fastflow/fastflow

FastFlow pattern-based parallel programming framework (formerly on sourceforge)

Language: C++ - Size: 136 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 292 - Forks: 70

JuliaGPU/CuArrays.jl 📦

A Curious Cumulation of CUDA Cuisine

Language: Julia - Size: 2.16 MB - Last synced at: 3 days ago - Pushed at: about 5 years ago - Stars: 277 - Forks: 80

lucidrains/triton-transformer

Implementation of a Transformer, but completely in Triton

Language: Python - Size: 34.3 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 265 - Forks: 16

nabla-ml/nabla

Composable Function Transformations in Python with Mojo/MAX acceleration

Language: Python - Size: 10.4 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 254 - Forks: 7

stetre/moonlibs

Lua libraries for graphics and audio programming

Size: 842 KB - Last synced at: 7 months ago - Pushed at: about 2 years ago - Stars: 222 - Forks: 11

adamnemecek/awesome-metal

A collection of Metal and MetalKit projects and resources. Very much work in progress.

Size: 21.5 KB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 217 - Forks: 20

mikeroyal/GPU-Guide

Graphics Processing Unit (GPU) Architecture Guide

Language: Shell - Size: 815 KB - Last synced at: 12 days ago - Pushed at: over 3 years ago - Stars: 215 - Forks: 18

johannesugb/VolumetricLinesUnity

Source of the Volumetric Lines Asset from Unity's Asset Store

Language: C# - Size: 1.52 MB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 196 - Forks: 20

Alan-Rock-GS/GpuScript

GpuScript allows you to write C# programs that run at supercomputer speeds on a single GPU. Learn it in 30 minutes. Write & debug large and complex projects specifically designed to run on the GPU.

Size: 397 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 195 - Forks: 19

LanLou123/Webgl-Erosion

Interactive Erosion simulation in Web Browser

Language: TypeScript - Size: 955 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 186 - Forks: 25

jaredhoberock/stanford-cs193g-sp2010

This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010

Language: C++ - Size: 127 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 170 - Forks: 73

SamGinzburg/VectorVisor

VectorVisor is a vectorizing binary translator for GPUs, designed to make it easy to run many copies of a single-threaded WebAssembly program in parallel using GPUs

Language: WebAssembly - Size: 216 MB - Last synced at: 11 days ago - Pushed at: 9 months ago - Stars: 150 - Forks: 4

rAzoR8/SpvGenTwo

SpvGenTwo is a SPIR-V building and parsing library written in plain C++17 without any dependencies. No STL or other 3rd-Party library needed.

Language: C++ - Size: 1.93 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 146 - Forks: 13

ysh329/OpenCL-101

Learn OpenCL step by step.

Language: C - Size: 476 KB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 135 - Forks: 29

eedalong/ECE408

Code base and slides for ECE408: Applied Parallel Programming on GPU.

Language: C++ - Size: 35.6 MB - Last synced at: 2 months ago - Pushed at: almost 4 years ago - Stars: 122 - Forks: 34

tgautam03/xGeMM

Accelerated General (FP32) Matrix Multiplication from scratch in CUDA

Language: Cuda - Size: 5.8 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 115 - Forks: 7

Vincent-Therrien/gpu-arena

Compare and test GPU programming frameworks

Language: C++ - Size: 3.52 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 109 - Forks: 8

eyalroz/cuda-kat

CUDA kernel author's tools

Language: Cuda - Size: 1.57 MB - Last synced at: 8 months ago - Pushed at: about 3 years ago - Stars: 107 - Forks: 8

hollance/metal-gpgpu

Collection of notes on how to use Apple’s Metal API for compute tasks

Size: 1000 Bytes - Last synced at: 3 months ago - Pushed at: almost 7 years ago - Stars: 103 - Forks: 4

arctern-io/arctern

Language: C++ - Size: 66.6 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 102 - Forks: 53

Heteroflow/Heteroflow

Concurrent CPU-GPU Programming using Task Models

Language: C++ - Size: 1.58 MB - Last synced at: 3 months ago - Pushed at: over 5 years ago - Stars: 101 - Forks: 13

wmmae/wmma_extension

An extension library of WMMA API (Tensor Core API)

Language: Cuda - Size: 698 KB - Last synced at: 6 days ago - Pushed at: 12 months ago - Stars: 99 - Forks: 15

phys-sim-book/solid-sim-tutorial-gpu

A curated set of C++ examples for optimization-based elastodynamic contact simulation using CUDA, emphasizing algorithmic convergence, penetration-free, and inversion-free conditions. Designed for readability and understanding, this tutorial helps beginners learn how to write simple GPU code for efficient solid simulations.

Language: Cuda - Size: 3.78 MB - Last synced at: about 13 hours ago - Pushed at: about 14 hours ago - Stars: 94 - Forks: 4

eomii/rules_ll

An Upstream Clang/LLVM-based toolchain for contemporary C++ and heterogeneous programming

Language: Starlark - Size: 3.96 MB - Last synced at: about 12 hours ago - Pushed at: about 13 hours ago - Stars: 92 - Forks: 10

ParaGroup/WindFlow

A C++17 Data Stream Processing Parallel Library for Multicores and GPUs

Language: C++ - Size: 48.9 MB - Last synced at: 9 days ago - Pushed at: 4 months ago - Stars: 84 - Forks: 19

YaccConstructor/Brahma.FSharp Fork of gsvgit/Brahma.FSharp

F# quotation to OpenCL translator and respective runtime to utilize GPGPUs in F# applications.

Language: F# - Size: 52.1 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 75 - Forks: 17

michel-meneses/great-opencl-examples

Collection of easy, well-documented and useful OpenCL examples in C++.

Language: C++ - Size: 1000 KB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 75 - Forks: 27

xmartlabs/cuda-calculator Fork of karthikeyann/cuda-calculator

Online CUDA Occupancy Calculator

Language: CoffeeScript - Size: 186 KB - Last synced at: 4 months ago - Pushed at: over 3 years ago - Stars: 74 - Forks: 12

helenl9098/Dynamic-Diffuse-Global-Illumination-Minecraft

DDGI Minecraft is based on the 2019 SIGGRAPH paper "Dynamic Diffuse Global Illumination with Ray-Traced Irradiance Fields". We aimed to approximate indirect lighting and global illumination in Minecraft-inspired scenes using Vulkan, to test the algorithm's efficacy in real time.

Language: C++ - Size: 138 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 73 - Forks: 12

unisa-hpc/sycl-bench

SYCL Benchmark Suite

Language: C++ - Size: 24.7 MB - Last synced at: 1 day ago - Pushed at: 8 days ago - Stars: 65 - Forks: 37

r-aristov/simba-ps

Fast deterministic all-Python Lennard-Jones particle simulator that utilizes Numba for GPU-accelerated computation.

Language: Python - Size: 84.9 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 65 - Forks: 5

ProjectPhysX/PTXprofiler

A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.

Language: C++ - Size: 11.7 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 50 - Forks: 6
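The roofline model bounds a kernel's attainable throughput by min(peak compute, arithmetic intensity × peak memory bandwidth), where arithmetic intensity is FLOPs per byte moved. Instruction counts like those described above feed into that formula. The following is a minimal Python sketch, assuming the FLOP and byte counts are already known. It does not parse PTXprofiler output, and no output format is assumed. `Device`, `roofline`, `ridge_point`, and the device numbers are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical device description; numbers are supplied by the user."""
    peak_gflops: float     # peak compute throughput in GFLOP/s
    peak_bandwidth: float  # peak DRAM bandwidth in GB/s


def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte moved to/from memory."""
    return flops / bytes_moved


def roofline(device: Device, intensity: float) -> float:
    """Attainable GFLOP/s: capped by the compute peak or by bandwidth * intensity."""
    return min(device.peak_gflops, intensity * device.peak_bandwidth)


def ridge_point(device: Device) -> float:
    """Intensity (FLOP/byte) above which a kernel is no longer memory-bound."""
    return device.peak_gflops / device.peak_bandwidth


if __name__ == "__main__":
    dev = Device(peak_gflops=10_000.0, peak_bandwidth=500.0)  # illustrative numbers only
    # FP32 vector add c[i] = a[i] + b[i]: 1 FLOP, two 4-byte loads + one 4-byte store.
    ai = arithmetic_intensity(1, 12)
    bound = roofline(dev, ai)
    kind = "memory-bound" if ai < ridge_point(dev) else "compute-bound"
    print(f"AI={ai:.3f} FLOP/B, bound={bound:.1f} GFLOP/s ({kind})")
```

On this made-up device, the vector add sits far left of the ridge point of 20 FLOP/byte. Its bound is set entirely by bandwidth, which is why counting memory instructions matters as much as counting arithmetic ones.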

andi611/Apriori-and-Eclat-Frequent-Itemset-Mining

Python implementation of the Apriori and Eclat algorithms, two of the best-known basic algorithms for mining frequent itemsets in a set of transactions.

Language: Python - Size: 4.05 MB - Last synced at: 3 months ago - Pushed at: over 6 years ago - Stars: 48 - Forks: 19

Glavnokoman/vulkan-compute-example

Simple example of using Vulkan for GPGPU computing

Language: C++ - Size: 27.3 KB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 46 - Forks: 5

LuisaGroup/luisa-compute-rs

Rust frontend to LuisaCompute and more!

Language: Rust - Size: 2.4 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 44 - Forks: 6

pengzhao-intel/oneAPI_course

oneAPI - Data Parallel C++ course for students

Language: C++ - Size: 108 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 42 - Forks: 10

YichengDWu/MoYe.jl

Programming Gemm Kernels on NVIDIA GPUs with Tensor Cores in Julia

Language: Julia - Size: 7.24 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 41 - Forks: 0

weissenberger/gpuhd

Massively Parallel Huffman Decoding on GPUs

Language: C++ - Size: 23.4 KB - Last synced at: 11 months ago - Pushed at: over 6 years ago - Stars: 40 - Forks: 14

alexfromapex/tensorexperiments

Boilerplate for GPU-accelerated TensorFlow and PyTorch code on an M1 MacBook

Language: Python - Size: 44.9 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 33 - Forks: 1

LLNL/CARE

CHAI and RAJA provide an excellent base on which to build portable codes. CARE expands that functionality, adding new features such as loop fusion capability and a portable interface for many numerical algorithms. It provides all the basics for anyone wanting to write portable code.

Language: C++ - Size: 1.47 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 30 - Forks: 4

coderonion/cuda-beginner-course-cpp-version

Companion code for the bilibili video course "Introduction to CUDA 12.x Parallel Programming (C++ Edition)"

Language: Cuda - Size: 20.5 KB - Last synced at: 1 day ago - Pushed at: 11 months ago - Stars: 29 - Forks: 5

mikeroyal/Vulkan-Guide

Vulkan Guide

Language: C++ - Size: 43 KB - Last synced at: 3 days ago - Pushed at: over 3 years ago - Stars: 28 - Forks: 2

NVIDIA/optix-dev

OptiX SDK headers, everything needed to build & run OptiX applications. SDK samples not included.

Language: C++ - Size: 186 KB - Last synced at: 8 days ago - Pushed at: 4 months ago - Stars: 27 - Forks: 2

weissenberger/multians

Massively Parallel ANS Decoding on GPUs

Language: C++ - Size: 29.3 KB - Last synced at: 11 months ago - Pushed at: almost 6 years ago - Stars: 26 - Forks: 4

yumcyaWiz/CEDEC-2024-RT

Code example for CEDEC 2024 "Easy Start with GPU Ray Tracing! From GPU Programming Basics to ReSTIR".

Language: C++ - Size: 157 MB - Last synced at: 13 days ago - Pushed at: 5 months ago - Stars: 24 - Forks: 5

bfGraph/STGraph

🌟 Vertex-centric approach for building GNNs/TGNNs

Language: Python - Size: 13.7 MB - Last synced at: 15 days ago - Pushed at: 8 months ago - Stars: 22 - Forks: 0

KunyiLockeLin/AnemoneerEngine

Game engine for Windows built with the Vulkan SDK

Language: C++ - Size: 571 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 21 - Forks: 0

acetinkaya/Nvdia-CUDA-Setup

NVIDIA GPU Setup

Size: 49.8 KB - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 20 - Forks: 0

AhmetFurkanDEMIR/NVIDIA-GPU-benchmark

NVIDIA GPU benchmark

Language: Jupyter Notebook - Size: 49.8 KB - Last synced at: 3 months ago - Pushed at: over 4 years ago - Stars: 18 - Forks: 2

LanLou123/Fluid

OpenGL compute shader fluid

Language: C - Size: 60.5 MB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 18 - Forks: 4

dronelektron/MAI

Repository of lab assignments and course projects from Department 806 of Faculty 8 at MAI (Moscow Aviation Institute)

Language: Java - Size: 21.5 MB - Last synced at: almost 2 years ago - Pushed at: almost 8 years ago - Stars: 18 - Forks: 19

StokastX/Nexus

An interactive GPU path tracer from scratch written in C++ using CUDA and OpenGL

Language: C++ - Size: 257 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 17 - Forks: 0

brucefan1983/GPUGA

Graphics Processing Units Genetic Algorithm

Language: Cuda - Size: 5.15 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 16 - Forks: 7

i-Taylo/iUnlockerGL

iUnlocker GLTool is a Magisk module designed to spoof GPU information, allowing users to modify the reported GPU details to unlock graphics settings in games and for testing.

Language: Shell - Size: 91.5 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 15 - Forks: 0

WenqiJiang/Convolution-Neural-Network-by-pyCUDA

pyCUDA implementation of forward propagation for Convolutional Neural Networks

Language: Python - Size: 995 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 15 - Forks: 2

xframes-project/xframes

GPU-accelerated GUI development for the desktop and the browser

Language: TypeScript - Size: 28.4 MB - Last synced at: 3 days ago - Pushed at: 5 months ago - Stars: 14 - Forks: 0

hannes-harnisch/Vitro

Experimental C++20 multiplatform graphics engine.

Language: C++ - Size: 48 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 14 - Forks: 1

Kapernikov/gpu-normal-computation

Performing normal computation for large point clouds on the GPU using OpenCL

Language: C++ - Size: 19.5 KB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 14 - Forks: 4

tgautam03/tGeMM

General Matrix Multiplication using NVIDIA Tensor Cores

Language: Cuda - Size: 47.9 KB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 13 - Forks: 3

munstermonster/cuSten

CUDA Finite Difference Library

Language: Cuda - Size: 903 KB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 13 - Forks: 4

ShadyBoukhary/GPU-research-FFT-OpenACC-CUDA

Case studies constitute a modern, interdisciplinary, and valuable teaching practice that plays a critical and fundamental role in developing new skills and forming new knowledge. This research studies the behavior and performance of two interdisciplinary and widely adopted scientific kernels: a Fast Fourier Transform and matrix multiplication. Both routines are implemented in CUDA and OpenACC, currently the two most popular many-core programming models. A Fast Fourier Transform (FFT) computes the Discrete Fourier Transform (DFT) of a sequence, decomposing a signal sampled over a period of time into its frequency components. Whereas computing the DFT directly from its definition takes O(n²) operations, FFT algorithms reduce the complexity to O(n log₂ n). Matrix multiplication is a cornerstone routine in mathematics, artificial intelligence, and machine learning. This research also shows that the nature of the problem plays a crucial role in determining which many-core model provides the greatest performance benefit.

Language: Cuda - Size: 9.12 MB - Last synced at: 2 months ago - Pushed at: almost 7 years ago - Stars: 13 - Forks: 3
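The O(n²) versus O(n log₂ n) difference described above can be illustrated with a small sketch. This is not code from the repository, which targets CUDA and OpenACC. It is a plain-Python comparison that assumes a recursive radix-2 Cooley-Tukey FFT, so input lengths must be powers of two. The function names `dft` and `fft` are hypothetical.

```python
import cmath


def dft(x):
    """Naive DFT straight from the definition: n outputs x n terms = O(n^2) work."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT: O(n log2 n). len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # DFT of even-indexed samples
    odd = fft(x[1::2])   # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle           # butterfly: first half
        out[k + n // 2] = even[k] - twiddle  # butterfly: second half
    return out


if __name__ == "__main__":
    signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]  # sine at frequency bin 2
    err = max(abs(a - b) for a, b in zip(fft(signal), dft(signal)))
    print(f"max |fft - dft| = {err:.2e}")
```

Each recursion level halves the problem and does O(n) butterfly work, giving log₂ n levels. GPU implementations typically use an iterative formulation with the same butterfly structure rather than recursion.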