GitHub topics: gpu-programming
jrajan14/CUDA_Programs
Nvidia CUDA Programs. High-performance computing with my collection of CUDA programs, meticulously crafted to harness the immense power of NVIDIA's GPU architecture. From blazingly fast simulations to data-intensive parallel processing, these programs showcase my passion for pushing the boundaries of performance optimization.
Language: Cuda - Size: 30.8 MB - Last synced at: about 23 hours ago - Pushed at: about 24 hours ago - Stars: 5 - Forks: 2

JuliaGPU/AMDGPU.jl
AMD GPU (ROCm) programming in Julia
Language: Julia - Size: 12.4 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 308 - Forks: 58

nabla-ml/nabla
Dynamic Neural Networks and Function Transformations in Python + Mojo
Language: Mojo - Size: 40.7 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 247 - Forks: 7

Rust-GPU/rust-gpu
🐉 Making Rust a first-class language and ecosystem for GPU shaders 🚧
Language: Rust - Size: 292 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,864 - Forks: 57

Misteri4452y/taskflow
Smart weekly planner with auto-scheduling and Google Calendar integration
Language: Python - Size: 31.3 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

Alan-Rock-GS/GpuScript
GpuScript allows you to write C# programs that run at supercomputer speeds on a single GPU. Learn it in 30 minutes. Write & debug large and complex projects specifically designed to run on the GPU.
Size: 379 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 195 - Forks: 19

NVIDIA/cccl
CUDA Core Compute Libraries
Language: C++ - Size: 81.6 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,667 - Forks: 220

software-mansion/TypeGPU
TypeScript library that enhances the WebGPU API, allowing resource management in a type-safe, declarative way.
Language: TypeScript - Size: 81.1 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 436 - Forks: 11

romitjain/learning-gpu-programming
Learnings and experimentation with GPU programming
Language: Cuda - Size: 398 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

shreyansh26/MLSys-Experiments
A collection of scripts on experimenting and implementing MLSys-related stuff
Language: Jupyter Notebook - Size: 78.4 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 2 - Forks: 0

taskflow/taskflow
A General-purpose Task-parallel Programming System using Modern C++
Language: C++ - Size: 138 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 10,911 - Forks: 1,275

exaloop/codon
A high-performance, zero-overhead, extensible Python compiler with built-in NumPy support
Language: Python - Size: 6.5 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 15,706 - Forks: 538

Rontim/GPU-Parallel-Processing-AI
This repository explores the use of GPU parallel processing in the context of Artificial Intelligence (AI), specifically leveraging GPUs for accelerating computations in deep learning tasks.
Language: Jupyter Notebook - Size: 56.7 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

Aelstraz/Unity-GPU-Compute
GPU Compute provides an easy way to set up & execute GPU compute shaders in Unity. Create and manage buffers, track GPU memory usage & execution time, automatically calculate thread group sizes & buffer strides, all in one class.
Language: C# - Size: 60.5 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

Rust-GPU/Rust-CUDA
Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
Language: Rust - Size: 6 MB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 4,409 - Forks: 183

coderonion/cuda-beginner-course-cpp-version
Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (C++ Edition)"
Language: Cuda - Size: 20.5 KB - Last synced at: 3 days ago - Pushed at: 10 months ago - Stars: 29 - Forks: 5

S-M-J-I/GPU-programming
GPU programming and tensara problems
Language: Python - Size: 3.91 KB - Last synced at: 1 day ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

rishabhkarnwal04/CNN
Deep CNN with Visualization for CIFAR-10. A neural network project that classifies CIFAR-10 images using deep CNNs built with NumPy and PyTorch/Keras. Includes filter visualizations, animated predictions, and performance tracking, ideal for learning how CNNs interpret visual data.
Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

AdroitAnandAI/Parallel-RNG-using-GPU
Parallel implementation of inherently sequential algorithms using mathematical hacks. Random Number Generators - Additive LFG and GFSR - implemented with NVIDIA CUDA using Continuous Subsequence Technique and Leap Frog Technique
Language: Cuda - Size: 3.27 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

AmesingFlank/taichi.js
Modern GPU Compute and Rendering in Javascript
Language: TypeScript - Size: 220 MB - Last synced at: 9 days ago - Pushed at: 11 months ago - Stars: 496 - Forks: 19

lucidrains/triton-transformer
Implementation of a Transformer, but completely in Triton
Language: Python - Size: 34.3 MB - Last synced at: 13 days ago - Pushed at: about 3 years ago - Stars: 265 - Forks: 16

jeffasante/metal-raymarch-rs
A basic 3D raymarcher built with Rust and Apple's Metal API. A learning project exploring SDF rendering.
Language: Rust - Size: 1020 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

Young-TW/hippp
Write GPU programs with RAII
Language: C++ - Size: 44.9 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

eomii/rules_ll
An Upstream Clang/LLVM-based toolchain for contemporary C++ and heterogeneous programming
Language: Starlark - Size: 3.96 MB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 91 - Forks: 10

uber/aresdb
A GPU-powered real-time analytics storage and query engine.
Language: Go - Size: 12.4 MB - Last synced at: 10 days ago - Pushed at: 11 months ago - Stars: 3,050 - Forks: 234

calebwin/emu
The write-once-run-anywhere GPGPU library for Rust
Language: Rust - Size: 342 MB - Last synced at: 10 days ago - Pushed at: over 2 years ago - Stars: 1,610 - Forks: 52

arminkz/VulkanEngine
Vulkan boilerplate / examples
Language: C++ - Size: 180 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

dipta007/gpu-wait
A package to run commands when GPU resources are available
Language: Python - Size: 21.5 KB - Last synced at: about 16 hours ago - Pushed at: 5 months ago - Stars: 3 - Forks: 0

PerHuepenbecker/Cudyn
CUDA library for irregular tasks using a dynamic block-internal balancing mechanism
Language: Cuda - Size: 44.1 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

ProjectPhysX/OpenCL-Wrapper
OpenCL is the most powerful programming language ever created. Yet the OpenCL C++ bindings are cumbersome and the code overhead prevents many people from getting started. I created this lightweight OpenCL-Wrapper to greatly simplify OpenCL software development with C++ while keeping functionality and performance.
Language: C++ - Size: 324 KB - Last synced at: 15 days ago - Pushed at: 22 days ago - Stars: 401 - Forks: 40

taichi-dev/taichi
Productive, portable, and performant GPU programming in Python.
Language: C++ - Size: 57.4 MB - Last synced at: 19 days ago - Pushed at: about 1 month ago - Stars: 27,112 - Forks: 2,341

wmmae/wmma_extension
An extension library of WMMA API (Tensor Core API)
Language: Cuda - Size: 698 KB - Last synced at: about 8 hours ago - Pushed at: 11 months ago - Stars: 97 - Forks: 15

EmbarkStudios/rust-gpu
🐉 Making Rust a first-class language and ecosystem for GPU shaders 🚧
Language: Rust - Size: 248 MB - Last synced at: 19 days ago - Pushed at: 7 months ago - Stars: 7,488 - Forks: 250

geomstats/geomstats
Computations and statistics on manifolds with geometric structures.
Language: Python - Size: 211 MB - Last synced at: 18 days ago - Pushed at: 3 months ago - Stars: 1,351 - Forks: 261

i-Taylo/iUnlockerGL
iUnlocker GLTool is a Magisk module designed to spoof GPU information, allowing users to modify the reported GPU details to unlock graphics settings in games and for testing.
Language: Shell - Size: 91.5 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 15 - Forks: 0

Nicolas-Ferre/wgso
WebGPU Shader Orchestrator to create GPU-native applications
Language: Rust - Size: 198 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

rbga/A51-Realtime-AI-Object-Detection-with-Pyglet-Powered-UI
Real-time object detection app using YOLOv5/YOLOv8 with custom UI built from scratch using Pyglet & OpenGL. UI animations made in Adobe After Effects, rendered as GIFs, and integrated via uxElements.py. Multi-core processing enables live capture, detection, and display with low latency. Uses Open Images v7 dataset. Train mode is WIP.
Language: Python - Size: 137 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 1 - Forks: 0

luciianaoliveira/cua
c/ua is the Docker Container for Computer-Use AI Agents.
Language: Python - Size: 4.88 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

tgautam03/xGeMM
Accelerated General (FP32) Matrix Multiplication from scratch in CUDA
Language: Cuda - Size: 5.8 MB - Last synced at: 13 days ago - Pushed at: 5 months ago - Stars: 115 - Forks: 7

adamnemecek/awesome-metal
A collection of Metal and MetalKit projects and resources. Very much work in progress.
Size: 21.5 KB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 215 - Forks: 20

akileshas/gpuX
100 days of GPU programming !!!
Size: 31.6 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

QianMo/GPU-Gems-Book-Source-Code
💿 CD content (source code) collection for the GPU Gems books, volumes 1-3
Language: C++ - Size: 1.01 GB - Last synced at: 8 days ago - Pushed at: about 7 years ago - Stars: 1,075 - Forks: 448

brucefan1983/CUDA-Programming
Sample codes for my CUDA programming book
Language: Cuda - Size: 9.13 MB - Last synced at: 25 days ago - Pushed at: 4 months ago - Stars: 1,712 - Forks: 347

plasma-umass/scalene
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
Language: Python - Size: 14.1 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 12,651 - Forks: 408

NVIDIA/optix-dev
OptiX SDK headers, everything needed to build & run OptiX applications. SDK samples not included.
Language: C++ - Size: 186 KB - Last synced at: 24 days ago - Pushed at: 3 months ago - Stars: 24 - Forks: 2

raghulrajn/UNET-on-GPU-using-OpenCL
High performance programming of GPU using OpenCL
Language: C++ - Size: 29.2 MB - Last synced at: 2 days ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

andrewmilson/ministark
🏃♂️💨 GPU accelerated STARK prover built on @arkworks-rs
Language: Rust - Size: 1.65 MB - Last synced at: 17 days ago - Pushed at: 7 months ago - Stars: 357 - Forks: 36

maya-undefined/gpu-desktop-calculator
Language: Cuda - Size: 48.8 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 7 - Forks: 0

YichengDWu/MoYe.jl
Programming Gemm Kernels on NVIDIA GPUs with Tensor Cores in Julia
Language: Julia - Size: 7.24 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 41 - Forks: 0

johannesugb/VolumetricLinesUnity
Source of the Volumetric Lines Asset from Unity's Asset Store
Language: C# - Size: 1.52 MB - Last synced at: 27 days ago - Pushed at: over 3 years ago - Stars: 196 - Forks: 20

mikeroyal/GPU-Guide
Graphics Processing Unit (GPU) Architecture Guide
Language: Shell - Size: 815 KB - Last synced at: 27 days ago - Pushed at: over 3 years ago - Stars: 203 - Forks: 16

fastflow/fastflow
FastFlow pattern-based parallel programming framework (formerly on sourceforge)
Language: C++ - Size: 136 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 292 - Forks: 70

GameWin221/Gemino
⚡High-Performance Vulkan Renderer🌋
Language: C++ - Size: 8.66 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 4 - Forks: 0

NLeSC-COMPAS/kmm
KMM: parallel dataflow scheduler and efficient memory management for multi-GPU platforms
Language: C++ - Size: 7.34 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 0 - Forks: 1

QianMo/GPU-Pro-Books-Source-Code
💿 Source code collection for the GPU Pro books, volumes 1-7
Language: GLSL - Size: 2.73 GB - Last synced at: about 1 month ago - Pushed at: over 5 years ago - Stars: 680 - Forks: 348

pjyi2147/CUDA_HTN_Workshop
Introduction to Nvidia CUDA workshop repository @ Hack the North 2024
Language: Jupyter Notebook - Size: 8.47 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 5 - Forks: 2

Vincent-Therrien/gpu-arena
Compare and test GPU programming frameworks
Language: C++ - Size: 3.52 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 109 - Forks: 8

kartavyaantani/CUDA_IMAGE_PROCESSING
A CUDA-accelerated image processing project featuring multiple GPU-based filters and enhancement techniques. Implements convolution, edge detection, Non-Local Means (NLM) denoising, K-Nearest Neighbors (KNN), and pixelization. Each operation is optimized using CUDA kernels for real-time performance on large images. The project supports command-line
Language: Jupyter Notebook - Size: 5.4 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

palapav/triton-compute-kernels
A collection of Triton compute kernels for common ML operations
Size: 3.91 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

yashkathe/Image-Noise-Reduction-with-CUDA
This project analyzes an image denoising technique, median blur, comparing GPU-accelerated (Numba) and CPU-based (OpenCV) processing speeds.
Language: Jupyter Notebook - Size: 25.4 MB - Last synced at: 25 days ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 0

LLNL/CARE
CHAI and RAJA provide an excellent base on which to build portable codes. CARE expands that functionality, adding new features such as loop fusion capability and a portable interface for many numerical algorithms. It provides all the basics for anyone wanting to write portable code.
Language: C++ - Size: 1.47 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 30 - Forks: 4

aryagxr/cuda
100 Days of CUDA!!!
Language: Cuda - Size: 120 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 5 - Forks: 0

aditiisaxena/CUDA-Accelerated-Box-Filter-for-Texture-Image-Enhancement
Enhances grayscale texture images using a CUDA-based box filter. Built with CUDA, C++14, and OpenCV for high-performance image processing.
Language: Cuda - Size: 65.6 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

bfGraph/STGraph
🌟 Vertex Centric approach for building GNN/TGNNs
Language: Python - Size: 13.7 MB - Last synced at: 25 days ago - Pushed at: 7 months ago - Stars: 22 - Forks: 0

eedalong/ECE408
Code base and slides for ECE408:Applied Parallel Programming On GPU.
Language: C++ - Size: 35.6 MB - Last synced at: about 2 months ago - Pushed at: almost 4 years ago - Stars: 122 - Forks: 34

AlfonsoLRz/LiDAR_BRDF
Source code of "Enhancing LiDAR point cloud generation with BRDF-based appearance modelling" (yet to be published).
Size: 18.6 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

NielsOuvrard/metal-sand-box
Metal graphics experiments based on 'Metal by Tutorials'
Language: Swift - Size: 16.5 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

MolSSI-Education/gpu_programming_beginner
Fundamentals of heterogeneous parallel programming with CUDA C/C++ at the beginner level.
Language: Python - Size: 5.25 MB - Last synced at: 15 days ago - Pushed at: about 2 years ago - Stars: 11 - Forks: 2

Nicolas-Ferre/ragna
A Rust library for easily creating GPU-native applications
Language: Rust - Size: 197 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

hollance/metal-gpgpu
Collection of notes on how to use Apple’s Metal API for compute tasks
Size: 1000 Bytes - Last synced at: 2 months ago - Pushed at: almost 7 years ago - Stars: 103 - Forks: 4

tgautam03/tGeMM
General Matrix Multiplication using NVIDIA Tensor Cores
Language: Cuda - Size: 47.9 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 13 - Forks: 3

ysh329/OpenCL-101
Learn OpenCL step by step.
Language: C - Size: 476 KB - Last synced at: about 2 months ago - Pushed at: almost 3 years ago - Stars: 135 - Forks: 29

tgautam03/xFilters
GPU (CUDA) accelerated filters using 2D convolution for high resolution images.
Language: C++ - Size: 58.2 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 6 - Forks: 1

ankhoa1212/cuda-program
This is a GPU program built with CUDA using parallel reduction
Language: C - Size: 13.8 MB - Last synced at: 21 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Awrsha/Advanced-CUDA-Programming-GPU-Architecture
This repository provides a comprehensive guide to optimizing GPU kernels for performance, with a focus on NVIDIA GPUs. It covers key tools and techniques such as CUDA, PyTorch, and Triton, aimed at improving computational efficiency for deep learning and scientific computing tasks.
Language: Cuda - Size: 25.2 MB - Last synced at: 2 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

mikeroyal/Vulkan-Guide
Vulkan Guide
Language: C++ - Size: 43 KB - Last synced at: 25 days ago - Pushed at: over 3 years ago - Stars: 28 - Forks: 2

ProjectPhysX/PTXprofiler
A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.
Language: C++ - Size: 11.7 KB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 50 - Forks: 6

QianMo/Game-Programmer-Study-Notes
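⚓ A collection of reading notes from my career as a game programmer. You can think of it as an enhanced blog. Covers computer graphics, real-time rendering, programming practice, GPU programming, design patterns, software engineering, and more. Keep Reading, Keep Writing, Keep Coding.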
Size: 752 MB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 9,412 - Forks: 1,722

vista-art/fragmentcolor
🦀 Easy GPU programming for Javascript, Python, Swift, and Kotlin.
Language: Rust - Size: 49.2 MB - Last synced at: 11 days ago - Pushed at: 3 months ago - Stars: 4 - Forks: 0

unisa-hpc/sycl-bench
SYCL Benchmark Suite
Language: C++ - Size: 24.7 MB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 64 - Forks: 35

michel-meneses/great-opencl-examples
Collection of easy, well-documented and useful OpenCL examples in C++.
Language: C++ - Size: 1000 KB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 75 - Forks: 27

itslokesh/Multi-Max-Clique
Multi-Max-Clique, an application that solves the Maximum Clique Problem using a parallel branch-and-bound approach, achieving linear and super-linear speedups in CUDA.
Language: Cuda - Size: 829 KB - Last synced at: 4 days ago - Pushed at: almost 7 years ago - Stars: 4 - Forks: 0

Heteroflow/Heteroflow
Concurrent CPU-GPU Programming using Task Models
Language: C++ - Size: 1.58 MB - Last synced at: 2 months ago - Pushed at: over 5 years ago - Stars: 101 - Forks: 13

xframes-project/xframes
GPU-accelerated GUI development for the desktop and the browser
Language: TypeScript - Size: 28.4 MB - Last synced at: 11 days ago - Pushed at: 5 months ago - Stars: 13 - Forks: 0

jayeshthk/Parallel_Computing (fork of ShashankDavalgi/Parallel_Computing)
CUDA computing example repo with complex matrix multiplication.
Language: C - Size: 14.6 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Elsword016/100days_Triton
Learning triton and GPU acceleration from scratch
Language: Jupyter Notebook - Size: 1.53 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

coderonion/cuda-beginner-course-rust-version
Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (Rust Edition)"
Language: Rust - Size: 10.7 KB - Last synced at: 3 days ago - Pushed at: 10 months ago - Stars: 6 - Forks: 0

ParaGroup/WindFlow
A C++17 Data Stream Processing Parallel Library for Multicores and GPUs
Language: C++ - Size: 48.9 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 81 - Forks: 19

DannyDoesGraphics/DARE
Danny's Awesome Rendering Engine
Language: Rust - Size: 4.52 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

machineko/SwiftCU
SwiftCU is a wrapper for the CUDA runtime APIs (exposed as cxxCU) with extra utilities for device management, memory ops and kernel execution, along with a robust suite of tests. The repo is tested on the newest (v12.5) CUDA runtime API on both Linux and Windows.
Language: Swift - Size: 613 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

gpufit/Gpufit
GPU-accelerated Levenberg-Marquardt curve fitting in CUDA
Language: Cuda - Size: 1.16 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 319 - Forks: 96

ShadyBoukhary/GPU-research-FFT-OpenACC-CUDA
Case studies constitute a modern interdisciplinary and valuable teaching practice which plays a critical and fundamental role in the development of new skills and the formation of new knowledge. This research studies the behavior and performance of two interdisciplinary and widely adopted scientific kernels, a Fast Fourier Transform and Matrix Multiplication. Both routines are implemented in the two current most popular many-core programming models CUDA and OpenACC. A Fast Fourier Transform (FFT) samples a signal over a period of time and divides it into its frequency components, computing the Discrete Fourier Transform (DFT) of a sequence. Unlike the traditional approach to computing a DFT, FFT algorithms reduce the complexity of the problem from O(n²) to O(n log₂ n). Matrix multiplication is a cornerstone routine in Mathematics, Artificial Intelligence and Machine Learning. This research also shows that the nature of the problem plays a crucial role in determining what many-core model will provide the highest benefit in performance.
Language: Cuda - Size: 9.12 MB - Last synced at: about 2 months ago - Pushed at: almost 7 years ago - Stars: 13 - Forks: 3

anselm67/CUDA_mnist
A CUDA implementation of MNIST - for CUDA beginners.
Language: Cuda - Size: 19.5 KB - Last synced at: 7 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

AlexJMercer/CUDA-NPP-Assignment
Learning about CUDA and NVIDIA Performance Primitives. Part of Coursera Assignment.
Language: C++ - Size: 9.49 MB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

DanHouseman/AdaptiveSort
C# extension methods for super efficient sorting using CPU, GPU, and FPGA
Language: C# - Size: 33.2 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

dj-himp/DX11GPUParticles
A fully GPU-based particle system with DirectX 11
Language: C++ - Size: 240 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 9 - Forks: 1

andi611/Apriori-and-Eclat-Frequent-Itemset-Mining
Python implementation of the Apriori and Eclat algorithms, two of the best-known basic algorithms for mining frequent item sets in a set of transactions.
Language: Python - Size: 4.05 MB - Last synced at: about 2 months ago - Pushed at: over 6 years ago - Stars: 48 - Forks: 19

YaccConstructor/Brahma.FSharp (fork of gsvgit/Brahma.FSharp)
F# quotation to OpenCL translator and respective runtime to utilize GPGPUs in F# applications.
Language: F# - Size: 52.1 MB - Last synced at: 9 days ago - Pushed at: 2 months ago - Stars: 75 - Forks: 17

Shapur1234/Fractl
Fractal renderer written in Rust supporting multithreading, GPU compute, and WASM
Language: Rust - Size: 43.7 MB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

abeleinin/Metal-Puzzles
Solve Puzzles. Learn Metal 🤘
Language: Jupyter Notebook - Size: 3.84 MB - Last synced at: 4 months ago - Pushed at: 9 months ago - Stars: 505 - Forks: 22
