GitHub topics: gpu-programming

Repositories

jrajan14/CUDA_Programs

Nvidia CUDA Programs. High-performance computing with my collection of CUDA programs, meticulously crafted to harness the immense power of NVIDIA's GPU architecture. From blazingly fast simulations to data-intensive parallel processing, these programs showcase my passion for pushing the boundaries of performance optimization.

Language: Cuda - Size: 30.8 MB - Last synced at: about 23 hours ago - Pushed at: about 24 hours ago - Stars: 5 - Forks: 2

JuliaGPU/AMDGPU.jl

AMD GPU (ROCm) programming in Julia

Language: Julia - Size: 12.4 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 308 - Forks: 58

nabla-ml/nabla

Dynamic Neural Networks and Function Transformations in Python + Mojo

Language: Mojo - Size: 40.7 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 247 - Forks: 7

Rust-GPU/rust-gpu

🐉 Making Rust a first-class language and ecosystem for GPU shaders 🚧

Language: Rust - Size: 292 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,864 - Forks: 57

Misteri4452y/taskflow

Smart weekly planner with auto-scheduling and Google Calendar integration

Language: Python - Size: 31.3 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

Alan-Rock-GS/GpuScript

GpuScript allows you to write C# programs that run at supercomputer speeds on a single GPU. Learn it in 30 minutes. Write & debug large and complex projects specifically designed to run on the GPU.

Size: 379 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 195 - Forks: 19

NVIDIA/cccl

CUDA Core Compute Libraries

Language: C++ - Size: 81.6 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,667 - Forks: 220

software-mansion/TypeGPU

TypeScript library that enhances the WebGPU API, allowing resource management in a type-safe, declarative way.

Language: TypeScript - Size: 81.1 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 436 - Forks: 11

romitjain/learning-gpu-programming

Learnings and experimentation with GPU programming

Language: Cuda - Size: 398 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

shreyansh26/MLSys-Experiments

A collection of scripts on experimenting and implementing MLSys-related stuff

Language: Jupyter Notebook - Size: 78.4 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 2 - Forks: 0

taskflow/taskflow

A General-purpose Task-parallel Programming System using Modern C++

Language: C++ - Size: 138 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 10,911 - Forks: 1,275

exaloop/codon

A high-performance, zero-overhead, extensible Python compiler with built-in NumPy support

Language: Python - Size: 6.5 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 15,706 - Forks: 538

Rontim/GPU-Parallel-Processing-AI

This repository explores the use of GPU parallel processing in the context of Artificial Intelligence (AI), specifically leveraging GPUs for accelerating computations in deep learning tasks.

Language: Jupyter Notebook - Size: 56.7 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

Aelstraz/Unity-GPU-Compute

GPU Compute provides an easy way to set up and execute GPU compute shaders in Unity. Create and manage buffers, track GPU memory usage and execution time, and automatically calculate thread group sizes and buffer strides, all in one class.

Language: C# - Size: 60.5 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

Rust-GPU/Rust-CUDA

Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.

Language: Rust - Size: 6 MB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 4,409 - Forks: 183

coderonion/cuda-beginner-course-cpp-version

Companion code for the bilibili video course "Introduction to CUDA 12.x Parallel Programming (C++ Edition)"

Language: Cuda - Size: 20.5 KB - Last synced at: 3 days ago - Pushed at: 10 months ago - Stars: 29 - Forks: 5

S-M-J-I/GPU-programming

GPU programming and tensara problems

Language: Python - Size: 3.91 KB - Last synced at: 1 day ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

rishabhkarnwal04/CNN

Deep CNN with Visualization for CIFAR-10 A neural network project that classifies CIFAR-10 images using deep CNNs built with NumPy and PyTorch/Keras. Includes filter visualizations, animated predictions, and performance tracking — ideal for learning how CNNs interpret visual data.

Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

AdroitAnandAI/Parallel-RNG-using-GPU

Parallel implementation of inherently sequential algorithms using mathematical hacks. Random Number Generators - Additive LFG and GFSR - implemented with NVIDIA CUDA using Continuous Subsequence Technique and Leap Frog Technique

Language: Cuda - Size: 3.27 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

AmesingFlank/taichi.js

Modern GPU Compute and Rendering in JavaScript

Language: TypeScript - Size: 220 MB - Last synced at: 9 days ago - Pushed at: 11 months ago - Stars: 496 - Forks: 19

lucidrains/triton-transformer

Implementation of a Transformer, but completely in Triton

Language: Python - Size: 34.3 MB - Last synced at: 13 days ago - Pushed at: about 3 years ago - Stars: 265 - Forks: 16

jeffasante/metal-raymarch-rs

A basic 3D raymarcher built with Rust and Apple's Metal API. A learning project exploring SDF rendering.

Language: Rust - Size: 1020 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

Young-TW/hippp

Write GPU programs with RAII

Language: C++ - Size: 44.9 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

eomii/rules_ll

An Upstream Clang/LLVM-based toolchain for contemporary C++ and heterogeneous programming

Language: Starlark - Size: 3.96 MB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 91 - Forks: 10

uber/aresdb

A GPU-powered real-time analytics storage and query engine.

Language: Go - Size: 12.4 MB - Last synced at: 10 days ago - Pushed at: 11 months ago - Stars: 3,050 - Forks: 234

calebwin/emu

The write-once-run-anywhere GPGPU library for Rust

Language: Rust - Size: 342 MB - Last synced at: 10 days ago - Pushed at: over 2 years ago - Stars: 1,610 - Forks: 52

arminkz/VulkanEngine

Vulkan boilerplate / examples

Language: C++ - Size: 180 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

dipta007/gpu-wait

A package to run commands when GPU resources are available

Language: Python - Size: 21.5 KB - Last synced at: about 16 hours ago - Pushed at: 5 months ago - Stars: 3 - Forks: 0

PerHuepenbecker/Cudyn

CUDA library for irregular tasks using a dynamic block-internal balancing mechanism

Language: Cuda - Size: 44.1 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

ProjectPhysX/OpenCL-Wrapper

OpenCL is the most powerful programming language ever created. Yet the OpenCL C++ bindings are cumbersome and the code overhead prevents many people from getting started. I created this lightweight OpenCL-Wrapper to greatly simplify OpenCL software development with C++ while keeping functionality and performance.

Language: C++ - Size: 324 KB - Last synced at: 15 days ago - Pushed at: 22 days ago - Stars: 401 - Forks: 40

taichi-dev/taichi

Productive, portable, and performant GPU programming in Python.

Language: C++ - Size: 57.4 MB - Last synced at: 19 days ago - Pushed at: about 1 month ago - Stars: 27,112 - Forks: 2,341

wmmae/wmma_extension

An extension library of WMMA API (Tensor Core API)

Language: Cuda - Size: 698 KB - Last synced at: about 8 hours ago - Pushed at: 11 months ago - Stars: 97 - Forks: 15

EmbarkStudios/rust-gpu

🐉 Making Rust a first-class language and ecosystem for GPU shaders 🚧

Language: Rust - Size: 248 MB - Last synced at: 19 days ago - Pushed at: 7 months ago - Stars: 7,488 - Forks: 250

geomstats/geomstats

Computations and statistics on manifolds with geometric structures.

Language: Python - Size: 211 MB - Last synced at: 18 days ago - Pushed at: 3 months ago - Stars: 1,351 - Forks: 261

i-Taylo/iUnlockerGL

iUnlocker GLTool is a Magisk module designed to spoof GPU information, allowing users to modify the reported GPU details to unlock graphics settings in games and for testing.

Language: Shell - Size: 91.5 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 15 - Forks: 0

Nicolas-Ferre/wgso

WebGPU Shader Orchestrator to create GPU-native applications

Language: Rust - Size: 198 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

rbga/A51-Realtime-AI-Object-Detection-with-Pyglet-Powered-UI

Real-time object detection app using YOLOv5/YOLOv8 with custom UI built from scratch using Pyglet & OpenGL. UI animations made in Adobe After Effects, rendered as GIFs, and integrated via uxElements.py. Multi-core processing enables live capture, detection, and display with low latency. Uses Open Images v7 dataset. Train mode is WIP.

Language: Python - Size: 137 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 1 - Forks: 0

luciianaoliveira/cua

c/ua is the Docker Container for Computer-Use AI Agents.

Language: Python - Size: 4.88 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

tgautam03/xGeMM

Accelerated General (FP32) Matrix Multiplication from scratch in CUDA

Language: Cuda - Size: 5.8 MB - Last synced at: 13 days ago - Pushed at: 5 months ago - Stars: 115 - Forks: 7

adamnemecek/awesome-metal

A collection of Metal and MetalKit projects and resources. Very much work in progress.

Size: 21.5 KB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 215 - Forks: 20

akileshas/gpuX

100 days of GPU programming !!!

Size: 31.6 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

QianMo/GPU-Gems-Book-Source-Code

💿 CD content (source code) collection for the books GPU Gems 1-3

Language: C++ - Size: 1.01 GB - Last synced at: 8 days ago - Pushed at: about 7 years ago - Stars: 1,075 - Forks: 448

brucefan1983/CUDA-Programming

Sample codes for my CUDA programming book

Language: Cuda - Size: 9.13 MB - Last synced at: 25 days ago - Pushed at: 4 months ago - Stars: 1,712 - Forks: 347

plasma-umass/scalene

Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals

Language: Python - Size: 14.1 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 12,651 - Forks: 408

NVIDIA/optix-dev

OptiX SDK headers, everything needed to build & run OptiX applications. SDK samples not included.

Language: C++ - Size: 186 KB - Last synced at: 24 days ago - Pushed at: 3 months ago - Stars: 24 - Forks: 2

raghulrajn/UNET-on-GPU-using-OpenCL

High performance programming of GPU using OpenCL

Language: C++ - Size: 29.2 MB - Last synced at: 2 days ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

andrewmilson/ministark

🏃‍♂️💨 GPU accelerated STARK prover built on @arkworks-rs

Language: Rust - Size: 1.65 MB - Last synced at: 17 days ago - Pushed at: 7 months ago - Stars: 357 - Forks: 36

maya-undefined/gpu-desktop-calculator

Language: Cuda - Size: 48.8 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 7 - Forks: 0

YichengDWu/MoYe.jl

Programming Gemm Kernels on NVIDIA GPUs with Tensor Cores in Julia

Language: Julia - Size: 7.24 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 41 - Forks: 0

johannesugb/VolumetricLinesUnity

Source of the Volumetric Lines Asset from Unity's Asset Store

Language: C# - Size: 1.52 MB - Last synced at: 27 days ago - Pushed at: over 3 years ago - Stars: 196 - Forks: 20

mikeroyal/GPU-Guide

Graphics Processing Unit (GPU) Architecture Guide

Language: Shell - Size: 815 KB - Last synced at: 27 days ago - Pushed at: over 3 years ago - Stars: 203 - Forks: 16

fastflow/fastflow

FastFlow pattern-based parallel programming framework (formerly on sourceforge)

Language: C++ - Size: 136 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 292 - Forks: 70

GameWin221/Gemino

⚡High-Performance Vulkan Renderer🌋

Language: C++ - Size: 8.66 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 4 - Forks: 0

NLeSC-COMPAS/kmm

KMM: parallel dataflow scheduler and efficient memory management for multi-GPU platforms

Language: C++ - Size: 7.34 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 0 - Forks: 1

QianMo/GPU-Pro-Books-Source-Code

💿 Source code collection for the books GPU Pro 1-7

Language: GLSL - Size: 2.73 GB - Last synced at: about 1 month ago - Pushed at: over 5 years ago - Stars: 680 - Forks: 348

pjyi2147/CUDA_HTN_Workshop

Introduction to Nvidia CUDA workshop repository @ Hack the North 2024

Language: Jupyter Notebook - Size: 8.47 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 5 - Forks: 2

Vincent-Therrien/gpu-arena

Compare and test GPU programming frameworks

Language: C++ - Size: 3.52 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 109 - Forks: 8

kartavyaantani/CUDA_IMAGE_PROCESSING

A CUDA-accelerated image processing project featuring multiple GPU-based filters and enhancement techniques. Implements convolution, edge detection, Non-Local Means (NLM) denoising, K-Nearest Neighbors (KNN), and pixelization. Each operation is optimized using CUDA kernels for real-time performance on large images. The project supports command-line

Language: Jupyter Notebook - Size: 5.4 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

palapav/triton-compute-kernels

A collection of Triton compute kernels for common ML operations

Size: 3.91 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

yashkathe/Image-Noise-Reduction-with-CUDA

This project analyzes the median blur image denoising technique, comparing GPU-accelerated (Numba) and CPU-based (OpenCV) processing speeds.

Language: Jupyter Notebook - Size: 25.4 MB - Last synced at: 25 days ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 0

LLNL/CARE

CHAI and RAJA provide an excellent base on which to build portable codes. CARE expands that functionality, adding new features such as loop fusion capability and a portable interface for many numerical algorithms. It provides all the basics for anyone wanting to write portable code.

Language: C++ - Size: 1.47 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 30 - Forks: 4

aryagxr/cuda

100 Days of CUDA!!!

Language: Cuda - Size: 120 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 5 - Forks: 0

aditiisaxena/CUDA-Accelerated-Box-Filter-for-Texture-Image-Enhancement

Enhances grayscale texture images using a CUDA-based box filter. Built with CUDA, C++14, and OpenCV for high-performance image processing.

Language: Cuda - Size: 65.6 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

bfGraph/STGraph

🌟 Vertex Centric approach for building GNN/TGNNs

Language: Python - Size: 13.7 MB - Last synced at: 25 days ago - Pushed at: 7 months ago - Stars: 22 - Forks: 0

eedalong/ECE408

Code base and slides for ECE408: Applied Parallel Programming on GPU.

Language: C++ - Size: 35.6 MB - Last synced at: about 2 months ago - Pushed at: almost 4 years ago - Stars: 122 - Forks: 34

AlfonsoLRz/LiDAR_BRDF

Source code of "Enhancing LiDAR point cloud generation with BRDF-based appearance modelling" (yet to be published).

Size: 18.6 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

NielsOuvrard/metal-sand-box

Metal graphics experiments based on 'Metal by Tutorials'

Language: Swift - Size: 16.5 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

MolSSI-Education/gpu_programming_beginner

Fundamentals of heterogeneous parallel programming with CUDA C/C++ at the beginner level.

Language: Python - Size: 5.25 MB - Last synced at: 15 days ago - Pushed at: about 2 years ago - Stars: 11 - Forks: 2

Nicolas-Ferre/ragna

A Rust library for easily creating GPU-native applications

Language: Rust - Size: 197 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

hollance/metal-gpgpu

Collection of notes on how to use Apple’s Metal API for compute tasks

Size: 1000 Bytes - Last synced at: 2 months ago - Pushed at: almost 7 years ago - Stars: 103 - Forks: 4

tgautam03/tGeMM

General Matrix Multiplication using NVIDIA Tensor Cores

Language: Cuda - Size: 47.9 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 13 - Forks: 3

ysh329/OpenCL-101

Learn OpenCL step by step.

Language: C - Size: 476 KB - Last synced at: about 2 months ago - Pushed at: almost 3 years ago - Stars: 135 - Forks: 29

tgautam03/xFilters

GPU (CUDA) accelerated filters using 2D convolution for high resolution images.

Language: C++ - Size: 58.2 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 6 - Forks: 1

ankhoa1212/cuda-program

This is a GPU program built with CUDA using parallel reduction

Language: C - Size: 13.8 MB - Last synced at: 21 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Awrsha/Advanced-CUDA-Programming-GPU-Architecture

This repository provides a comprehensive guide to optimizing GPU kernels for performance, with a focus on NVIDIA GPUs. It covers key tools and techniques such as CUDA, PyTorch, and Triton, aimed at improving computational efficiency for deep learning and scientific computing tasks.

Language: Cuda - Size: 25.2 MB - Last synced at: 2 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

mikeroyal/Vulkan-Guide

Vulkan Guide

Language: C++ - Size: 43 KB - Last synced at: 25 days ago - Pushed at: over 3 years ago - Stars: 28 - Forks: 2

ProjectPhysX/PTXprofiler

A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.

Language: C++ - Size: 11.7 KB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 50 - Forks: 6

QianMo/Game-Programmer-Study-Notes

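⚓ A collection of reading notes from my career as a game programmer. You can think of it as an enhanced blog. Covers computer graphics, real-time rendering, programming practice, GPU programming, design patterns, software engineering, and more. Keep Reading, Keep Writing, Keep Coding.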

Size: 752 MB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 9,412 - Forks: 1,722

vista-art/fragmentcolor

🦀 Easy GPU programming for JavaScript, Python, Swift, and Kotlin.

Language: Rust - Size: 49.2 MB - Last synced at: 11 days ago - Pushed at: 3 months ago - Stars: 4 - Forks: 0

unisa-hpc/sycl-bench

SYCL Benchmark Suite

Language: C++ - Size: 24.7 MB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 64 - Forks: 35

michel-meneses/great-opencl-examples

Collection of easy, well-documented and useful OpenCL examples in C++.

Language: C++ - Size: 1000 KB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 75 - Forks: 27

itslokesh/Multi-Max-Clique

Multi-Max-Clique, an application that solves Maximum Clique Problem using the parallel branch and bound approach and achieved linear and super-linear speedups in CUDA.

Language: Cuda - Size: 829 KB - Last synced at: 4 days ago - Pushed at: almost 7 years ago - Stars: 4 - Forks: 0

Heteroflow/Heteroflow

Concurrent CPU-GPU Programming using Task Models

Language: C++ - Size: 1.58 MB - Last synced at: 2 months ago - Pushed at: over 5 years ago - Stars: 101 - Forks: 13

xframes-project/xframes

GPU-accelerated GUI development for the desktop and the browser

Language: TypeScript - Size: 28.4 MB - Last synced at: 11 days ago - Pushed at: 5 months ago - Stars: 13 - Forks: 0

jayeshthk/Parallel_Computing Fork of ShashankDavalgi/Parallel_Computing

CUDA computing example repo with complex matrix multiplication.

Language: C - Size: 14.6 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Elsword016/100days_Triton

Learning Triton and GPU acceleration from scratch

Language: Jupyter Notebook - Size: 1.53 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

coderonion/cuda-beginner-course-rust-version

Companion code for the bilibili video course "Introduction to CUDA 12.x Parallel Programming (Rust Edition)"

Language: Rust - Size: 10.7 KB - Last synced at: 3 days ago - Pushed at: 10 months ago - Stars: 6 - Forks: 0

ParaGroup/WindFlow

A C++17 Data Stream Processing Parallel Library for Multicores and GPUs

Language: C++ - Size: 48.9 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 81 - Forks: 19

DannyDoesGraphics/DARE

Danny's Awesome Rendering Engine

Language: Rust - Size: 4.52 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

machineko/SwiftCU

SwiftCU is a wrapper for the CUDA runtime APIs (exposed as cxxCU) with extra utilities for device management, memory ops, and kernel execution, along with a robust suite of tests. The repo is tested on the newest (v12.5) CUDA runtime API on both Linux and Windows.

Language: Swift - Size: 613 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

gpufit/Gpufit

GPU-accelerated Levenberg-Marquardt curve fitting in CUDA

Language: Cuda - Size: 1.16 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 319 - Forks: 96

ShadyBoukhary/GPU-research-FFT-OpenACC-CUDA

Case studies constitute a modern, interdisciplinary, and valuable teaching practice which plays a critical and fundamental role in the development of new skills and the formation of new knowledge. This research studies the behavior and performance of two interdisciplinary and widely adopted scientific kernels, a Fast Fourier Transform and Matrix Multiplication. Both routines are implemented in the two currently most popular many-core programming models, CUDA and OpenACC. A Fast Fourier Transform (FFT) takes a signal sampled over a period of time and divides it into its frequency components by computing the Discrete Fourier Transform (DFT) of the sequence. Unlike the direct approach to computing a DFT, FFT algorithms reduce the complexity of the problem from O(n^2) to O(n log2 n). Matrix multiplication is a cornerstone routine in Mathematics, Artificial Intelligence, and Machine Learning. This research also shows that the nature of the problem plays a crucial role in determining which many-core model will provide the highest benefit in performance.

Language: Cuda - Size: 9.12 MB - Last synced at: about 2 months ago - Pushed at: almost 7 years ago - Stars: 13 - Forks: 3

anselm67/CUDA_mnist

A CUDA implementation of MNIST - for CUDA beginners.

Language: Cuda - Size: 19.5 KB - Last synced at: 7 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

AlexJMercer/CUDA-NPP-Assignment

Learning about CUDA and NVIDIA Performance Primitives. Part of Coursera Assignment.

Language: C++ - Size: 9.49 MB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

DanHouseman/AdaptiveSort

C# Extension methods for super efficient sorting using CPU, GPU, and FPGA

Language: C# - Size: 33.2 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

dj-himp/DX11GPUParticles

A fully GPU-based particle system with DirectX 11

Language: C++ - Size: 240 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 9 - Forks: 1

andi611/Apriori-and-Eclat-Frequent-Itemset-Mining

Python implementation of the Apriori and Eclat algorithms, two of the best-known basic algorithms for mining frequent item sets in a set of transactions.

Language: Python - Size: 4.05 MB - Last synced at: about 2 months ago - Pushed at: over 6 years ago - Stars: 48 - Forks: 19

YaccConstructor/Brahma.FSharp Fork of gsvgit/Brahma.FSharp

F# quotation to OpenCL translator and respective runtime to utilize GPGPUs in F# applications.

Language: F# - Size: 52.1 MB - Last synced at: 9 days ago - Pushed at: 2 months ago - Stars: 75 - Forks: 17

Shapur1234/Fractl

Fractal renderer written in Rust supporting multithreading, GPU compute, and WASM

Language: Rust - Size: 43.7 MB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

abeleinin/Metal-Puzzles

Solve Puzzles. Learn Metal 🤘

Language: Jupyter Notebook - Size: 3.84 MB - Last synced at: 4 months ago - Pushed at: 9 months ago - Stars: 505 - Forks: 22

Related Keywords

gpu-programming 375 cuda 184 gpu 120 gpu-computing 94 gpu-acceleration 65 cpp 60 cuda-programming 53 parallel-programming 51 parallel-computing 37 c 36 opencl 28 nvidia 25 python 23 openmp 22 gpgpu 21 nvidia-gpu 21 cuda-kernels 19 nvidia-cuda 18 machine-learning 17 hpc 17 rust 16 vulkan 16 graphics 15 high-performance-computing 14 deep-learning 14 opengl 14 multithreading 13 shaders 13 image-processing 13 graphics-programming 11 raytracing 11 metal 11 glsl 11 shader 10 rendering 10 c-plus-plus 8 parallel 8 triton 8 parallel-processing 8 tensorflow 7 matrix-multiplication 7 game-engine 7 sorting-algorithms 7 computer-graphics 7 mpi 7 algorithms 6 hip 6 openacc 6 swift 6 heterogeneous-parallel-programming 6 game-development 6 cuda-toolkit 6 sycl 6 webgpu 6 cplusplus 5 computer-vision 5 opencv 5 javascript 5 multicore-programming 5 nvcc 5 cuda-library 5 gpgpu-computing 5 raytracer 5 renderer 5 artificial-intelligence 5 neural-network 5 fluid-simulation 4 cublas 4 cnn 4 dpcpp 4 openmp-parallelization 4 compute-shader 4 numpy 4 vulkan-api 4 parallel-algorithm 4 pycuda 4 julia 4 compiler 4 neural-networks 4 profiling 4 webgl 4 pytorch 4 concurrent-programming 4 stream-processing 4 wasm 3 streams 3 oneapi 3 particles 3 path-tracer 3 wgsl 3 graphics-engine 3 lookup 3 data-science 3 numba 3 vulkan-compute-shaders 3 hpc-applications 3 big-data 3 bigdata 3 numerical-methods 3 library 3