GitHub topics: cuda-programming
shrutipangare/CUDAConvolution
Convolution in CUDA
Language: Cuda - Size: 7.89 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0
 
      MuGdxy/muda
μ-Cuda, COVER THE LAST MILE OF CUDA. With features: intellisense-friendly, structured launch, automatic cuda graph generation and updating.
Language: C++ - Size: 14.8 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 193 - Forks: 13
 
      razord21/Canny-Edge-Detector
🖼️ Implement high-performance Canny edge detection using CPU and CUDA, enabling efficient image processing with benchmarking capabilities.
Language: C - Size: 1.38 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0
 
      NVIDIA/cccl
CUDA Core Compute Libraries
Language: C++ - Size: 240 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1,994 - Forks: 284
 
      FaresArgus/artaxerxes
Adaptive high-performance stress tester "artaxerxes" supports GPU, io_uring, DPDK, and eBPF/XDP for advanced cybersecurity labs. Ideal for network testing. 🚀🛠️
Language: C - Size: 26.4 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0
 
      vmakarov28/Alpaca-Stock-Trading-Bot
This is an AI-powered stock trading bot that uses neural networks to predict market trends and execute trades via the Alpaca API. Built with PyTorch for GPU acceleration.
Language: Python - Size: 2.49 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 1
 
      Yash-1335/qwen600
🚀 Build a fast inference engine for the QWEN3-0.6B model using CUDA, optimizing performance with minimal dependencies for efficient learning and practice.
Language: Cuda - Size: 2.04 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0
 
      phbastosa/EikoStagTriX3D
GPU-Accelerated Seismic Wave Simulation in Generally Anisotropic Media Using Eikonal-Guided Domain Clipping and Compressed Stiffness Representation.
Language: Cuda - Size: 86.9 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0
 
      GianmariaRomano/PMC-Translated-Notes
The repository contains translated notes for the course "Programmazione di Sistemi Multicore" given by Professor De Sensi for the "Informatica" course at Sapienza Università di Roma.
Language: C - Size: 2.88 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0
 
      coreylowman/cudarc
Safe rust wrapper around CUDA toolkit
Language: Rust - Size: 3.7 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 950 - Forks: 118
 
      goabiaryan/awesome-gpu-engineering
GPU Engineering for AI Systems
Language: HTML - Size: 900 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 50 - Forks: 6
 
      iamfaham/cuda-kernel-inference-profiler
CUDA Kernel Inference Profiler is a lightweight, Colab-ready benchmarking tool that profiles transformer inference at the kernel, operator, and memory level.
Language: Jupyter Notebook - Size: 118 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0
 
      Accumulated/Accelerating-CNN-on-GPU-using-CUDA-C
This repository implements and accelerates a CNN on GPU using NVIDIA CUDA C. The current code has an 8 ms inference time. The CNN used is EfficientNet.
Language: Jupyter Notebook - Size: 39.6 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 6 - Forks: 2
 
      HenryNdubuaku/cuda-tutorials
CUDA tutorials for maths and ML with examples, covering multi-GPU, fused attention, Winograd convolution, and reinforcement learning.
Language: Cuda - Size: 428 KB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 195 - Forks: 6
 
      DiamondLightSource/fast-feedback-service
GPU based service to provide fast-feedback results
Language: C++ - Size: 1010 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 3 - Forks: 3
 
      emptysoal/YOLOv5-TensorRT-lib-Python
YOLOv5 inference code using the TensorRT C++ API, packaged into a dynamic link library and called from Python.
Language: Cuda - Size: 753 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 15 - Forks: 1
 
      WenchaoHuang/Nucleus
C++ Bindings for CUDA Resources
Language: C++ - Size: 254 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 4 - Forks: 3
 
      taskflow/taskflow
A General-purpose Task-parallel Programming System using Modern C++
Language: C++ - Size: 142 MB - Last synced at: 8 days ago - Pushed at: 9 days ago - Stars: 11,321 - Forks: 1,323
 
      IsidoroGlez/micSA-EA-MC
CUDA implementation of a microcanonical Simulated Annealing Monte Carlo algorithm for the 3D Edwards–Anderson spin glass model. This repository contains the code, data, and figures associated with the research available at arXiv.
Language: C - Size: 521 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 1 - Forks: 0
 
      Rust-GPU/rust-cuda
Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
Language: Rust - Size: 6.06 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 4,780 - Forks: 209
 
      mikeroyal/CUDA-Guide
CUDA Guide
Language: Cuda - Size: 83 KB - Last synced at: 4 days ago - Pushed at: almost 2 years ago - Stars: 74 - Forks: 11
 
      phbastosa/EikoStagTriX2D
GPU-Accelerated Seismic Wave Simulation in Generally Anisotropic Media Using Eikonal-Guided Domain Clipping and Compressed Stiffness Representation.
Language: Cuda - Size: 84 KB - Last synced at: 13 days ago - Pushed at: 14 days ago - Stars: 1 - Forks: 0
 
      RainerMtb/cuvista
Accelerated Optical Video Stabilizer, Cuda, OpenCL, Avx512
Language: C++ - Size: 45.4 MB - Last synced at: 13 days ago - Pushed at: 14 days ago - Stars: 14 - Forks: 1
 
      abhiyanpaudel/parallel-highlife
High-performance CUDA, MPI, and Hybrid implementations demonstrating GPU computing and parallel programming.
Language: C - Size: 438 KB - Last synced at: 14 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0
 
      dragunovdenis/DeepLearning
C++ framework for deep neural networks
Language: C++ - Size: 12.2 MB - Last synced at: 14 days ago - Pushed at: 16 days ago - Stars: 4 - Forks: 0
 
      Lin-Mao/DrGPUM
A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.
Language: Python - Size: 248 KB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 26 - Forks: 3
 
      ashvardanian/PyBindToGPUs
Parallel Computing starter project to build GPU & CPU kernels in CUDA & C++ and call them from Python without a single line of CMake using PyBind11
Language: Cuda - Size: 238 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 29 - Forks: 3
 
      romitjain/awesome-llm-systems
This repository aims to consolidate resources for learning about systems for LLM
Size: 2.93 KB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 0 - Forks: 0
 
      yassa9/qwen600
Static suckless single batch CUDA-only qwen3-0.6B mini inference engine
Language: Cuda - Size: 793 KB - Last synced at: 26 days ago - Pushed at: about 2 months ago - Stars: 498 - Forks: 38
 
      toxy4ny/artaxerxes
Artaxerxes - Adaptive High-Performance Stress Tester v.1.0. A rebuild of the older Xerxes DDoS tool. Supports GPU+io_uring, DPDK, eBPF/XDP with intelligent fallbacks. Educational tool for advanced cybersecurity labs
Language: C - Size: 27.3 KB - Last synced at: 23 days ago - Pushed at: 4 months ago - Stars: 26 - Forks: 1
 
      saeedahmadicp/Fundamentals-of-Accelerated-Computing-with-CUDA-Python
Fundamentals of Accelerated Computing with CUDA Python
Language: Jupyter Notebook - Size: 6.63 MB - Last synced at: 12 days ago - Pushed at: 7 months ago - Stars: 3 - Forks: 0
 
      LiteObject/CUDA-Image-Processing-App
Real-time GPU-accelerated image processing application using CUDA and Python. Features 11 visual filters including edge detection, blur, sepia, cartoon effects, and more - all running at 30 FPS with live webcam input.
Language: Python - Size: 62.5 KB - Last synced at: 26 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0
 
      Crostino14/RB-Tree-Search-Project-Linux-Version
"Red-Black Tree Search Project": Parallelized Red-Black Tree search with MPI, OpenMP, and CUDA. Performance analysis on various hardware and input configurations.
Language: C - Size: 5.91 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0
 
      Crostino14/RBTree-Search-Project
"Red-Black Tree Search Project": Parallelized Red-Black Tree search with MPI, OpenMP, and CUDA. Performance analysis on various hardware and input configurations.
Language: C - Size: 18 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
 
      mclane2/HPC_Projects
Computational Projects completed for M.Sc. in High Performance Computing at Trinity College Dublin
Language: Jupyter Notebook - Size: 18.6 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0
 
      Guillaume-Helbecque/GPU-accelerated-tree-search-Chapel
GPU-accelerated tree search: Investigating Chapel versus CUDA/HIP+X
Language: Chapel - Size: 591 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 2 - Forks: 1
 
      RRZE-HPC/MD-Bench
A performance-oriented prototyping harness for state of the art Molecular Dynamics algorithms
Language: C - Size: 4.52 MB - Last synced at: 13 days ago - Pushed at: 14 days ago - Stars: 17 - Forks: 9
 
      coderonion/cuda-beginner-course-rust-version
Companion code for the bilibili video course "Introduction to CUDA 12.x Parallel Programming (Rust Edition)"
Language: Rust - Size: 10.7 KB - Last synced at: 18 days ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 0
 
      Nihar-Shah2001/Parallel_Computing
This repository contains my comprehensive Parallel Computing Notes written in LaTeX. It serves as both a study reference and a practical resource for students, researchers, and professionals (especially from non-CS backgrounds) working in High Performance Computing (HPC), OpenMP, MPI, CUDA.
Language: TeX - Size: 39.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 8 - Forks: 0
 
      professorcode1/Event-Analysis
Library for Event Synchronization and Event Coincidence Analysis
Language: Jupyter Notebook - Size: 1.03 MB - Last synced at: 7 days ago - Pushed at: 10 months ago - Stars: 14 - Forks: 3
 
      emptysoal/cuda-image-preprocess
Speed up image preprocessing with CUDA when handling images or TensorRT inference
Language: Cuda - Size: 101 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 77 - Forks: 5
 
      Mgepahmge/CuWeaver
A CUDA concurrency library designed to simplify concurrency programming, offering C++-style wrappers for selected CUDA Runtime APIs
Language: Cuda - Size: 1.48 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 6 - Forks: 0
 
      Awrsha/Advanced-CUDA-Programming-GPU-Architecture
This repository provides a comprehensive guide to optimizing GPU kernels for performance, with a focus on NVIDIA GPUs. It covers key tools and techniques such as CUDA, PyTorch, and Triton, aimed at improving computational efficiency for deep learning and scientific computing tasks.
Language: Cuda - Size: 25.2 MB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 3 - Forks: 0
 
      tgautam03/xFilters
GPU (CUDA) accelerated filters using 2D convolution for high resolution images.
Language: C++ - Size: 58.2 MB - Last synced at: 21 days ago - Pushed at: 9 months ago - Stars: 8 - Forks: 1
 
      dino65-dev/Cuda_ML_Library
A CUDA-based ML library that lets anyone use GPU-powered ML with ease in Python.
Language: Cuda - Size: 143 KB - Last synced at: 22 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0
 
      cybersecurity-dev/awesome-gpu-programming
Awesome GPU Programming
Size: 11.7 KB - Last synced at: 12 days ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0
 
      harleyszhang/llm_note
LLM notes, including model inference, transformer model structure, and llm framework code analysis notes.
Language: Python - Size: 185 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 817 - Forks: 88
 
      Orlando275/CUDA-high-performance-demos
A collection of CUDA programming exercises focused on exploring and implementing high-performance GPU computing techniques. The repository covers topics such as warp-level optimization, shared memory utilization, and various algorithm implementations tailored for parallel processing.
Language: Cuda - Size: 19.5 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0
 
      Koushikphy/Intro-to-CUDA-Fortran
A Complete beginner's introduction to programming with CUDA Fortran
Size: 200 KB - Last synced at: 3 days ago - Pushed at: about 3 years ago - Stars: 31 - Forks: 1
 
      real-space/AngstromCube
A parallel and GPU-accelerated Code for Real-Space All-Electron Linear-Scaling Density Functional Theory
Language: C++ - Size: 34 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 8 - Forks: 2
 
      ayushraina2028/DS295-Parallel-Programming-2025
This repository contains my LaTeX notes for Parallel Programming and all my implementations using CUDA C/C++, OpenMP, and MPI
Language: Jupyter Notebook - Size: 45.2 MB - Last synced at: 5 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0
 
      GeoffreyWang1117/Mandelbrot-Renderer
High-performance Mandelbrot fractal renderer supporting deep zoom, multi-threaded computation, and customizable color palettes. Built for educational and exploratory purposes with support for CPU/GPU acceleration.
Language: C++ - Size: 174 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0
 
      slbouknight/accelerated-ray-tracer
A simple ray tracer accelerated with CUDA
Language: Cuda - Size: 11.1 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0
 
      Demon-Sheriff/tiny-flash-attention
custom flash attention kernel in cuda to benchmark it against torch and burn my rtx 3050
Language: Cuda - Size: 8.79 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0
 
      maurosuetta/Distributed-Parallel-Programming-Projects
This repository contains multiple projects involving distributed and parallel programming, covering OpenMP, MPI, CUDA, and OpenACC. The programs were run on a private cluster (Pirineus3). Explanations are in the reports.
Language: C - Size: 20 MB - Last synced at: 29 days ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0
 
      KarhouTam/cuda-kernels
Some common CUDA kernel implementations (Not the fastest).
Language: Cuda - Size: 60.5 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 21 - Forks: 1
 
      Krasnomakov/EventDrivenArchitecture
Prototypes of Event-Driven Architecture with computer vision, games, animation, and LLMs
Language: Python - Size: 110 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0
 
      arvinsingh/gpu-benchmark-suite
A comprehensive CLI tool for benchmarking GPU performance across CUDA, Triton, and PyTorch implementations.
Language: Python - Size: 132 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0
 
      emptysoal/TensorRT-v8-YOLOv5-v5.0
Based on TensorRT v8.2, a hand-built network for YOLOv5-v5.0 that speeds up YOLOv5-v5.0 inference
Language: C++ - Size: 434 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 12 - Forks: 1
 
      l3lackcurtains/dbscan-kdtree-cuda
Massively parallel DBSCAN algorithm implemented in CUDA along with a KD-Tree for searching neighbors.
Language: Cuda - Size: 16.1 MB - Last synced at: 2 months ago - Pushed at: about 5 years ago - Stars: 13 - Forks: 4
 
      ShowayLiao/LiMR_cpp
Real-time industrial anomaly and defect detection inference implemented in C++
Language: C++ - Size: 500 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0
 
      dontdothatjoel/CUDA-GEMM-kernel
My attempt at making a GEMM kernel...
Language: Cuda - Size: 73.2 KB - Last synced at: 18 days ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0
 
      Masoudjafaripour/Transformer-CUDA (fork of saimeghana-y/Transformer-CUDA)
Building on the original repo, an attempt to implement an encoder-decoder transformer using CUDA
Language: Python - Size: 22.5 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0
 
      Kevin22888/AkuaEngine
A real-time fluid simulation engine implemented in C++, with CUDA and OpenGL.
Language: C++ - Size: 24.7 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0
 
      trieck/pixienn
A modern C++ reimplementation of Darknet with CUDA support for efficient neural network inference
Language: C++ - Size: 4.61 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0
 
      eyalroz/cuda-api-wrappers
Thin, unified, C++-flavored wrappers for the CUDA APIs
Language: C++ - Size: 2.86 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 853 - Forks: 86
 
      gwdina/GPU-Accelerated-Matrix-Multiplication
This project demonstrates matrix multiplication accelerated on a GPU using NVIDIA's CUDA programming model. It is designed to compute the product of two large matrices in parallel, taking advantage of the GPU’s massive threading capabilities to significantly outperform traditional CPU-based matrix multiplication.
Language: Cuda - Size: 204 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0
 
      MolSSI-Education/gpu_programming_beginner
Fundamentals of heterogeneous parallel programming with CUDA C/C++ at the beginner level.
Language: Python - Size: 5.25 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 13 - Forks: 2
 
      hetan-official/CUDA_C_Best_Practices_Guide-In-Chinese
This is a Chinese translation of the CUDA_C_Best_Practices_Guide
Size: 39.1 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0
 
      harshrajhrj/cuda-programming
This repository consists of CUDA programming (specifically for Deep Learning) in C++ and Python. Links: https://github.com/Infatoshi/mnist-cuda
Language: Cuda - Size: 2.92 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0
 
      phbastosa/SeisFAT3D
Modeling, inversion and migration focusing on seismic first-arrivals.
Language: Cuda - Size: 236 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 10 - Forks: 2
 
      artarchi/TaskFlow
TaskFlow is a MERN stack Todo application that enables users to manage their tasks efficiently. With features like JWT authentication and a responsive UI, it provides a seamless experience for both desktop and mobile users. 🐙🌐
Language: CSS - Size: 3.33 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0
 
      sail-sg/Adan
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
Language: Python - Size: 1.31 MB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 795 - Forks: 69
 
      berserk-23115/GPU-Specialisation-IP
Independent Project submission for GPU Programming Specialisation by Anushk Kumar
Language: Cuda - Size: 16.6 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0
 
      berserk-23115/GPU-Specialisation-Capstone
GPU Programming Specialisation Capstone Project submission by Anushk Kumar
Language: Cuda - Size: 20.5 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0
 
      ZaidMohsin457/Parallelizing-GNN
This project demonstrates parallelization techniques for Graph Neural Networks (GNNs) using CUDA for GPU acceleration, MPI (mpi4py) for distributed computing, and Python multiprocessing for parallel processing. The implementation uses the PubMed dataset from PyTorch Geometric and a 2-layer GCN model.
Language: Python - Size: 448 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 1
 
      PaddleJitLab/CUDATutorial
A self-learning tutorial for CUDA High Performance Programming.
Language: JavaScript - Size: 108 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 658 - Forks: 69
 
      MuhammadMuazen/Simple-Matrices-Multiplication-Using-Cuda
Just a simple matrices multiplication using cuda
Language: Cuda - Size: 8.63 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 0
 
      branebb/nn-framework
Framework for creating neural networks using C++ and CUDA platform. This project is part of my final university assignment for bachelor's degree.
Language: Cuda - Size: 64.5 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0
 
      jrajan14/CUDA_Programs
Nvidia CUDA Programs. High-performance computing with my collection of CUDA programs, meticulously crafted to harness the immense power of NVIDIA's GPU architecture. From blazingly fast simulations to data-intensive parallel processing, these programs showcase my passion for pushing the boundaries of performance optimization.
Language: Cuda - Size: 30.8 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 5 - Forks: 2
 
      Cat-Gawr/AI-Python
A small AI whose best response time was 0.02 seconds | Konata ~ 2025
Language: Python - Size: 898 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0
 
      jaredhoberock/ubu
Language: C++ - Size: 1.97 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0
 
      AlexJMercer/Fractal-Art
Generating Fractals in C++ using SFML. For the ultimate visual stimulation and in-depth code!
Language: C++ - Size: 4.84 MB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0
 
      govindansriram/sm89-kernels
SM89 Optimized CUDA Kernels
Language: Cuda - Size: 75.2 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0
 
      Kaminyou/Flash-Attention-Practice
A minimal CUDA implementation of FlashAttention v1 and v2
Language: Python - Size: 19.5 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0
 
      Mazharuddin-Mohammed/QDSim
High-performance 2D Quantum Dot (QD) Simulator implemented in C++ and Python
Language: C++ - Size: 1.26 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0
 
      dpetrosy/Fractal
This project is a Fractal Visualizer developed in C++ with SFML and CUDA.
Language: C++ - Size: 4.86 MB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0
 
      Fantasya63/DistributedRayTracer
A small path tracer that runs on the GPU using Numba CUDA in Python.
Language: Python - Size: 26.6 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0
 
      florist-notes/aicore_s
AI, IoT and Robotics Hardware + ROS
Language: Jupyter Notebook - Size: 361 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 9 - Forks: 1
 
      LuongHuuPhuc/Project_2024-2
Parallel programming for Merge sort algorithm using OpenMP and CUDA
Language: Cuda - Size: 3.7 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0
 
      marcoplaitano/counting-sort-cuda
Parallelized version of Counting Sort using CUDA
Language: C - Size: 26.4 KB - Last synced at: 4 months ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 0
 
      MatteoFasulo/Multi-layer-Neural-Network
A Parallel implementation for a particular kind of multi-layer Neural Network
Language: Cuda - Size: 3.76 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0
 
      guoriyue/warp-from-device
Language: Cuda - Size: 1.71 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 1
 
      jerry060599/KittenGpuLBVH
A high performance and friendly GPU LBVH implementation.
Language: Cuda - Size: 90.8 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 24 - Forks: 4
 
      mit-han-lab/TinyChatEngine
TinyChatEngine: On-Device LLM Inference Library
Language: C++ - Size: 83.3 MB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 852 - Forks: 85
 
      Momijiichigo/particle_field_simulation
Simulating the particle fields by using the time-evolution equations extracted from Euler-Lagrange equations of fields
Language: Jupyter Notebook - Size: 523 KB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0
 
      coderonion/cuda-beginner-course-cpp-version
Companion code for the bilibili video course "Introduction to CUDA 12.x Parallel Programming (C++ Edition)"
Language: Cuda - Size: 20.5 KB - Last synced at: 18 days ago - Pushed at: about 1 year ago - Stars: 29 - Forks: 5
 
      matrix97317/OneTensor
This is a simple and easy-to-use Tensor Library.
Language: Cuda - Size: 2.03 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 2
 
      loukmane-lok/HPC-Quiz-Bank
A collection of multiple choice questions (MCQs) on High Performance Computing (HPC) and Lab solutions
Size: 15.6 KB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0
 
      AdroitAnandAI/Parallel-RNG-using-GPU
Parallel implementation of inherently sequential algorithms using mathematical hacks. Random Number Generators - Additive LFG and GFSR - implemented with NVIDIA CUDA using Continuous Subsequence Technique and Leap Frog Technique
Language: Cuda - Size: 3.27 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0
