
Topic: "cuda-programming"

taskflow/taskflow

A General-purpose Task-parallel Programming System using Modern C++

Language: C++ - Size: 138 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 11,057 - Forks: 1,294

Rust-GPU/Rust-CUDA

Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.

Language: Rust - Size: 6.08 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 4,527 - Forks: 190

NVIDIA/cccl

CUDA Core Compute Libraries

Language: C++ - Size: 84.8 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,797 - Forks: 241

brucefan1983/CUDA-Programming

Sample codes for my CUDA programming book

Language: Cuda - Size: 9.13 MB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 1,712 - Forks: 347

coreylowman/cudarc

Safe Rust wrapper around the CUDA toolkit

Language: Rust - Size: 2.91 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 862 - Forks: 106

mit-han-lab/TinyChatEngine

TinyChatEngine: On-Device LLM Inference Library

Language: C++ - Size: 83.3 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 852 - Forks: 85

eyalroz/cuda-api-wrappers

Thin, unified, C++-flavored wrappers for the CUDA APIs

Language: C++ - Size: 2.85 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 851 - Forks: 85

harleyszhang/llm_note

LLM notes, including model inference, transformer model structure, and llm framework code analysis notes.

Language: Python - Size: 184 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 800 - Forks: 87

sail-sg/Adan

Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models

Language: Python - Size: 1.31 MB - Last synced at: 20 days ago - Pushed at: about 2 months ago - Stars: 795 - Forks: 69

PaddleJitLab/CUDATutorial

A self-learning tutorial for CUDA high-performance programming.

Language: JavaScript - Size: 108 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 658 - Forks: 69

nosferalatu/SimpleGPUHashTable

A simple GPU hash table implemented in CUDA using lock free techniques

Language: Cuda - Size: 297 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 394 - Forks: 41

HenryNdubuaku/cuda-tutorials

CUDA tutorials for maths and ML with examples, covering multi-GPU, fused attention, Winograd convolution, and reinforcement learning.

Language: Cuda - Size: 428 KB - Last synced at: 8 days ago - Pushed at: about 2 months ago - Stars: 187 - Forks: 5

MuGdxy/muda

μ-Cuda: cover the last mile of CUDA. Features: IntelliSense-friendly, structured launch, automatic CUDA graph generation and updating.

Language: C++ - Size: 14.7 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 179 - Forks: 9

jaredhoberock/stanford-cs193g-sp2010

This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010

Language: C++ - Size: 127 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 170 - Forks: 73

SunsetQuest/CudaPAD

CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly.

Language: C# - Size: 1.18 MB - Last synced at: 1 day ago - Pushed at: over 2 years ago - Stars: 119 - Forks: 16

tgautam03/xGeMM

Accelerated General (FP32) Matrix Multiplication from scratch in CUDA

Language: Cuda - Size: 5.8 MB - Last synced at: 2 months ago - Pushed at: 7 months ago - Stars: 115 - Forks: 7

ROCm/HIP-CPU

An implementation of HIP that works on CPUs, across OSes.

Language: C++ - Size: 776 KB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 115 - Forks: 18

eyalroz/cuda-kat

CUDA kernel author's tools

Language: Cuda - Size: 1.57 MB - Last synced at: 9 months ago - Pushed at: over 3 years ago - Stars: 107 - Forks: 8

mikeroyal/CUDA-Guide

CUDA Guide

Language: Cuda - Size: 83 KB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 70 - Forks: 9

emptysoal/cuda-image-preprocess

Speed up image preprocessing with CUDA when handling images or running TensorRT inference

Language: Cuda - Size: 91.8 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 63 - Forks: 5

FahimFBA/CUDA-WSL2-Ubuntu

Install CUDA on Windows 11 using WSL2

Language: Jupyter Notebook - Size: 10.4 MB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 62 - Forks: 4

HuangCongQing/cuda-learning

An introduction to learning CUDA programming

Language: Cuda - Size: 5.66 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 35 - Forks: 6

LinhanDai/yolov9-tensorrt

YOLOv9 TensorRT deployment acceleration, providing two implementations: C++ and Python 🔥🔥🔥

Language: C++ - Size: 1.07 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 32 - Forks: 6

coderonion/cuda-beginner-course-cpp-version

Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (C++ Edition)"

Language: Cuda - Size: 20.5 KB - Last synced at: about 15 hours ago - Pushed at: 12 months ago - Stars: 29 - Forks: 5

ashvardanian/cuda-python-starter-kit

Parallel Computing starter project to build GPU & CPU kernels in CUDA & C++ and call them from Python without a single line of CMake using PyBind11

Language: Cuda - Size: 238 KB - Last synced at: 6 days ago - Pushed at: 5 months ago - Stars: 26 - Forks: 3

Koushikphy/Intro-to-CUDA-Fortran

A Complete beginner's introduction to programming with CUDA Fortran

Size: 200 KB - Last synced at: 4 months ago - Pushed at: almost 3 years ago - Stars: 26 - Forks: 1

Lin-Mao/DrGPUM

A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.

Language: Python - Size: 248 KB - Last synced at: 5 days ago - Pushed at: 10 months ago - Stars: 25 - Forks: 3

jerry060599/KittenGpuLBVH

A high performance and friendly GPU LBVH implementation.

Language: Cuda - Size: 90.8 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 24 - Forks: 4

xmba15/ransac_lines_fitting_gpu

simple GPU ransac fitting of multiple lines on 2d/3d point cloud

Language: C++ - Size: 50.8 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 23 - Forks: 7

fjramireg/StiffMa

StiffMa: Fast finite element STIFFness MAtrix generation in MATLAB by using GPU computing.

Language: MATLAB - Size: 68.4 MB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 19 - Forks: 5

YichengDWu/FlashAttention.jl

Julia implementation of the Flash Attention algorithm

Language: Julia - Size: 898 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 18 - Forks: 1

AhmetFurkanDEMIR/NVIDIA-GPU-benchmark

NVIDIA GPU benchmark

Language: Jupyter Notebook - Size: 49.8 KB - Last synced at: 3 months ago - Pushed at: over 4 years ago - Stars: 18 - Forks: 2

KarhouTam/cuda-kernels

Some common CUDA kernel implementations (Not the fastest).

Language: Cuda - Size: 57.6 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 17 - Forks: 1

RRZE-HPC/MD-Bench

A performance-oriented prototyping harness for state of the art Molecular Dynamics algorithms

Language: C - Size: 4.8 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 15 - Forks: 8

emptysoal/YOLOv5-TensorRT-lib-Python

YOLOv5 inference code written with the TensorRT C++ API, packaged into a dynamic link library and called from Python.

Language: Cuda - Size: 750 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 15 - Forks: 1

imsanjoykb/CUDA-Bootcamp

CUDA Programming Practices

Language: Cuda - Size: 6.14 MB - Last synced at: 22 days ago - Pushed at: over 3 years ago - Stars: 15 - Forks: 3

littlebearsama/xxCu3Dlibrary

CUDA-accelerated 3D point cloud algorithm library, continuously updated (includes cudaicp, GLFW point cloud visualization, etc.)

Language: C - Size: 19.2 MB - Last synced at: 4 months ago - Pushed at: almost 3 years ago - Stars: 14 - Forks: 0

iamrohitsuthar/LP1

SPPU BE COMP Codes of LP1 - HPC, AIR, and DA

Language: Jupyter Notebook - Size: 6.22 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 14 - Forks: 10

tgautam03/tGeMM

General Matrix Multiplication using NVIDIA Tensor Cores

Language: Cuda - Size: 47.9 KB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 13 - Forks: 3

karthikeyann/cuda-calculator Fork of szho42/cuda-calculator

HTML/JS port of CUDA Occupancy Calculator

Language: CoffeeScript - Size: 170 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 13 - Forks: 7

emptysoal/TensorRT-v8-YOLOv5-v5.0

Builds the YOLOv5-v5.0 network by hand on TensorRT v8.2 to speed up YOLOv5-v5.0 inference

Language: C++ - Size: 431 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 12 - Forks: 1

guomc9/CudaRayTracing

A simple ray-tracing program implemented with CUDA.

Language: C++ - Size: 120 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 12 - Forks: 1

l3lackcurtains/dbscan-kdtree-cuda

🍟 Massively parallel DBSCAN algorithm implemented in CUDA along with a KD-Tree for searching neighbors.

Language: Cuda - Size: 16.1 MB - Last synced at: 2 months ago - Pushed at: almost 5 years ago - Stars: 12 - Forks: 4

l3lackcurtains/dbscan-cuda

🍕 Massively parallel DBSCAN algorithm implemented in CUDA.

Language: Cuda - Size: 22 MB - Last synced at: 2 months ago - Pushed at: about 5 years ago - Stars: 12 - Forks: 2

RainerMtb/cuvista

Accelerated Optical Video Stabilizer: CUDA, OpenCL, AVX-512

Language: C++ - Size: 43.1 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 11 - Forks: 1

minnukota381/cuda-parallel-c-programming

This repository contains various CUDA C programs demonstrating parallel computing techniques using NVIDIA's CUDA platform.

Language: Cuda - Size: 19.5 KB - Last synced at: 28 days ago - Pushed at: 11 months ago - Stars: 11 - Forks: 1

Chen-Si-An/Mesh-Reconstruction

Reconstruct mesh from point cloud data generated by 3D scanner

Language: C++ - Size: 61.8 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 11 - Forks: 0

MolSSI-Education/gpu_programming_beginner

Fundamentals of heterogeneous parallel programming with CUDA C/C++ at the beginner level.

Language: Python - Size: 5.25 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 11 - Forks: 2

phbastosa/SeisFAT3D

Modeling, inversion and migration focusing on seismic first-arrivals.

Language: Cuda - Size: 236 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 10 - Forks: 2

flin3500/Cuda-Google-Colab

CUDA code mainly targets NVIDIA hardware. This repo shows how to run CUDA C or CUDA C++ code on the Google Colab platform for free.

Language: Jupyter Notebook - Size: 31.3 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 10 - Forks: 2

GithubRealFan/keccak256-blockchain-hash-opencl-kernel

Language: C - Size: 2.93 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 1

mhezarei/CUDA-RGB-grey

Converts an RGB image to greyscale using parallel programming.

Language: C++ - Size: 230 KB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 10 - Forks: 1

florist-notes/aicore_s

AI, IoT and Robotics Hardware + ROS

Language: Jupyter Notebook - Size: 361 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 9 - Forks: 1

This repository features a from-scratch implementation of a neural network using CUDA and C. The primary goal of this project is to leverage CUDA's parallel computing capabilities to significantly accelerate the training and inference processes of neural networks, utilizing the computational power of NVIDIA GPUs.

Language: Cuda - Size: 61.3 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 9 - Forks: 0

nssharmaofficial/kmeans-in-cuda

K-Means algorithm parallelized in CUDA

Language: Cuda - Size: 23.3 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 9 - Forks: 0

professorcode1/Event-Analysis

Library for Event Synchronization and Event Coincidence Analysis

Language: Jupyter Notebook - Size: 1020 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 1

GithubRealFan/Simple-Projects-CUDA

Language: Cuda - Size: 73.2 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 0

PanosAntoniadis/cuda-exercises-ntua

Lab exercise of Parallel Processing course in NTUA regarding CUDA programming

Language: Cuda - Size: 2.84 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 9 - Forks: 0

DmitryAsdre/rocauc_pairwise

RocAuc Pairwise objective for gradient boosting

Language: Python - Size: 1.77 MB - Last synced at: 4 months ago - Pushed at: about 2 years ago - Stars: 8 - Forks: 1

GithubRealFan/Matrix-Multiply-CUDA

Language: Cuda - Size: 21.5 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 1

neoblizz/HIP_template

🖤 Template for starting HIP/C++ project using CMake with Github Action for CI.

Language: CMake - Size: 26.4 KB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 1

real-space/AngstromCube

A parallel and GPU-accelerated Code for Real-Space All-Electron Linear-Scaling Density Functional Theory

Language: C++ - Size: 33.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 7 - Forks: 2

artuppp/EllipseFitCUDA

Ellipse Fit Implementation in CUDA

Language: Cuda - Size: 41 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 7 - Forks: 0

maya-undefined/gpu-desktop-calculator

Language: Cuda - Size: 48.8 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 7 - Forks: 0

mixed-farming/CSE-lab-solutions

Comprehensive CSE Lab Solutions repo; encompassing all my lab manuals, codes, documents, and endsem questions from my B.Tech program (2020-2024).

Language: C - Size: 253 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 0

priteshgohil/CUDA-programming-tutorial

Get started with CUDA programming

Language: Cuda - Size: 3.63 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 3

dmikushin/bilinear

A simple image filter example for those who study GPU/CUDA programming

Language: C++ - Size: 347 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 7 - Forks: 1

seieric/gst-dsobjectsmosaic

📀 NVIDIA DeepStream integrated GStreamer plugin. It can blur objects with CUDA cores on Jetson boards. Fast and smooth since everything is done on NVMM. 🏎

Language: C++ - Size: 143 KB - Last synced at: 12 days ago - Pushed at: almost 3 years ago - Stars: 7 - Forks: 2

m15kh/Cuda_Programming

CUDA programming enables parallel computing on NVIDIA GPUs for high-performance tasks like deep learning and scientific computing

Language: Cuda - Size: 790 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 6 - Forks: 0

tgautam03/xFilters

GPU (CUDA) accelerated filters using 2D convolution for high resolution images.

Language: C++ - Size: 58.2 MB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 6 - Forks: 1

coderonion/cuda-beginner-course-rust-version

Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (Rust Edition)"

Language: Rust - Size: 10.7 KB - Last synced at: about 15 hours ago - Pushed at: 12 months ago - Stars: 6 - Forks: 0

artmortal93/PatchMatchStereo_CUDA

PatchMatch Stereo with Red-Black modification and Row Parallel modification for massively parallel computing

Language: C - Size: 113 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 1

jrajan14/CUDA_Programs

Nvidia CUDA Programs. High-performance computing with my collection of CUDA programs, meticulously crafted to harness the immense power of NVIDIA's GPU architecture. From blazingly fast simulations to data-intensive parallel processing, these programs showcase my passion for pushing the boundaries of performance optimization.

Language: Cuda - Size: 30.8 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 5 - Forks: 2