Topic: "cuda-programming"
taskflow/taskflow
A General-purpose Task-parallel Programming System using Modern C++
Language: C++ - Size: 138 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 10,911 - Forks: 1,275

Rust-GPU/Rust-CUDA
Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
Language: Rust - Size: 6 MB - Last synced at: 10 days ago - Pushed at: 12 days ago - Stars: 4,409 - Forks: 183

brucefan1983/CUDA-Programming
Sample codes for my CUDA programming book
Language: Cuda - Size: 9.13 MB - Last synced at: 24 days ago - Pushed at: 4 months ago - Stars: 1,712 - Forks: 347

NVIDIA/cccl
CUDA Core Compute Libraries
Language: C++ - Size: 81.6 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,667 - Forks: 220

mit-han-lab/TinyChatEngine
TinyChatEngine: On-Device LLM Inference Library
Language: C++ - Size: 83.3 MB - Last synced at: 8 days ago - Pushed at: 11 months ago - Stars: 852 - Forks: 85

eyalroz/cuda-api-wrappers
Thin, unified, C++-flavored wrappers for the CUDA APIs
Language: C++ - Size: 2.87 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 839 - Forks: 83

coreylowman/cudarc
Safe rust wrapper around CUDA toolkit
Language: Rust - Size: 2.91 MB - Last synced at: 17 days ago - Pushed at: about 1 month ago - Stars: 838 - Forks: 101

sail-sg/Adan
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
Language: Python - Size: 1.3 MB - Last synced at: 2 months ago - Pushed at: 11 months ago - Stars: 784 - Forks: 67

harleyszhang/llm_note
LLM notes, including model inference, transformer model structure, and llm framework code analysis notes.
Language: Python - Size: 177 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 776 - Forks: 80

PaddleJitLab/CUDATutorial
A self-learning tutorial for CUDA High Performance Programming.
Language: JavaScript - Size: 108 MB - Last synced at: 24 days ago - Pushed at: about 2 months ago - Stars: 628 - Forks: 69

nosferalatu/SimpleGPUHashTable
A simple GPU hash table implemented in CUDA using lock free techniques
Language: Cuda - Size: 297 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 394 - Forks: 41

HMUNACHI/cuda-tutorials
CUDA tutorials for Maths & ML with examples, covering multi-GPU, fused attention, Winograd convolution, and reinforcement learning.
Language: Cuda - Size: 423 KB - Last synced at: 6 days ago - Pushed at: about 2 months ago - Stars: 183 - Forks: 5

MuGdxy/muda
μ-Cuda, COVER THE LAST MILE OF CUDA. With features: intellisense-friendly, structured launch, automatic cuda graph generation and updating.
Language: C++ - Size: 14.7 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 174 - Forks: 8

jaredhoberock/stanford-cs193g-sp2010
This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010
Language: C++ - Size: 127 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 170 - Forks: 73

SunsetQuest/CudaPAD
CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly.
Language: C# - Size: 1.18 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 117 - Forks: 16

tgautam03/xGeMM
Accelerated General (FP32) Matrix Multiplication from scratch in CUDA
Language: Cuda - Size: 5.8 MB - Last synced at: 12 days ago - Pushed at: 5 months ago - Stars: 115 - Forks: 7

ROCm/HIP-CPU
An implementation of HIP that works on CPUs, across OSes.
Language: C++ - Size: 776 KB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 115 - Forks: 18

eyalroz/cuda-kat
CUDA kernel author's tools
Language: Cuda - Size: 1.57 MB - Last synced at: 7 months ago - Pushed at: about 3 years ago - Stars: 107 - Forks: 8

mikeroyal/CUDA-Guide
CUDA Guide
Language: Cuda - Size: 83 KB - Last synced at: 25 days ago - Pushed at: over 1 year ago - Stars: 64 - Forks: 7

emptysoal/cuda-image-preprocess
Speed up image preprocessing with CUDA when handling images or TensorRT inference
Language: Cuda - Size: 91.8 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 63 - Forks: 5

FahimFBA/CUDA-WSL2-Ubuntu
Install CUDA on Windows11 using WSL2
Language: Jupyter Notebook - Size: 10.4 MB - Last synced at: 5 days ago - Pushed at: almost 2 years ago - Stars: 62 - Forks: 4

HuangCongQing/cuda-learning
Introduction to learning CUDA programming
Language: Cuda - Size: 5.66 MB - Last synced at: about 2 months ago - Pushed at: 11 months ago - Stars: 35 - Forks: 6

LinhanDai/yolov9-tensorrt
YOLOv9 TensorRT deployment acceleration, providing two implementation methods: C++ and Python 🔥🔥🔥
Language: C++ - Size: 1.07 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 32 - Forks: 6

coderonion/cuda-beginner-course-cpp-version
Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (C++ Edition)"
Language: Cuda - Size: 20.5 KB - Last synced at: 3 days ago - Pushed at: 10 months ago - Stars: 29 - Forks: 5

ashvardanian/cuda-python-starter-kit
Parallel Computing starter project to build GPU & CPU kernels in CUDA & C++ and call them from Python without a single line of CMake using PyBind11
Language: Cuda - Size: 238 KB - Last synced at: 2 days ago - Pushed at: 3 months ago - Stars: 26 - Forks: 3

Koushikphy/Intro-to-CUDA-Fortran
A Complete beginner's introduction to programming with CUDA Fortran
Size: 200 KB - Last synced at: about 2 months ago - Pushed at: almost 3 years ago - Stars: 26 - Forks: 1

jerry060599/KittenGpuLBVH
A high performance and friendly GPU LBVH implementation.
Language: Cuda - Size: 90.8 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 24 - Forks: 4

xmba15/ransac_lines_fitting_gpu
simple GPU ransac fitting of multiple lines on 2d/3d point cloud
Language: C++ - Size: 50.8 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 23 - Forks: 7

Lin-Mao/DrGPUM
A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.
Language: Python - Size: 248 KB - Last synced at: 6 months ago - Pushed at: 8 months ago - Stars: 22 - Forks: 2

fjramireg/StiffMa
StiffMa: Fast finite element STIFFness MAtrix generation in MATLAB by using GPU computing.
Language: MATLAB - Size: 68.4 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 19 - Forks: 5

YichengDWu/FlashAttention.jl
Julia implementation of the Flash Attention algorithm
Language: Julia - Size: 898 KB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 18 - Forks: 1

AhmetFurkanDEMIR/NVIDIA-GPU-benchmark
NVIDIA GPU benchmark
Language: Jupyter Notebook - Size: 49.8 KB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 18 - Forks: 2

KarhouTam/cuda-kernels
Some common CUDA kernel implementations (Not the fastest).
Language: Cuda - Size: 57.6 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 17 - Forks: 1

RRZE-HPC/MD-Bench
A performance-oriented prototyping harness for state of the art Molecular Dynamics algorithms
Language: C - Size: 4.56 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 15 - Forks: 8

emptysoal/YOLOv5-TensorRT-lib-Python
YOLOv5 inference code using the TensorRT C++ API, packaged into a dynamic link library and then called from Python.
Language: Cuda - Size: 749 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 15 - Forks: 1

imsanjoykb/CUDA-Bootcamp
CUDA Programming Practices
Language: Cuda - Size: 6.14 MB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 15 - Forks: 3

littlebearsama/xxCu3Dlibrary
CUDA-accelerated 3D point cloud algorithm library, continuously updated (includes cudaicp, GLFW point cloud visualization, etc.)
Language: C - Size: 19.2 MB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 14 - Forks: 0

iamrohitsuthar/LP1
SPPU BE COMP Codes of LP1 - HPC, AIR, and DA
Language: Jupyter Notebook - Size: 6.22 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 14 - Forks: 10

tgautam03/tGeMM
General Matrix Multiplication using NVIDIA Tensor Cores
Language: Cuda - Size: 47.9 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 13 - Forks: 3

karthikeyann/cuda-calculator Fork of szho42/cuda-calculator
HTML/JS port of CUDA Occupancy Calculator
Language: CoffeeScript - Size: 170 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 13 - Forks: 7

emptysoal/TensorRT-v8-YOLOv5-v5.0
A hand-built YOLOv5-v5.0 network based on TensorRT v8.2, to speed up YOLOv5-v5.0 inference
Language: C++ - Size: 431 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 12 - Forks: 1

guomc9/CudaRayTracing
A simple ray-tracing program implemented with CUDA.
Language: C++ - Size: 120 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 12 - Forks: 1

l3lackcurtains/dbscan-kdtree-cuda
:fries: Massively parallel DBSCAN algorithm implemented in CUDA along with a KD-Tree for searching neighbors.
Language: Cuda - Size: 16.1 MB - Last synced at: 21 days ago - Pushed at: over 4 years ago - Stars: 12 - Forks: 4

l3lackcurtains/dbscan-cuda
:pizza: Massively parallel DBSCAN algorithm implemented in CUDA.
Language: Cuda - Size: 22 MB - Last synced at: 21 days ago - Pushed at: almost 5 years ago - Stars: 12 - Forks: 2

minnukota381/cuda-parallel-c-programming
This repository contains various CUDA C programs demonstrating parallel computing techniques using NVIDIA's CUDA platform.
Language: Cuda - Size: 19.5 KB - Last synced at: 6 days ago - Pushed at: 9 months ago - Stars: 11 - Forks: 1

Chen-Si-An/Mesh-Reconstruction
Reconstruct mesh from point cloud data generated by 3D scanner
Language: C++ - Size: 61.8 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 11 - Forks: 0

MolSSI-Education/gpu_programming_beginner
Fundamentals of heterogeneous parallel programming with CUDA C/C++ at the beginner level.
Language: Python - Size: 5.25 MB - Last synced at: 14 days ago - Pushed at: about 2 years ago - Stars: 11 - Forks: 2

flin3500/Cuda-Google-Colab
CUDA code mainly targets NVIDIA hardware. This repo shows how to run CUDA C or CUDA C++ code on the Google Colab platform for free.
Language: Jupyter Notebook - Size: 31.3 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 10 - Forks: 2

GithubRealFan/keccak256-blockchain-hash-opencl-kernel
Language: C - Size: 2.93 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 1

mhezarei/CUDA-RGB-grey
Converts an RGB image to greyscale using parallel programming.
Language: C++ - Size: 230 KB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 10 - Forks: 1

florist-notes/aicore_s
AI, IoT and Robotics Hardware + ROS
Language: Jupyter Notebook - Size: 361 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 9 - Forks: 1

RainerMtb/cuvista
Accelerated Optical Video Stabilizer, Cuda, OpenCL, Avx512
Language: C++ - Size: 45.1 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 9 - Forks: 1

phbastosa/SeisFAT3D
Modeling, inversion and migration focusing on seismic first-arrivals.
Language: Cuda - Size: 237 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 9 - Forks: 1

TheUnsolvedDev/CUDA_NN_FS
This repository features a from-scratch implementation of a neural network using CUDA and C. The primary goal of this project is to leverage CUDA's parallel computing capabilities to significantly accelerate the training and inference processes of neural networks, utilizing the computational power of NVIDIA GPUs.
Language: Cuda - Size: 61.3 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 9 - Forks: 0

nssharmaofficial/kmeans-in-cuda
K-Means algorithm parallelized in CUDA
Language: Cuda - Size: 23.3 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 9 - Forks: 0

professorcode1/Event-Analysis
Library for Event Synchronization and Event Coincidence Analysis
Language: Jupyter Notebook - Size: 1020 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 1

GithubRealFan/Simple-Projects-CUDA
Language: Cuda - Size: 73.2 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 0

PanosAntoniadis/cuda-exercises-ntua
Lab exercise of Parallel Processing course in NTUA regarding CUDA programming
Language: Cuda - Size: 2.84 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 9 - Forks: 0

DmitryAsdre/rocauc_pairwise
RocAuc Pairwise objective for gradient boosting
Language: Python - Size: 1.77 MB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 8 - Forks: 1

GithubRealFan/Matrix-Multiply-CUDA
Language: Cuda - Size: 21.5 KB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 8 - Forks: 1

neoblizz/HIP_template
🖤 Template for starting HIP/C++ project using CMake with Github Action for CI.
Language: CMake - Size: 26.4 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 1

artuppp/EllipseFitCUDA
Ellipse Fit Implementation in CUDA
Language: Cuda - Size: 41 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 7 - Forks: 0

maya-undefined/gpu-desktop-calculator
Language: Cuda - Size: 48.8 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 7 - Forks: 0

mixed-farming/CSE-lab-solutions
Comprehensive CSE Lab Solutions repo; encompassing all my lab manuals, codes, documents, and endsem questions from my B.Tech program (2020-2024).
Language: C - Size: 253 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 0

priteshgohil/CUDA-programming-tutorial
Get started with CUDA programming
Language: Cuda - Size: 3.63 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 7 - Forks: 3

dmikushin/bilinear
A simple image filter example for those who study GPU/CUDA programming
Language: C++ - Size: 347 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 1

seieric/gst-dsobjectsmosaic
📀NVIDIA DeepStream integrated GStreamer Plugin. It can blur objects with cuda cores on Jetson boards. Fast and smooth since everything is done on NVMM.🏎
Language: C++ - Size: 143 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 2

m15kh/Cuda_Programming
CUDA programming enables parallel computing on NVIDIA GPUs for high-performance tasks like deep learning and scientific computing
Language: Cuda - Size: 790 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 6 - Forks: 0

tgautam03/xFilters
GPU (CUDA) accelerated filters using 2D convolution for high resolution images.
Language: C++ - Size: 58.2 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 6 - Forks: 1

real-space/AngstromCube
A parallel and GPU-accelerated Code for Real-Space All-Electron Linear-Scaling Density Functional Theory
Language: C++ - Size: 32.3 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 6 - Forks: 2

coderonion/cuda-beginner-course-rust-version
Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (Rust Edition)"
Language: Rust - Size: 10.7 KB - Last synced at: 3 days ago - Pushed at: 10 months ago - Stars: 6 - Forks: 0

artmortal93/PatchMatchStereo_CUDA
PatchMatch Stereo with Red-Black modification and Row Parallel modification for massively parallel computing
Language: C - Size: 113 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 1

jrajan14/CUDA_Programs
Nvidia CUDA Programs. High-performance computing with my collection of CUDA programs, meticulously crafted to harness the immense power of NVIDIA's GPU architecture. From blazingly fast simulations to data-intensive parallel processing, these programs showcase my passion for pushing the boundaries of performance optimization.
Language: Cuda - Size: 30.8 MB - Last synced at: about 11 hours ago - Pushed at: about 12 hours ago - Stars: 5 - Forks: 2

coderonion/cuda-beginner-course-python-version
Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (Python Edition)"
Language: Python - Size: 3.91 KB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

matrix97317/OneTensor
This is a simple and easy-to-use Tensor Library.
Language: Cuda - Size: 2.03 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 1

ShawnZhong/CUDA-Programming-Starter-Kit
CUDA Programming Starter Kit for VSCode and CLion
Language: C++ - Size: 8.58 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 5 - Forks: 0

mrakgr/Spiral-s-ML-Library
Spiral's Machine Learning Library
Language: Python - Size: 16.7 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 4 - Forks: 0

GPUEngineering/GPUtils
A C++ header-only library for parallel linear algebra on GPUs (CUDA/cuBLAS under the hood)
Language: Cuda - Size: 401 KB - Last synced at: 29 days ago - Pushed at: 2 months ago - Stars: 4 - Forks: 0

yester31/CUDA_EX
CUDA kernel functions
Language: Cuda - Size: 92.9 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 4 - Forks: 2

HamzaGbada/Numba-cuda
This is a tutorial about Numba-CUDA
Language: Jupyter Notebook - Size: 1.44 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 4 - Forks: 0

evanmcclure/hello_gpu
Hello world example for Rust on GPU
Language: Rust - Size: 6.84 KB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 4 - Forks: 0

Accumulated/Accelerating-CNN-on-GPU-using-CUDA-C
This repository is for implementing and accelerating CNN on GPU using NVIDIA CUDA C. The current code has 8 msec execution time for inference. The CNN used is called Efficient Net.
Language: Jupyter Notebook - Size: 39.6 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 1

Gopal-Dahale/hpmoCNN
High-Performance Memory Optimal CNN
Language: C++ - Size: 14.5 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 4 - Forks: 1

NKU-Yang/Parallel-Programming
Programming assignments for the Parallel Programming course at Nankai University
Language: C++ - Size: 70.3 KB - Last synced at: 9 months ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 0

artuppp/PupilTrackingGPUPublic
GPU implementations of new, high-performance pupil tracking algorithms, as presented in our paper [cuElSe and cuExCuSe: Highly Parallel and Accurate GPU-based Pupil Tracking for Real-World Applications]
Language: Cuda - Size: 1.34 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 3 - Forks: 0

yashkathe/Image-Noise-Reduction-with-CUDA
This project conducts an analysis of image denoising technique - median blur, comparing GPU-accelerated (Numba) and CPU-based (OpenCV) processing speeds.
Language: Jupyter Notebook - Size: 25.4 MB - Last synced at: 24 days ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 0

saeedahmadicp/Fundamentals-of-Accelerated-Computing-with-CUDA-Python
Fundamentals of Accelerated Computing with CUDA Python
Language: Jupyter Notebook - Size: 6.63 MB - Last synced at: 7 days ago - Pushed at: 2 months ago - Stars: 3 - Forks: 0

dragunovdenis/DeepLearning
C++ framework for deep neural networks
Language: C++ - Size: 12.7 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 3 - Forks: 0

lawmurray/gpu-gemm
CUDA kernel for matrix-matrix multiplication on Nvidia GPUs, using a Hilbert curve to improve L2 cache utilization.
Language: Cuda - Size: 34.2 KB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 3 - Forks: 0

muyuuuu/CUFX
Learning something after work in the evening instead of scrolling on the phone. Series one: the CUDA computing framework CUFX (Cuda Framework eXtended).
Language: Cuda - Size: 1.7 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 3 - Forks: 0

HROlive/Fundamentals-of-Accelerated-Computing-with-CUDA-C-Cpp
Accelerate and optimize existing C/C++ CPU-only applications using the most essential CUDA tools and techniques.
Language: Jupyter Notebook - Size: 4.66 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 1

davide-gurrieri/parallel-GCN
High-performance CUDA C++ implementation of Graph Convolutional Networks
Language: C++ - Size: 11 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

gaetanserre/LiSA
LiSA is a path tracing render engine developed in C++ using NVIDIA OptiX.
Language: C++ - Size: 257 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 0

marcoplaitano/counting-sort-cuda
Parallelized version of Counting Sort using CUDA
Language: C - Size: 26.4 KB - Last synced at: 23 days ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 0

yester31/GEMM_Conv2d_CUDA
CUDA Gemm Convolution implementation
Language: C++ - Size: 564 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 0

Howeng98/All-Pairs_Shortest_Path
This repo solves the all-pairs shortest path problem with CPU threads, then further accelerates the program with CUDA using the blocked Floyd-Warshall algorithm
Language: Cuda - Size: 7.92 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 0

Guillaume-Helbecque/GPU-accelerated-tree-search-Chapel
GPU-accelerated tree search: Investigating Chapel versus CUDA/HIP+X
Language: C - Size: 488 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 2 - Forks: 1

LuongHuuPhuc/Project_2024-2
Parallel programming for Merge sort algorithm using OpenMP and CUDA
Language: Cuda - Size: 3.7 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2 - Forks: 0

MatteoFasulo/Multi-layer-Neural-Network
A Parallel implementation for a particular kind of multi-layer Neural Network
Language: Cuda - Size: 3.76 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 2 - Forks: 0

Cat-Gawr/AI-Python
A small AI whose best response time was 0.02 seconds | Konata ~ 2025
Language: Python - Size: 863 KB - Last synced at: 19 days ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0
