An open API service providing repository metadata for many open source software ecosystems.

Topic: "cuda-programming"

taskflow/taskflow

A General-purpose Task-parallel Programming System using Modern C++

Language: C++ - Size: 138 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 10,911 - Forks: 1,275

Rust-GPU/Rust-CUDA

Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.

Language: Rust - Size: 6 MB - Last synced at: 10 days ago - Pushed at: 12 days ago - Stars: 4,409 - Forks: 183

brucefan1983/CUDA-Programming

Sample codes for my CUDA programming book

Language: Cuda - Size: 9.13 MB - Last synced at: 24 days ago - Pushed at: 4 months ago - Stars: 1,712 - Forks: 347

NVIDIA/cccl

CUDA Core Compute Libraries

Language: C++ - Size: 81.6 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,667 - Forks: 220

mit-han-lab/TinyChatEngine

TinyChatEngine: On-Device LLM Inference Library

Language: C++ - Size: 83.3 MB - Last synced at: 8 days ago - Pushed at: 11 months ago - Stars: 852 - Forks: 85

eyalroz/cuda-api-wrappers

Thin, unified, C++-flavored wrappers for the CUDA APIs

Language: C++ - Size: 2.87 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 839 - Forks: 83

coreylowman/cudarc

Safe Rust wrapper around the CUDA toolkit

Language: Rust - Size: 2.91 MB - Last synced at: 17 days ago - Pushed at: about 1 month ago - Stars: 838 - Forks: 101

sail-sg/Adan

Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models

Language: Python - Size: 1.3 MB - Last synced at: 2 months ago - Pushed at: 11 months ago - Stars: 784 - Forks: 67

harleyszhang/llm_note

LLM notes, including model inference, transformer model structure, and llm framework code analysis notes.

Language: Python - Size: 177 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 776 - Forks: 80

PaddleJitLab/CUDATutorial

A self-learning tutorial for CUDA High Performance Programming.

Language: JavaScript - Size: 108 MB - Last synced at: 24 days ago - Pushed at: about 2 months ago - Stars: 628 - Forks: 69

nosferalatu/SimpleGPUHashTable

A simple GPU hash table implemented in CUDA using lock free techniques

Language: Cuda - Size: 297 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 394 - Forks: 41

HMUNACHI/cuda-tutorials

CUDA tutorials for Maths & ML with examples, covering multi-GPU, fused attention, Winograd convolution, and reinforcement learning.

Language: Cuda - Size: 423 KB - Last synced at: 6 days ago - Pushed at: about 2 months ago - Stars: 183 - Forks: 5

MuGdxy/muda

μ-Cuda, COVER THE LAST MILE OF CUDA. With features: intellisense-friendly, structured launch, automatic cuda graph generation and updating.

Language: C++ - Size: 14.7 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 174 - Forks: 8

jaredhoberock/stanford-cs193g-sp2010

This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010

Language: C++ - Size: 127 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 170 - Forks: 73

SunsetQuest/CudaPAD

CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly.

Language: C# - Size: 1.18 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 117 - Forks: 16

tgautam03/xGeMM

Accelerated General (FP32) Matrix Multiplication from scratch in CUDA

Language: Cuda - Size: 5.8 MB - Last synced at: 12 days ago - Pushed at: 5 months ago - Stars: 115 - Forks: 7

ROCm/HIP-CPU

An implementation of HIP that works on CPUs, across OSes.

Language: C++ - Size: 776 KB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 115 - Forks: 18

eyalroz/cuda-kat

CUDA kernel author's tools

Language: Cuda - Size: 1.57 MB - Last synced at: 7 months ago - Pushed at: about 3 years ago - Stars: 107 - Forks: 8

mikeroyal/CUDA-Guide

CUDA Guide

Language: Cuda - Size: 83 KB - Last synced at: 25 days ago - Pushed at: over 1 year ago - Stars: 64 - Forks: 7

emptysoal/cuda-image-preprocess

Speed up image preprocessing with CUDA when handling images or running TensorRT inference

Language: Cuda - Size: 91.8 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 63 - Forks: 5

FahimFBA/CUDA-WSL2-Ubuntu

Install CUDA on Windows11 using WSL2

Language: Jupyter Notebook - Size: 10.4 MB - Last synced at: 5 days ago - Pushed at: almost 2 years ago - Stars: 62 - Forks: 4

HuangCongQing/cuda-learning

Getting started with learning CUDA programming

Language: Cuda - Size: 5.66 MB - Last synced at: about 2 months ago - Pushed at: 11 months ago - Stars: 35 - Forks: 6

LinhanDai/yolov9-tensorrt

YOLOv9 TensorRT deployment acceleration, providing two implementation methods: C++ and Python 🔥🔥🔥

Language: C++ - Size: 1.07 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 32 - Forks: 6

coderonion/cuda-beginner-course-cpp-version

Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (C++ Edition)"

Language: Cuda - Size: 20.5 KB - Last synced at: 3 days ago - Pushed at: 10 months ago - Stars: 29 - Forks: 5

ashvardanian/cuda-python-starter-kit

Parallel Computing starter project to build GPU & CPU kernels in CUDA & C++ and call them from Python without a single line of CMake using PyBind11

Language: Cuda - Size: 238 KB - Last synced at: 2 days ago - Pushed at: 3 months ago - Stars: 26 - Forks: 3

Koushikphy/Intro-to-CUDA-Fortran

A Complete beginner's introduction to programming with CUDA Fortran

Size: 200 KB - Last synced at: about 2 months ago - Pushed at: almost 3 years ago - Stars: 26 - Forks: 1

jerry060599/KittenGpuLBVH

A high performance and friendly GPU LBVH implementation.

Language: Cuda - Size: 90.8 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 24 - Forks: 4

xmba15/ransac_lines_fitting_gpu

simple GPU ransac fitting of multiple lines on 2d/3d point cloud

Language: C++ - Size: 50.8 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 23 - Forks: 7

Lin-Mao/DrGPUM

A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.

Language: Python - Size: 248 KB - Last synced at: 6 months ago - Pushed at: 8 months ago - Stars: 22 - Forks: 2

fjramireg/StiffMa

StiffMa: Fast finite element STIFFness MAtrix generation in MATLAB by using GPU computing.

Language: MATLAB - Size: 68.4 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 19 - Forks: 5

YichengDWu/FlashAttention.jl

Julia implementation of the Flash Attention algorithm

Language: Julia - Size: 898 KB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 18 - Forks: 1

AhmetFurkanDEMIR/NVIDIA-GPU-benchmark

NVIDIA GPU benchmark

Language: Jupyter Notebook - Size: 49.8 KB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 18 - Forks: 2

KarhouTam/cuda-kernels

Some common CUDA kernel implementations (Not the fastest).

Language: Cuda - Size: 57.6 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 17 - Forks: 1

RRZE-HPC/MD-Bench

A performance-oriented prototyping harness for state of the art Molecular Dynamics algorithms

Language: C - Size: 4.56 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 15 - Forks: 8

emptysoal/YOLOv5-TensorRT-lib-Python

YOLOv5 inference code written with the TensorRT C++ API, packaged into a dynamic link library and then called from Python.

Language: Cuda - Size: 749 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 15 - Forks: 1

imsanjoykb/CUDA-Bootcamp

CUDA Programming Practices

Language: Cuda - Size: 6.14 MB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 15 - Forks: 3

littlebearsama/xxCu3Dlibrary

CUDA-accelerated 3D point cloud algorithm library, continuously updated (includes cudaicp, GLFW point cloud visualization, etc.)

Language: C - Size: 19.2 MB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 14 - Forks: 0

iamrohitsuthar/LP1

SPPU BE COMP Codes of LP1 - HPC, AIR, and DA

Language: Jupyter Notebook - Size: 6.22 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 14 - Forks: 10

tgautam03/tGeMM

General Matrix Multiplication using NVIDIA Tensor Cores

Language: Cuda - Size: 47.9 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 13 - Forks: 3

karthikeyann/cuda-calculator (fork of szho42/cuda-calculator)

HTML/JS port of CUDA Occupancy Calculator

Language: CoffeeScript - Size: 170 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 13 - Forks: 7

emptysoal/TensorRT-v8-YOLOv5-v5.0

Based on TensorRT v8.2, a hand-built network for YOLOv5-v5.0 that speeds up YOLOv5-v5.0 inference

Language: C++ - Size: 431 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 12 - Forks: 1

guomc9/CudaRayTracing

A simple ray-tracing program implemented with CUDA.

Language: C++ - Size: 120 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 12 - Forks: 1

l3lackcurtains/dbscan-kdtree-cuda

🍟 Massively parallel DBSCAN algorithm implemented in CUDA along with a KD-Tree for searching neighbors.

Language: Cuda - Size: 16.1 MB - Last synced at: 21 days ago - Pushed at: over 4 years ago - Stars: 12 - Forks: 4

l3lackcurtains/dbscan-cuda

🍕 Massively parallel DBSCAN algorithm implemented in CUDA.

Language: Cuda - Size: 22 MB - Last synced at: 21 days ago - Pushed at: almost 5 years ago - Stars: 12 - Forks: 2

minnukota381/cuda-parallel-c-programming

This repository contains various CUDA C programs demonstrating parallel computing techniques using NVIDIA's CUDA platform.

Language: Cuda - Size: 19.5 KB - Last synced at: 6 days ago - Pushed at: 9 months ago - Stars: 11 - Forks: 1

Chen-Si-An/Mesh-Reconstruction

Reconstruct mesh from point cloud data generated by 3D scanner

Language: C++ - Size: 61.8 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 11 - Forks: 0

MolSSI-Education/gpu_programming_beginner

Fundamentals of heterogeneous parallel programming with CUDA C/C++ at the beginner level.

Language: Python - Size: 5.25 MB - Last synced at: 14 days ago - Pushed at: about 2 years ago - Stars: 11 - Forks: 2

flin3500/Cuda-Google-Colab

CUDA code mainly targets NVIDIA hardware devices. This repo shows how to run CUDA C or CUDA C++ code on the Google Colab platform for free.

Language: Jupyter Notebook - Size: 31.3 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 10 - Forks: 2

GithubRealFan/keccak256-blockchain-hash-opencl-kernel

Language: C - Size: 2.93 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 1

mhezarei/CUDA-RGB-grey

Converts an RGB image to greyscale using parallel programming.

Language: C++ - Size: 230 KB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 10 - Forks: 1

florist-notes/aicore_s

AI, IoT and Robotics Hardware + ROS

Language: Jupyter Notebook - Size: 361 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 9 - Forks: 1

RainerMtb/cuvista

Accelerated Optical Video Stabilizer, Cuda, OpenCL, Avx512

Language: C++ - Size: 45.1 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 9 - Forks: 1

phbastosa/SeisFAT3D

Modeling, inversion and migration focusing on seismic first-arrivals.

Language: Cuda - Size: 237 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 9 - Forks: 1

TheUnsolvedDev/CUDA_NN_FS

This repository features a from-scratch implementation of a neural network using CUDA and C. The primary goal of this project is to leverage CUDA's parallel computing capabilities to significantly accelerate the training and inference processes of neural networks, utilizing the computational power of NVIDIA GPUs.

Language: Cuda - Size: 61.3 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 9 - Forks: 0

nssharmaofficial/kmeans-in-cuda

K-Means algorithm parallelized in CUDA

Language: Cuda - Size: 23.3 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 9 - Forks: 0

professorcode1/Event-Analysis

Library for Event Synchronization and Event Coincidence Analysis

Language: Jupyter Notebook - Size: 1020 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 1

GithubRealFan/Simple-Projects-CUDA

Language: Cuda - Size: 73.2 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 0

PanosAntoniadis/cuda-exercises-ntua

Lab exercise of Parallel Processing course in NTUA regarding CUDA programming

Language: Cuda - Size: 2.84 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 9 - Forks: 0

DmitryAsdre/rocauc_pairwise

RocAuc Pairwise objective for gradient boosting

Language: Python - Size: 1.77 MB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 8 - Forks: 1

GithubRealFan/Matrix-Multiply-CUDA

Language: Cuda - Size: 21.5 KB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 8 - Forks: 1

neoblizz/HIP_template

🖤 Template for starting a HIP/C++ project using CMake with GitHub Actions for CI.

Language: CMake - Size: 26.4 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 1

artuppp/EllipseFitCUDA

Ellipse Fit Implementation in CUDA

Language: Cuda - Size: 41 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 7 - Forks: 0

maya-undefined/gpu-desktop-calculator

Language: Cuda - Size: 48.8 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 7 - Forks: 0

mixed-farming/CSE-lab-solutions

Comprehensive CSE Lab Solutions repo; encompassing all my lab manuals, codes, documents, and endsem questions from my B.Tech program (2020-2024).

Language: C - Size: 253 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 0

priteshgohil/CUDA-programming-tutorial

Get started with CUDA programming

Language: Cuda - Size: 3.63 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 7 - Forks: 3

dmikushin/bilinear

A simple image filter example for those who study GPU/CUDA programming

Language: C++ - Size: 347 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 1

seieric/gst-dsobjectsmosaic

📀NVIDIA DeepStream integrated GStreamer Plugin. It can blur objects with cuda cores on Jetson boards. Fast and smooth since everything is done on NVMM.🏎

Language: C++ - Size: 143 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 2

m15kh/Cuda_Programming

CUDA programming enables parallel computing on NVIDIA GPUs for high-performance tasks like deep learning and scientific computing

Language: Cuda - Size: 790 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 6 - Forks: 0

tgautam03/xFilters

GPU (CUDA) accelerated filters using 2D convolution for high resolution images.

Language: C++ - Size: 58.2 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 6 - Forks: 1

real-space/AngstromCube

A parallel and GPU-accelerated Code for Real-Space All-Electron Linear-Scaling Density Functional Theory

Language: C++ - Size: 32.3 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 6 - Forks: 2

coderonion/cuda-beginner-course-rust-version

Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (Rust Edition)"

Language: Rust - Size: 10.7 KB - Last synced at: 3 days ago - Pushed at: 10 months ago - Stars: 6 - Forks: 0

artmortal93/PatchMatchStereo_CUDA

PatchMatch Stereo with Red-Black modification and Row Parallel modification for massively parallel computing

Language: C - Size: 113 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 1

jrajan14/CUDA_Programs

Nvidia CUDA Programs. High-performance computing with my collection of CUDA programs, meticulously crafted to harness the immense power of NVIDIA's GPU architecture. From blazingly fast simulations to data-intensive parallel processing, these programs showcase my passion for pushing the boundaries of performance optimization.

Language: Cuda - Size: 30.8 MB - Last synced at: about 11 hours ago - Pushed at: about 12 hours ago - Stars: 5 - Forks: 2

coderonion/cuda-beginner-course-python-version

Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (Python Edition)"

Language: Python - Size: 3.91 KB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

matrix97317/OneTensor

This is a simple and easy-to-use Tensor Library.

Language: Cuda - Size: 2.03 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 1

ShawnZhong/CUDA-Programming-Starter-Kit

CUDA Programming Starter Kit for VSCode and CLion

Language: C++ - Size: 8.58 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 5 - Forks: 0

mrakgr/Spiral-s-ML-Library

Spiral's Machine Learning Library

Language: Python - Size: 16.7 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 4 - Forks: 0

GPUEngineering/GPUtils

A C++ header-only library for parallel linear algebra on GPUs (CUDA/cuBLAS under the hood)

Language: Cuda - Size: 401 KB - Last synced at: 29 days ago - Pushed at: 2 months ago - Stars: 4 - Forks: 0

yester31/CUDA_EX

CUDA kernel functions

Language: Cuda - Size: 92.9 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 4 - Forks: 2

HamzaGbada/Numba-cuda

This is a tutorial about Numba-CUDA

Language: Jupyter Notebook - Size: 1.44 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 4 - Forks: 0

evanmcclure/hello_gpu

Hello world example for Rust on GPU

Language: Rust - Size: 6.84 KB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 4 - Forks: 0

Accumulated/Accelerating-CNN-on-GPU-using-CUDA-C

This repository is for implementing and accelerating CNN on GPU using NVIDIA CUDA C. The current code has 8 msec execution time for inference. The CNN used is called Efficient Net.

Language: Jupyter Notebook - Size: 39.6 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 1

Gopal-Dahale/hpmoCNN

High-Performance Memory Optimal CNN

Language: C++ - Size: 14.5 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 4 - Forks: 1

NKU-Yang/Parallel-Programming

Programming assignments for the Parallel Programming course at Nankai University

Language: C++ - Size: 70.3 KB - Last synced at: 9 months ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 0

artuppp/PupilTrackingGPUPublic

GPU implementations of new, high-performance pupil tracking algorithms, as presented in our paper [cuElSe and cuExCuSe: Highly Parallel and Accurate GPU-based Pupil Tracking for Real-World Applications]

Language: Cuda - Size: 1.34 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 3 - Forks: 0

yashkathe/Image-Noise-Reduction-with-CUDA

This project analyzes an image denoising technique, median blur, comparing GPU-accelerated (Numba) and CPU-based (OpenCV) processing speeds.

Language: Jupyter Notebook - Size: 25.4 MB - Last synced at: 24 days ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 0

saeedahmadicp/Fundamentals-of-Accelerated-Computing-with-CUDA-Python

Fundamentals of Accelerated Computing with CUDA Python

Language: Jupyter Notebook - Size: 6.63 MB - Last synced at: 7 days ago - Pushed at: 2 months ago - Stars: 3 - Forks: 0

dragunovdenis/DeepLearning

C++ framework for deep neural networks

Language: C++ - Size: 12.7 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 3 - Forks: 0

lawmurray/gpu-gemm

CUDA kernel for matrix-matrix multiplication on Nvidia GPUs, using a Hilbert curve to improve L2 cache utilization.

Language: Cuda - Size: 34.2 KB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 3 - Forks: 0

muyuuuu/CUFX

Learn something after work in the evening instead of scrolling on the phone. Series one: the CUDA computing framework CUFX (Cuda Framework eXtended).

Language: Cuda - Size: 1.7 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 3 - Forks: 0

HROlive/Fundamentals-of-Accelerated-Computing-with-CUDA-C-Cpp

Accelerate and optimize existing C/C++ CPU-only applications using the most essential CUDA tools and techniques.

Language: Jupyter Notebook - Size: 4.66 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 1

davide-gurrieri/parallel-GCN

High-performance CUDA C++ implementation of Graph Convolutional Networks

Language: C++ - Size: 11 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

gaetanserre/LiSA

LiSA is a path tracing render engine developed in C++ using NVIDIA OptiX.

Language: C++ - Size: 257 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 0

marcoplaitano/counting-sort-cuda

Parallelized version of Counting Sort using CUDA

Language: C - Size: 26.4 KB - Last synced at: 23 days ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 0

yester31/GEMM_Conv2d_CUDA

CUDA Gemm Convolution implementation

Language: C++ - Size: 564 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 0

Howeng98/All-Pairs_Shortest_Path

This repo solves the all-pairs shortest path problem with CPU threads and then further accelerates the program with CUDA using the Blocked Floyd-Warshall algorithm

Language: Cuda - Size: 7.92 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 0

Guillaume-Helbecque/GPU-accelerated-tree-search-Chapel

GPU-accelerated tree search: Investigating Chapel versus CUDA/HIP+X

Language: C - Size: 488 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 2 - Forks: 1

LuongHuuPhuc/Project_2024-2

Parallel programming for Merge sort algorithm using OpenMP and CUDA

Language: Cuda - Size: 3.7 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2 - Forks: 0

MatteoFasulo/Multi-layer-Neural-Network

A Parallel implementation for a particular kind of multi-layer Neural Network

Language: Cuda - Size: 3.76 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 2 - Forks: 0

Cat-Gawr/AI-Python

A small AI whose best response time so far was 0.02 seconds | Konata ~ 2025

Language: Python - Size: 863 KB - Last synced at: 19 days ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0