GitHub topics: nvcc
termoshtt/link_cuda_kernel
HowTo: Compile CUDA with nvcc, and link to Rust
Language: Cuda - Size: 4.88 KB - Last synced at: 16 days ago - Pushed at: almost 7 years ago - Stars: 49 - Forks: 7

0xhilSa/pynum
a small python library for 1D and 2D arrays with GPU support
Language: C - Size: 399 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

TheUnsolvedDev/CUDA_NN_FS
This repository features a from-scratch implementation of a neural network using CUDA and C. The primary goal of this project is to leverage CUDA's parallel computing capabilities to significantly accelerate the training and inference processes of neural networks, utilizing the computational power of NVIDIA GPUs.
Language: Cuda - Size: 61.3 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 9 - Forks: 0

coderonion/cuda-beginner-course-cpp-version
Companion code for the bilibili video course "Introduction to CUDA 12.x Parallel Programming (C++ version)"
Language: Cuda - Size: 20.5 KB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 30 - Forks: 4

coderonion/cuda-beginner-course-rust-version
Companion code for the bilibili video course "Introduction to CUDA 12.x Parallel Programming (Rust version)"
Language: Rust - Size: 10.7 KB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 6 - Forks: 0

maliknaik16/parallel-computing
CUDA programming in C++ for high-performance computing using Nvidia GPUs, optimized for tasks like machine learning, or image processing
Language: C++ - Size: 1.95 KB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

ShadyBoukhary/GPU-research-FFT-OpenACC-CUDA
Case studies constitute a modern, interdisciplinary, and valuable teaching practice that plays a critical and fundamental role in the development of new skills and the formation of new knowledge. This research studies the behavior and performance of two interdisciplinary and widely adopted scientific kernels, a Fast Fourier Transform and Matrix Multiplication. Both routines are implemented in the two currently most popular many-core programming models, CUDA and OpenACC. A Fast Fourier Transform (FFT) samples a signal over a period of time and divides it into its frequency components, computing the Discrete Fourier Transform (DFT) of a sequence. Unlike the traditional approach to computing a DFT, FFT algorithms reduce the complexity of the problem from O(n^2) to O(n log2 n). Matrix multiplication is a cornerstone routine in Mathematics, Artificial Intelligence, and Machine Learning. This research also shows that the nature of the problem plays a crucial role in determining which many-core model will provide the highest benefit in performance.
Language: Cuda - Size: 9.12 MB - Last synced at: 19 days ago - Pushed at: over 6 years ago - Stars: 13 - Forks: 3

pauloruszel/yolo11_face_detection
Language: Python - Size: 21 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Rodolfo-Gallegos/Brownian-Dynamics-Simulation-OpenACC-OpenMP
This is my thesis work for the Bachelor's degree in Physics.
Language: C++ - Size: 5.55 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

saiccoumar/CUDA-Programming-Exercises
Brief collection of GPU exercises (my reimplementation). Comes with relevant resources.
Language: Cuda - Size: 739 KB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

david-palma/cuda-programming
Educational CUDA C/C++ programming repository with commented examples on GPU parallel computing, matrix operations, and performance profiling. Requires a CUDA-enabled NVIDIA GPU.
Language: Cuda - Size: 23.4 KB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

minnukota381/cuda-parallel-c-programming
This repository contains various CUDA C programs demonstrating parallel computing techniques using NVIDIA's CUDA platform.
Language: Cuda - Size: 19.5 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 11 - Forks: 1

PrinceP/tensorrt-sample-on-threads
A tutorial for getting started with running TensorRT engine and Deep Learning Accelerator (DLA) models on threads
Language: C++ - Size: 2.93 KB - Last synced at: 6 days ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

0xhilSa/pycu
PyCu
Language: Cuda - Size: 271 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

coderonion/cuda-beginner-course-python-version
Companion code for the bilibili video course "Introduction to CUDA 12.x Parallel Programming (Python version)"
Language: Python - Size: 3.91 KB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

0xhilSa/vector-CUDA
vector calculation with GPU acceleration using CUDA
Size: 4.88 KB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

alpha74/CUDA_basics
Nvidia NVCC CUDA programs for beginners.
Language: Cuda - Size: 71.3 KB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

Hobbbbes/MandelBrot-Cuda-olc-PixelGameEngine
Language: Cuda - Size: 327 KB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

TravisWThompson1/Makefile_Example_CUDA_CPP_To_Executable
Example Makefile for CUDA and C++ source files in a standard project layout.
Language: Cuda - Size: 3.91 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 39 - Forks: 17

iamsubhranil/Renderer
A barebones 3D renderer in C++ and Python
Language: C++ - Size: 418 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

jbcbezerra/gradle-nvcc
Gradle plugin for integrating Cuda's nvcc tool
Language: Kotlin - Size: 75.2 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

gvvsnrnaveen/cuda
This repository contains various programs that can be written using the CUDA Toolkit.
Language: C - Size: 1.31 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

kangyolo/get-started-jetson-nano
Guidance for Nvidia Jetson Nano
Size: 816 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Kvatsx/GPU-Computing-Assignments
Language: C - Size: 46.7 MB - Last synced at: almost 2 years ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

alpha74/HungarianAlgoCUDA
Hungarian Algorithm for Linear Assignment Problem implemented using CUDA.
Language: Cuda - Size: 10.7 KB - Last synced at: 2 months ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

ashutoshIITK/install_cuda_cudnn_ubuntu_20
Tutorial to install NVIDIA Drivers, CUDA 11.4 and cuDNN for deep learning programming on Ubuntu 20.04.
Size: 2.2 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 23 - Forks: 7

phrb/gpu-autotuning
Autotuning NVCC Compiler Parameters, published @ CCPE Journal
Language: C - Size: 471 MB - Last synced at: 18 days ago - Pushed at: about 4 years ago - Stars: 9 - Forks: 2

mkf450/nvcc4jupyter Fork of depctg/nvcc4jupyter
A plugin for Jupyter Notebook to run CUDA C/C++ code
Size: 8.79 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

hardikrana11/Computer-Architecture-Lab
Solutions to assignment given in the class of CO316
Language: C++ - Size: 4.37 MB - Last synced at: about 2 months ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

jinparksj/deeplearning_cpp_libraries
Personal libraries for deep learning with C++
Language: C - Size: 2.26 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

manikandan-ravikiran/Leetcode_June_Challenge
Problems of June day to day challenge in Leetcode
Language: Python - Size: 13.7 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

underscoreanuj/bitonic-sort-visualization
A python script which helps visualize the sorting routine of bitonic sort (executed in parallel using nvcc).
Language: Jupyter Notebook - Size: 340 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 1

phrb/nvidia-workshop-autotuning
Resources for autotuning CUDA compiler parameters
Language: Julia - Size: 1.31 MB - Last synced at: about 1 month ago - Pushed at: over 7 years ago - Stars: 3 - Forks: 1

gilbertobastos/prj_perceptron_multicamadas_CUDA
Simple implementation of the Multilayer Perceptron in CUDA.
Language: C - Size: 39.1 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

mattbdean/novaXfer
Lightning fast NVCC course equivalencies
Language: TypeScript - Size: 1.69 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0
