Topic: "high-performance-computing"
taskflow/taskflow
A General-purpose Task-parallel Programming System using Modern C++
Language: C++ - Size: 142 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 11,321 - Forks: 1,323
Netflix/metaflow
Build, Manage and Deploy AI/ML Systems
Language: Python - Size: 44.7 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 9,580 - Forks: 931
google/tf-quant-finance
High-performance TensorFlow library for quantitative finance.
Language: Python - Size: 16.9 MB - Last synced at: 6 days ago - Pushed at: 7 months ago - Stars: 5,010 - Forks: 641
ProjectPhysX/FluidX3D
The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs and CPUs via OpenCL. Free for non-commercial use.
Language: C++ - Size: 21.3 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 4,728 - Forks: 425
parallel101/course
High-Performance Parallel Programming and Optimization - Course Materials
Language: C++ - Size: 430 MB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 3,998 - Forks: 551
alpa-projects/alpa 📦
Training and serving large-scale neural networks with auto parallelization.
Language: Python - Size: 7.11 MB - Last synced at: 6 days ago - Pushed at: almost 2 years ago - Stars: 3,160 - Forks: 353
merrymercy/awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
Size: 98.6 KB - Last synced at: 13 days ago - Pushed at: about 1 year ago - Stars: 2,660 - Forks: 320
bshoshany/thread-pool
BS::thread_pool: a fast, lightweight, modern, and easy-to-use C++17 / C++20 / C++23 thread pool library
Language: C++ - Size: 343 KB - Last synced at: about 2 months ago - Pushed at: 11 months ago - Stars: 2,648 - Forks: 287
flame/blis
BLAS-like Library Instantiation Software Framework
Language: C - Size: 52.2 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 2,539 - Forks: 399
kokkos/kokkos
Kokkos C++ Performance Portability Programming Ecosystem: The Programming Model - Parallel Execution and Memory Abstraction
Language: C++ - Size: 37.7 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 2,350 - Forks: 472
BOINC/boinc
Open-source software for volunteer computing and grid computing.
Language: PHP - Size: 273 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2,235 - Forks: 495
mfem/mfem
Lightweight, general, scalable C++ library for finite element methods
Language: C++ - Size: 261 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2,015 - Forks: 570
chapel-lang/chapel
A Productive Parallel Programming Language
Language: Chapel - Size: 1010 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,948 - Forks: 433
hermit-os/hermit-rs
Hermit for Rust.
Language: Rust - Size: 2.04 MB - Last synced at: 3 days ago - Pushed at: 5 days ago - Stars: 1,836 - Forks: 101
AdaptiveCpp/AdaptiveCpp
Compiler for multiple programming models (SYCL, C++ standard parallelism, HIP/CUDA) for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications adapt themselves to all the hardware in the system - even at runtime!
Language: C++ - Size: 14.3 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,723 - Forks: 199
Maratyszcza/NNPACK
Acceleration package for neural networks on multi-core CPUs
Language: C - Size: 1.06 MB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 1,687 - Forks: 315
mratsim/Arraymancer
A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends
Language: Nim - Size: 3.8 MB - Last synced at: 3 days ago - Pushed at: 8 months ago - Stars: 1,380 - Forks: 95
hermit-os/kernel
A Rust-based, lightweight unikernel.
Language: Rust - Size: 63.9 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1,356 - Forks: 109
ropensci/drake
An R-focused pipeline toolkit for reproducibility and high-performance computing
Language: R - Size: 92.4 MB - Last synced at: 8 days ago - Pushed at: 11 months ago - Stars: 1,341 - Forks: 129
trilinos/Trilinos
Primary repository for the Trilinos Project
Language: C++ - Size: 815 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,326 - Forks: 602
sail-sg/envpool
C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.
Language: C++ - Size: 3.53 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 1,195 - Forks: 117
Liu-xiandong/How_to_optimize_in_GPU
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is basically at or near the theoretical limit.
Language: Cuda - Size: 1.25 MB - Last synced at: 29 days ago - Pushed at: over 2 years ago - Stars: 1,158 - Forks: 170
uncomplicate/neanderthal
Fast Clojure Matrix Library
Language: Clojure - Size: 3.96 MB - Last synced at: 4 days ago - Pushed at: 9 days ago - Stars: 1,111 - Forks: 58
ropensci/targets
Function-oriented Make-like declarative workflows for R
Language: R - Size: 7.17 MB - Last synced at: 7 days ago - Pushed at: 10 days ago - Stars: 1,031 - Forks: 76
mateogianolio/vectorious
Linear algebra in TypeScript.
Language: TypeScript - Size: 42.8 MB - Last synced at: 30 days ago - Pushed at: over 1 year ago - Stars: 919 - Forks: 43
openmc-dev/openmc
OpenMC Monte Carlo Code
Language: Python - Size: 72.4 MB - Last synced at: about 11 hours ago - Pushed at: about 14 hours ago - Stars: 901 - Forks: 577
precice/precice
A coupling library for partitioned multi-physics simulations, including, but not restricted to fluid-structure interaction and conjugate heat transfer simulations.
Language: C++ - Size: 40.3 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 842 - Forks: 202
Geant4/geant4
Geant4 toolkit for the simulation of the passage of particles through matter - NIM A 506 (2003) 250-303
Language: C++ - Size: 349 MB - Last synced at: 11 days ago - Pushed at: 4 months ago - Stars: 727 - Forks: 352
AMReX-Codes/amrex
AMReX: Software Framework for Block Structured AMR
Language: C++ - Size: 55.7 MB - Last synced at: 4 days ago - Pushed at: 6 days ago - Stars: 669 - Forks: 423
brucefan1983/GPUMD
Graphics Processing Units Molecular Dynamics
Language: Cuda - Size: 313 MB - Last synced at: 3 days ago - Pushed at: 8 days ago - Stars: 656 - Forks: 154
MarioSieg/magnetron
(WIP) A small but powerful, homemade PyTorch from scratch.
Language: C - Size: 27.5 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 650 - Forks: 30
zanellia/prometeo
An experimental Python-to-C transpiler and domain specific language for embedded high-performance computing
Language: Python - Size: 1.93 MB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 641 - Forks: 34
LLNL/sundials
Official development repository for SUNDIALS - a SUite of Nonlinear and DIfferential/ALgebraic equation Solvers. Pull requests are welcome for bug fixes and minor changes.
Language: C - Size: 249 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 607 - Forks: 157
austinksmith/Hamsters.js
100% Vanilla Javascript Multithreading & Parallel Execution Library
Language: JavaScript - Size: 42.9 MB - Last synced at: 24 days ago - Pushed at: 3 months ago - Stars: 596 - Forks: 31
spcl/dace
DaCe - Data Centric Parallel Programming
Language: Python - Size: 153 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 557 - Forks: 145
DeveloperPaul123/thread-pool
A modern, fast, lightweight thread pool library based on C++2x
Language: C++ - Size: 729 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 527 - Forks: 42
3dem/relion
Image-processing software for cryo-electron microscopy
Language: C++ - Size: 58.1 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 500 - Forks: 220
pypr/pysph
A framework for Smoothed Particle Hydrodynamics in Python
Language: Python - Size: 7.09 MB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 500 - Forks: 144
mpi4jax/mpi4jax
Zero-copy MPI communication of JAX arrays, for turbo-charged HPC applications in Python ⚡
Language: Python - Size: 5.08 MB - Last synced at: 1 day ago - Pushed at: 16 days ago - Stars: 498 - Forks: 32
neuronsimulator/nrn
NEURON Simulator
Language: C++ - Size: 164 MB - Last synced at: 5 days ago - Pushed at: 8 days ago - Stars: 477 - Forks: 128
philipturner/metal-flash-attention
FlashAttention (Metal Port)
Language: Swift - Size: 9.26 MB - Last synced at: 7 months ago - Pushed at: about 1 year ago - Stars: 459 - Forks: 23
Xiangyu-Hu/SPHinXsys
SPHinXsys provides C++ APIs for engineering simulation and optimization. It aims at complex systems driven by fluid, structure, multi-body dynamics and beyond. The multi-physics library is based on a unique and unified computational framework by which strong coupling has been achieved for all involved physics.
Language: C++ - Size: 244 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 454 - Forks: 318
cselab/aphros
Finite volume solver for incompressible multiphase flows with surface tension. Foaming flows in complex geometries.
Language: C++ - Size: 205 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 426 - Forks: 51
CurvineIO/curvine
High-performance distributed multi-level cache system. Built in Rust.
Language: Rust - Size: 1.88 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 392 - Forks: 54
GraphIt-DSL/graphit
GraphIt - A High-Performance Domain Specific Language for Graph Analytics
Language: C++ - Size: 8.48 MB - Last synced at: 6 months ago - Pushed at: almost 3 years ago - Stars: 377 - Forks: 46
uncomplicate/bayadera
High-performance Bayesian Data Analysis on the GPU in Clojure
Language: Clojure - Size: 1020 KB - Last synced at: 6 months ago - Pushed at: about 5 years ago - Stars: 365 - Forks: 23
QMCPACK/qmcpack
Main repository for QMCPACK, an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids with full performance portable GPU support
Language: C++ - Size: 396 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 362 - Forks: 149
SciML/Surrogates.jl
Surrogate modeling and optimization for scientific machine learning (SciML)
Language: Julia - Size: 327 MB - Last synced at: 10 days ago - Pushed at: 3 months ago - Stars: 356 - Forks: 75
mrshaw01/software-engineer
A curated learning repository focused on High-Performance Computing (HPC) — covering fundamentals to advanced topics in CUDA, MPI, C++, and Python-C++ interoperability.
Language: C++ - Size: 41 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 355 - Forks: 61
huggingface/datablations
Scaling Data-Constrained Language Models
Language: Jupyter Notebook - Size: 45.8 MB - Last synced at: 9 days ago - Pushed at: 4 months ago - Stars: 342 - Forks: 19
Glavnokoman/vuh
Vulkan compute for people
Language: C++ - Size: 705 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 340 - Forks: 34
dionhaefner/pyhpc-benchmarks
A suite of benchmarks for CPU and GPU performance of the most popular high-performance libraries for Python 🚀
Language: Python - Size: 1.19 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 330 - Forks: 27
DragonSpit/HPCsharp
High performance algorithms in C#: SIMD/SSE, multi-core and faster
Language: C# - Size: 1.27 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 326 - Forks: 35
feelpp/feelpp
💎 Feel++: Finite Element Embedded Language and Library in C++
Language: C++ - Size: 349 MB - Last synced at: 9 days ago - Pushed at: 13 days ago - Stars: 325 - Forks: 68
ornladios/ADIOS2
Next generation of ADIOS developed in the Exascale Computing Program
Language: C++ - Size: 33.8 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 306 - Forks: 141
nebius/soperator
Run Slurm in Kubernetes
Language: Go - Size: 39.8 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 304 - Forks: 42
zero-one-group/geni
A Clojure dataframe library that runs on Spark
Language: Clojure - Size: 1.86 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 292 - Forks: 27
mratsim/laser
The HPC toolbox: fused matrix multiplication, convolution, data-parallel strided tensor primitives, OpenMP facilities, SIMD, JIT Assembler, CPU detection, state-of-the-art vectorized BLAS for floats and integers
Language: Nim - Size: 3.65 MB - Last synced at: 3 days ago - Pushed at: almost 2 years ago - Stars: 290 - Forks: 14
r-lib/mirai
Minimalist Async Evaluation Framework for R
Language: R - Size: 14.6 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 288 - Forks: 16
SciML/NonlinearSolve.jl
High-performance and differentiation-enabled nonlinear solvers (Newton methods), bracketed rootfinding (bisection, Falsi), with sparsity and Newton-Krylov support.
Language: Julia - Size: 39.8 MB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 280 - Forks: 56
uncomplicate/clojurecl
ClojureCL is a Clojure library for parallel computations with OpenCL.
Language: Clojure - Size: 910 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 280 - Forks: 18
Trinkle23897/Fast-Poisson-Image-Editing
A fast poisson image editing implementation that can utilize multi-core CPU or GPU to handle a high-resolution image input.
Language: Python - Size: 2.88 MB - Last synced at: 2 months ago - Pushed at: almost 3 years ago - Stars: 277 - Forks: 16
geodynamics/aspect
A parallel, extensible finite element code to simulate convection in both 2D and 3D models.
Language: C++ - Size: 379 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 269 - Forks: 252
hongbo-miao/hongbomiao.com
A personal research and development (R&D) lab that facilitates the sharing of knowledge.
Language: Python - Size: 1010 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 259 - Forks: 42
sourceryinstitute/OpenCoarrays
A parallel application binary interface for Fortran 2018 compilers.
Language: Fortran - Size: 8.52 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 258 - Forks: 55
flexi-framework/flexi
FLEXI: A high order discontinuous Galerkin framework for hyperbolic–parabolic conservation laws
Language: Fortran - Size: 75.3 MB - Last synced at: 17 days ago - Pushed at: 18 days ago - Stars: 257 - Forks: 69
ProjectPhysX/OpenCL-Benchmark
A small OpenCL benchmark program to measure peak GPU/CPU performance.
Language: C++ - Size: 286 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 255 - Forks: 31
CaNS-World/CaNS
A code for fast, massively-parallel direct numerical simulations (DNS) of canonical flows
Language: Fortran - Size: 1.12 MB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 250 - Forks: 85
ECP-copa/Cabana
Performance-portable library for particle-based simulations
Language: C++ - Size: 260 MB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 250 - Forks: 59
intel/intel-qs
High-performance simulator of quantum circuits
Language: C++ - Size: 17.3 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 250 - Forks: 74
cb-geo/mpm
CB-Geo High-Performance Material Point Method
Language: C++ - Size: 7.47 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 248 - Forks: 83
Shenggan/awesome-distributed-ml
A curated list of awesome projects and papers for distributed training or inference
Size: 44.9 KB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 247 - Forks: 29
flame/libflame
High-performance object-based library for DLA computations
Language: Fortran - Size: 31.3 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 245 - Forks: 84
DLR-AMR/t8code
Parallel algorithms and data structures for tree-based adaptive mesh refinement (AMR) with arbitrary element shapes.
Language: C++ - Size: 147 MB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 238 - Forks: 61
CEED/libCEED
CEED Library: Code for Efficient Extensible Discretizations
Language: C - Size: 20.9 MB - Last synced at: 6 days ago - Pushed at: 11 days ago - Stars: 236 - Forks: 61
hermit-os/libhermit 📦
HermitCore: A C-based, lightweight unikernel
Language: C - Size: 42.6 MB - Last synced at: 14 days ago - Pushed at: almost 4 years ago - Stars: 226 - Forks: 43
arborx/ArborX
Performance-portable geometric search library
Language: C++ - Size: 4.79 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 210 - Forks: 45
esa/torchquad
Numerical integration in arbitrary dimensions on the GPU using PyTorch / TF / JAX
Language: Python - Size: 10.9 MB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 209 - Forks: 43
penn-graphics-research/claymore
Language: Cuda - Size: 30.7 MB - Last synced at: 6 months ago - Pushed at: over 2 years ago - Stars: 209 - Forks: 31
springer13/hptt
High-Performance Tensor Transpose library
Language: C++ - Size: 818 KB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 200 - Forks: 49
AvtechScientific/ASL
Advanced Simulation Library - hardware accelerated multiphysics simulation platform.
Language: C++ - Size: 24.1 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 200 - Forks: 54
tikv/minstant
Performant time measuring in Rust
Language: Rust - Size: 226 KB - Last synced at: 18 days ago - Pushed at: over 1 year ago - Stars: 195 - Forks: 20
SciML/MethodOfLines.jl
Automatic Finite Difference PDE solving with Julia SciML
Language: Julia - Size: 370 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 188 - Forks: 39
mlr-org/batchtools
Tools for computation on batch systems
Language: R - Size: 6.59 MB - Last synced at: 2 days ago - Pushed at: 2 months ago - Stars: 183 - Forks: 52
hao-lh/the-books-making-you-better
This repo is a curated library to help you achieve a deeper understanding of what drives success and continuous improvement. Dive in, and discover content that can expand your thinking, sharpen your expertise, and fuel your drive to improve, whether you’re exploring new fields, honing in-demand skills, or simply looking for fresh perspectives.
Size: 254 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 181 - Forks: 22
SlinkyProject/slurm-operator
Run Slurm on Kubernetes. A Slinky project.
Language: Go - Size: 3.18 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 176 - Forks: 47
LibRapid/librapid
A highly optimised C++ library for mathematical applications and neural networks.
Language: C++ - Size: 30.3 MB - Last synced at: 3 days ago - Pushed at: 2 months ago - Stars: 176 - Forks: 10
rabauke/mpl
A C++17 message passing library based on MPI
Language: C++ - Size: 33.9 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 171 - Forks: 30
lanl/vpic
Vector Particle-In-Cell (VPIC) Project
Language: C++ - Size: 23.1 MB - Last synced at: 27 days ago - Pushed at: 4 months ago - Stars: 167 - Forks: 76
Keysight/Jlsca
Side-channel toolkit in Julia
Language: Julia - Size: 30.7 MB - Last synced at: 3 months ago - Pushed at: almost 4 years ago - Stars: 165 - Forks: 34
kahypar/mt-kahypar
Mt-KaHyPar (Multi-Threaded Karlsruhe Hypergraph Partitioner) is a shared-memory multilevel graph and hypergraph partitioner equipped with parallel implementations of techniques used in the best sequential partitioning algorithms. Mt-KaHyPar can partition extremely large hypergraphs very fast and with high quality.
Language: C++ - Size: 34 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 163 - Forks: 32
Yihao-Shi/GeoTaichi
A Taichi-powered high-performance numerical simulator for multiscale and multifield geophysical problems
Language: Python - Size: 91.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 161 - Forks: 23
pranabdas/espresso
Notes and tutorials on Density Functional Theory calculation using Quantum ESPRESSO.
Language: Jupyter Notebook - Size: 56.3 MB - Last synced at: 15 days ago - Pushed at: 16 days ago - Stars: 159 - Forks: 52
claudebarthels/infinity
A lightweight C++ RDMA library for InfiniBand networks.
Language: C++ - Size: 37.1 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 155 - Forks: 40
mschubert/clustermq
R package to send function calls as jobs on LSF, SGE, Slurm, PBS/Torque, or each via SSH
Language: R - Size: 6.28 MB - Last synced at: 6 days ago - Pushed at: 6 months ago - Stars: 153 - Forks: 28
dash-project/dash
DASH, the C++ Template Library for Distributed Data Structures with Support for Hierarchical Locality for HPC and Data-Driven Science
Language: C++ - Size: 14.5 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 150 - Forks: 43
eBay/accelerator 📦
The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Language: Python - Size: 2.18 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 149 - Forks: 28
dftfeDevelopers/dftfe
DFT-FE: Real-space DFT calculations using Finite Elements
Language: C++ - Size: 92.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 148 - Forks: 42
wlandau/crew
A distributed worker launcher
Language: R - Size: 16.6 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 147 - Forks: 4
ropensci/tarchetypes
Archetypes for targets and pipelines
Language: R - Size: 1.89 MB - Last synced at: 14 days ago - Pushed at: 15 days ago - Stars: 146 - Forks: 20