Topic: "parallel-computing"
taskflow/taskflow
A General-purpose Task-parallel Programming System using Modern C++
Language: C++ - Size: 138 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 10,911 - Forks: 1,275

joblib/joblib
Computing with Python functions.
Language: Python - Size: 4.11 MB - Last synced at: 6 days ago - Pushed at: 16 days ago - Stars: 4,080 - Forks: 433

parallel101/course
High-performance parallel programming and optimization - course slides
Language: C++ - Size: 430 MB - Last synced at: 18 days ago - Pushed at: 8 months ago - Stars: 3,998 - Forks: 551

OpenNMT/CTranslate2
Fast inference engine for Transformer models
Language: C++ - Size: 14.5 MB - Last synced at: 19 days ago - Pushed at: 2 months ago - Stars: 3,810 - Forks: 358

amilajack/reading
A list of computer-science readings I recommend
Size: 523 MB - Last synced at: 2 months ago - Pushed at: almost 3 years ago - Stars: 3,310 - Forks: 728

jofpin/turbit
Build applications, scripts, and automations powered by high-performance multicore computing using Node.js
Language: JavaScript - Size: 179 KB - Last synced at: 11 days ago - Pushed at: 10 months ago - Stars: 2,837 - Forks: 498

jmcarpenter2/swifter
A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner
Language: Python - Size: 2.15 MB - Last synced at: 19 days ago - Pushed at: about 1 year ago - Stars: 2,611 - Forks: 104

kokkos/kokkos
Kokkos C++ Performance Portability Programming Ecosystem: The Programming Model - Parallel Execution and Memory Abstraction
Language: C++ - Size: 35.5 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2,212 - Forks: 457

geatpy-dev/geatpy
Evolutionary algorithm toolbox and framework with high performance for Python
Language: Python - Size: 575 MB - Last synced at: 19 days ago - Pushed at: 5 months ago - Stars: 2,074 - Forks: 729

mfem/mfem
Lightweight, general, scalable C++ library for finite element methods
Language: C++ - Size: 239 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,905 - Forks: 534

chapel-lang/chapel
A Productive Parallel Programming Language
Language: Chapel - Size: 1000 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,880 - Forks: 428

NVIDIA/cccl
CUDA Core Compute Libraries
Language: C++ - Size: 81.6 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1,667 - Forks: 220

zwang4/awesome-machine-learning-in-compilers
Must read research papers and links to tools and datasets that are related to using machine learning for compilers and systems optimisation
Size: 367 KB - Last synced at: 10 days ago - Pushed at: 11 days ago - Stars: 1,550 - Forks: 165

VcDevel/Vc
SIMD Vector Classes for C++
Language: C++ - Size: 11.2 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 1,490 - Forks: 152

JuliaSymbolics/Symbolics.jl
Symbolic programming for the next generation of numerical software
Language: Julia - Size: 34.5 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 1,422 - Forks: 168

pyper-dev/pyper
Concurrent Python made simple
Language: Python - Size: 462 KB - Last synced at: about 6 hours ago - Pushed at: 4 months ago - Stars: 1,421 - Forks: 28

mratsim/Arraymancer
A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends
Language: Nim - Size: 3.8 MB - Last synced at: 2 days ago - Pushed at: 3 months ago - Stars: 1,367 - Forks: 96

ElmerCSC/elmerfem
Official git repository of Elmer FEM software
Language: Fortran - Size: 120 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 1,318 - Forks: 340

beehive-lab/TornadoVM
TornadoVM: A practical and efficient heterogeneous programming framework for managed languages
Language: Java - Size: 152 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,251 - Forks: 119

mmstick/parallel 📦
This project now lives on in a rewrite at https://gitlab.redox-os.org/redox-os/parallel
Language: Rust - Size: 411 KB - Last synced at: about 19 hours ago - Pushed at: over 7 years ago - Stars: 1,199 - Forks: 31

python-adaptive/adaptive
📈 Adaptive: parallel active learning of mathematical functions
Language: Python - Size: 5.7 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1,190 - Forks: 60

KratosMultiphysics/Kratos
Kratos Multiphysics (A.K.A Kratos) is a framework for building parallel multi-disciplinary simulation software. Modularity, extensibility and HPC are the main objectives. Kratos has BSD license and is written in C++ with extensive Python interface.
Language: C++ - Size: 2.01 GB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,117 - Forks: 261

inducer/pyopencl
OpenCL integration for Python, plus shiny features
Language: Python - Size: 5.69 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,104 - Forks: 245

AppiumTestDistribution/AppiumTestDistribution
A tool for running Android and iOS Appium tests in parallel across devices. If you like it, star it!
Language: Java - Size: 110 MB - Last synced at: 17 days ago - Pushed at: 6 months ago - Stars: 1,024 - Forks: 366

gunrock/gunrock
Programmable CUDA/C++ GPU Graph Analytics
Language: C++ - Size: 74.6 MB - Last synced at: 17 days ago - Pushed at: 10 months ago - Stars: 1,023 - Forks: 206

futureverse/future
🚀 R package: future: Unified Parallel and Distributed Processing in R for Everyone
Language: R - Size: 14.2 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 984 - Forks: 89

OSGeo/grass
GRASS - free and open-source geospatial processing engine
Language: C - Size: 351 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 940 - Forks: 357

AccelerateHS/accelerate
Embedded language for high-performance array computations
Language: Haskell - Size: 15.4 MB - Last synced at: 4 days ago - Pushed at: 11 days ago - Stars: 920 - Forks: 123

FEniCS/dolfinx
Next generation FEniCS problem solving environment
Language: C++ - Size: 64.3 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 902 - Forks: 201

esa/pagmo2
A C++ platform to perform parallel computations of optimisation tasks (global and local) via the asynchronous generalized island model.
Language: C++ - Size: 58.3 MB - Last synced at: 4 days ago - Pushed at: about 2 months ago - Stars: 874 - Forks: 166

mpi4py/mpi4py
Python bindings for MPI
Language: Python - Size: 9.42 MB - Last synced at: about 11 hours ago - Pushed at: 8 days ago - Stars: 854 - Forks: 125

taskflow/awesome-parallel-computing
A curated list of awesome parallel computing resources
Size: 3.41 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 722 - Forks: 68

ConorWilliams/libfork
A bleeding-edge, lock-free, wait-free, continuation-stealing tasking library built on C++20's coroutines
Language: C++ - Size: 6.56 MB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 704 - Forks: 30

uxlfoundation/oneMath
oneAPI Math Library (oneMath)
Language: C++ - Size: 11.7 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 683 - Forks: 169

IntelPython/sdc 📦
Numba extension for compiling Pandas data frames, Intel® Scalable Dataframe Compiler
Language: Python - Size: 15.8 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 644 - Forks: 64

LLNL/sundials
Official development repository for SUNDIALS - a SUite of Nonlinear and DIfferential/ALgebraic equation Solvers. Pull requests are welcome for bug fixes and minor changes.
Language: C - Size: 245 MB - Last synced at: about 21 hours ago - Pushed at: about 22 hours ago - Stars: 576 - Forks: 150

nwchemgit/nwchem
NWChem: Open Source High-Performance Computational Chemistry
Language: Fortran - Size: 341 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 542 - Forks: 172

OpenTimer/OpenTimer
A High-performance Timing Analysis Tool for VLSI Systems
Language: Verilog - Size: 329 MB - Last synced at: 10 months ago - Pushed at: about 2 years ago - Stars: 538 - Forks: 146

SimonBlanke/Hyperactive
An optimization and data collection toolbox for convenient and fast prototyping of computationally expensive models.
Language: Python - Size: 30.6 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 522 - Forks: 49

LLNL/RAJA
RAJA Performance Portability Layer (C++)
Language: C++ - Size: 39.4 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 519 - Forks: 104

alesgenova/post-me
📩 Use web Workers and other Windows through a simple Promise API
Language: TypeScript - Size: 801 KB - Last synced at: 16 days ago - Pushed at: over 4 years ago - Stars: 514 - Forks: 13

01alchemist/TurboScript 📦
Supercharged typed JavaScript dialect for parallel programming which compiles to WebAssembly
Language: JavaScript - Size: 13.2 MB - Last synced at: 27 days ago - Pushed at: almost 8 years ago - Stars: 497 - Forks: 35

NGSolve/ngsolve
Netgen/NGSolve is a high performance multiphysics finite element software. It is widely used to analyze models from solid mechanics, fluid dynamics and electromagnetics. Due to its flexible Python interface new physical equations and solution algorithms can be implemented easily.
Language: C++ - Size: 56.9 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 482 - Forks: 85

constellation-rs/amadeus
Harmonious distributed data analysis in Rust.
Language: Rust - Size: 2.46 MB - Last synced at: 10 days ago - Pushed at: almost 4 years ago - Stars: 480 - Forks: 25

esa/pygmo2
A Python platform to perform parallel computations of optimisation tasks (global and local) via the asynchronous generalized island model.
Language: C++ - Size: 13.9 MB - Last synced at: 24 days ago - Pushed at: 10 months ago - Stars: 477 - Forks: 59

mpi4jax/mpi4jax
Zero-copy MPI communication of JAX arrays, for turbo-charged HPC applications in Python ⚡
Language: Python - Size: 5.06 MB - Last synced at: 8 days ago - Pushed at: 3 months ago - Stars: 476 - Forks: 31

smistad/FAST
A framework for high-performance medical image processing, neural network inference and visualization
Language: C++ - Size: 19.5 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 469 - Forks: 107

mindspore-courses/step_into_llm
MindSpore online courses: Step into LLM
Language: Jupyter Notebook - Size: 246 MB - Last synced at: 10 days ago - Pushed at: 5 months ago - Stars: 468 - Forks: 118

luispedro/jug
Parallel programming with Python
Language: Python - Size: 2.33 MB - Last synced at: 18 days ago - Pushed at: 20 days ago - Stars: 453 - Forks: 62

ARM-software/mango
Parallel Hyperparameter Tuning in Python
Language: Jupyter Notebook - Size: 54.6 MB - Last synced at: 11 days ago - Pushed at: 4 months ago - Stars: 416 - Forks: 48

BitairLabs/concurrent.js
Non-blocking Concurrent Computation for JavaScript RTEs (Web Browsers, Node.js, Deno & Bun)
Language: TypeScript - Size: 283 KB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 388 - Forks: 6

lehins/massiv
Efficient Haskell Arrays featuring Parallel computation
Language: Haskell - Size: 6.67 MB - Last synced at: 11 days ago - Pushed at: 13 days ago - Stars: 386 - Forks: 25

ChunelFeng/CThreadPool
A simple, easy-to-use, high-performance, cross-platform C++ thread pool. Stars and forks welcome.
Language: C++ - Size: 157 KB - Last synced at: 17 days ago - Pushed at: about 2 months ago - Stars: 381 - Forks: 73

SmileiPIC/Smilei
Particle-in-cell code for plasma simulation
Language: C++ - Size: 117 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 378 - Forks: 126

GraphIt-DSL/graphit
GraphIt - A High-Performance Domain Specific Language for Graph Analytics
Language: C++ - Size: 8.48 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 377 - Forks: 46

pipefunc/pipefunc
Lightweight fast function pipeline (DAG) creation in pure Python for scientific workflows 🕸️🧪
Language: Python - Size: 2.09 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 370 - Forks: 15

cmuparlay/parlaylib
A Toolkit for Programming Parallel Algorithms on Shared-Memory Multicore Machines
Language: C++ - Size: 1.27 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 363 - Forks: 68

kysucix/gipuma
Massively Parallel Multiview Stereopsis by Surface Normal Diffusion
Language: C++ - Size: 144 KB - Last synced at: 12 months ago - Pushed at: almost 3 years ago - Stars: 349 - Forks: 104

tirthajyoti/Spark-with-Python
Fundamentals of Spark with Python (using PySpark), code examples
Language: Jupyter Notebook - Size: 8.97 MB - Last synced at: 15 days ago - Pushed at: over 2 years ago - Stars: 347 - Forks: 271

dionhaefner/pyhpc-benchmarks
A suite of benchmarks for CPU and GPU performance of the most popular high-performance libraries for Python 🚀
Language: Python - Size: 1.19 MB - Last synced at: 14 days ago - Pushed at: 8 months ago - Stars: 325 - Forks: 25

mtmucha/coros
An easy-to-use and fast library for task-based parallelism, utilizing coroutines.
Language: C++ - Size: 724 KB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 322 - Forks: 6

feelpp/feelpp
💎 Feel++: Finite Element Embedded Language and Library in C++
Language: C++ - Size: 348 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 320 - Forks: 66

pothosware/PothosCore
The Pothos data-flow framework
Language: C++ - Size: 63 MB - Last synced at: 14 days ago - Pushed at: about 2 years ago - Stars: 314 - Forks: 50

taskflow/work-stealing-queue
A fast work-stealing queue template in C++
Language: C++ - Size: 1010 KB - Last synced at: 14 days ago - Pushed at: over 1 year ago - Stars: 306 - Forks: 39

aregtech/areg-sdk
AREG is a cross-platform asynchronous Object RPC framework to simplify multitasking programming by blurring borders between processes and treating remote objects as if they coexist in the same thread.
Language: C++ - Size: 22.8 MB - Last synced at: 3 days ago - Pushed at: 5 days ago - Stars: 295 - Forks: 124

XiaoSong9905/CUDA-Optimization-Guide
Xiao's CUDA Optimization Guide [Active Adding New Contents]
Size: 36.4 MB - Last synced at: 25 days ago - Pushed at: over 2 years ago - Stars: 295 - Forks: 20

IntelLabs/ParallelAccelerator.jl 📦
The ParallelAccelerator package, part of the High Performance Scripting project at Intel Labs
Language: Julia - Size: 45.2 MB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 294 - Forks: 32

zero-one-group/geni
A Clojure dataframe library that runs on Spark
Language: Clojure - Size: 1.86 MB - Last synced at: 8 days ago - Pushed at: over 1 year ago - Stars: 293 - Forks: 27

chengzeyi/ParaAttention
https://wavespeed.ai/ Context parallel attention that accelerates DiT model inference with dynamic caching
Language: Python - Size: 13.4 MB - Last synced at: 7 days ago - Pushed at: 27 days ago - Stars: 290 - Forks: 27

niedakh/pqdm
Comfortable parallel TQDM using concurrent.futures
Language: Python - Size: 86.9 KB - Last synced at: 16 days ago - Pushed at: 6 months ago - Stars: 289 - Forks: 9

optimagic-dev/optimagic
optimagic is a Python package for numerical optimization. It is a unified interface to optimizers from SciPy, NLopt and other packages. optimagic's minimize function works just like SciPy's, so you don't have to adjust your code. You simply get more optimizers for free. On top of that, you get diagnostic tools, parallel numerical derivatives and more.
Language: Python - Size: 28 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 287 - Forks: 44

BY571/Soft-Actor-Critic-and-Extensions
PyTorch implementation of Soft-Actor-Critic and Prioritized Experience Replay (PER) + Emphasizing Recent Experience (ERE) + Munchausen RL + D2RL and parallel Environments.
Language: Python - Size: 5.99 MB - Last synced at: 6 months ago - Pushed at: over 4 years ago - Stars: 272 - Forks: 32

Trinkle23897/Fast-Poisson-Image-Editing
A fast poisson image editing implementation that can utilize multi-core CPU or GPU to handle a high-resolution image input.
Language: Python - Size: 2.88 MB - Last synced at: 15 days ago - Pushed at: over 2 years ago - Stars: 269 - Forks: 15

bodo-ai/Bodo
High-Performance Python Compute Engine for Data and AI
Language: Python - Size: 706 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 268 - Forks: 12

pgiri/dispy
Distributed and Parallel Computing Framework with / for Python
Language: Python - Size: 3.76 MB - Last synced at: 16 days ago - Pushed at: over 1 year ago - Stars: 266 - Forks: 54

owensgroup/RXMesh
GPU-accelerated triangle mesh processing https://ahdhn.github.io/RXMeshDocs
Language: Cuda - Size: 10.9 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 260 - Forks: 35

sourceryinstitute/OpenCoarrays
A parallel application binary interface for Fortran 2018 compilers.
Language: Fortran - Size: 8.51 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 255 - Forks: 55

mfem/PyMFEM
Python wrapper for MFEM
Language: SWIG - Size: 25.9 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 253 - Forks: 64

r-lib/mirai
Minimalist Async Evaluation Framework for R
Language: R - Size: 12.1 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 243 - Forks: 11

agenium-scale/boost.simd
Boost SIMD
Size: 192 KB - Last synced at: 2 months ago - Pushed at: about 6 years ago - Stars: 232 - Forks: 48

vincentjzy/OpenCorr
Digital Image Correlation & Digital Volume Correlation Library
Language: C++ - Size: 352 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 231 - Forks: 57

LLNL/SAMRAI
Structured Adaptive Mesh Refinement Application Infrastructure - a scalable C++ framework for block-structured AMR application development
Language: C++ - Size: 74.3 MB - Last synced at: 17 days ago - Pushed at: 2 months ago - Stars: 228 - Forks: 83

LLNL/libROM
Data-driven model reduction library with an emphasis on large scale parallelism and linear subspace methods
Language: C++ - Size: 54.4 MB - Last synced at: 20 days ago - Pushed at: 3 months ago - Stars: 220 - Forks: 38

bh107/bohrium
Automatic parallelization of Python/NumPy, C, and C++ codes on Linux and MacOSX
Language: C++ - Size: 32.4 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 220 - Forks: 31

futureverse/future.apply
🚀 R package: future.apply - Apply Function to Elements in Parallel using Futures
Language: R - Size: 2.1 MB - Last synced at: 4 days ago - Pushed at: 13 days ago - Stars: 216 - Forks: 18

charmplusplus/charm
The Charm++ parallel programming system. Visit https://charmplusplus.org/ for more information.
Language: C++ - Size: 194 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 214 - Forks: 53

BWbwchen/MapReduce
An easy-to-use MapReduce parallel-computing framework in Go, inspired by 2021 6.824 lab1. It currently supports multiple worker threads and multiple processes on a single machine.
Language: Go - Size: 2.6 MB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 214 - Forks: 13

Alpine-DAV/ascent
A flyweight in situ visualization and analysis runtime for multi-physics HPC simulations
Language: C++ - Size: 169 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 210 - Forks: 67

DLR-AMR/t8code
Parallel algorithms and data structures for tree-based adaptive mesh refinement (AMR) with arbitrary element shapes.
Language: C++ - Size: 124 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 207 - Forks: 56

privefl/bigsnpr
R package for the analysis of massive SNP arrays.
Language: R - Size: 107 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 206 - Forks: 46

bueler/p4pdes
C and Python examples from my book on using PETSc and Firedrake to solve PDEs
Language: C - Size: 4.48 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 201 - Forks: 74

grailbio/bigmachine
Bigmachine is a library for self-managing serverless computing in Go
Language: Go - Size: 635 KB - Last synced at: 7 days ago - Pushed at: about 2 years ago - Stars: 201 - Forks: 20

xupsh/pp4fpgas-cn-hls
HLS Project of pp4fpgas - https://github.com/xupsh/pp4fpgas-cn
Language: Jupyter Notebook - Size: 58.4 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 194 - Forks: 73

SCOREC/core
parallel finite element unstructured meshes
Language: C++ - Size: 10.9 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 189 - Forks: 65

krABMaga/krABMaga
krABMaga: A modern developing art for reliable and efficient Agent-based Model (ABM) simulation with the Rust language
Language: Rust - Size: 63.6 MB - Last synced at: 15 days ago - Pushed at: 11 months ago - Stars: 188 - Forks: 12

dkeras-project/dkeras
Distributed Keras Engine: make Keras faster with only one line of code.
Language: Python - Size: 6.48 MB - Last synced at: about 1 month ago - Pushed at: over 5 years ago - Stars: 188 - Forks: 12

siemens/embb
Embedded Multicore Building Blocks (EMB²): Library for parallel programming of embedded systems. Star us on GitHub? +1
Language: C++ - Size: 18.9 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 187 - Forks: 41

hxz393/BrutalityExtractor
Multiprocess decompression software for high-performance systems
Language: Python - Size: 4.91 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 181 - Forks: 12

mlr-org/batchtools
Tools for computation on batch systems
Language: R - Size: 5.69 MB - Last synced at: 5 days ago - Pushed at: 17 days ago - Stars: 180 - Forks: 51

JohannesBuchner/UltraNest
Fit and compare complex models reliably and rapidly. Advanced nested sampling.
Language: Python - Size: 167 MB - Last synced at: 4 days ago - Pushed at: 18 days ago - Stars: 180 - Forks: 31
