Topic: "simd-parallelism"
google/highway
Performance-portable, length-agnostic SIMD with runtime dispatch
Language: C++ - Size: 26.6 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4,683 - Forks: 351

jfalcou/eve
Expressive Vector Engine - SIMD in C++ Goes Brrrr
Language: C++ - Size: 58.1 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 1,203 - Forks: 60

lilohuang/PyTurboJPEG
PyTurboJPEG is a highly optimized Python wrapper of libjpeg-turbo (TurboJPEG API) which supports x86 and ARM architectures.
Language: Python - Size: 119 KB - Last synced at: 19 days ago - Pushed at: 4 months ago - Stars: 274 - Forks: 43

zeam-vm/pelemay
Pelemay is a native compiler for Elixir that generates SIMD instructions. GPU code generation is planned.
Language: Elixir - Size: 410 KB - Last synced at: 22 days ago - Pushed at: over 4 years ago - Stars: 187 - Forks: 13

gyrdym/ml_linalg
SIMD-based linear algebra and statistics for data science with Dart
Language: Dart - Size: 1.1 MB - Last synced at: 6 days ago - Pushed at: 10 months ago - Stars: 84 - Forks: 9

PatwinchIR/ultra-sort
DSL for SIMD Sorting on AVX2 & AVX512
Language: C++ - Size: 6.43 MB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 30 - Forks: 2

Applied-Scientific-Research/Omega2D
Two-dimensional flow solver with GUI using vortex particle and boundary element methods
Language: C++ - Size: 16.8 MB - Last synced at: 11 months ago - Pushed at: about 2 years ago - Stars: 28 - Forks: 5

fzqneo/ByteSlice
"Byteslice: Pushing the envelop of main memory data processing with a new storage layout" (SIGMOD'15)
Language: C++ - Size: 36.1 KB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 25 - Forks: 3

pleiszenburg/gravitation
n-body-simulation performance test suite
Language: Python - Size: 2.22 MB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 18 - Forks: 1

MarioSieg/Corium 📦
Corium is a modern scripting language which combines simple, safe and efficient programming.
Language: C++ - Size: 248 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 18 - Forks: 4

ZL-Su/Matrice
A portable modern C++ primitive performance library for 3D Vision & Photo-Mechanics.
Language: C++ - Size: 67.5 MB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 16 - Forks: 3

Applied-Scientific-Research/Omega3D
GPU-accelerated 3D vortex methods solver with easy GUI
Language: C++ - Size: 4.58 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 13 - Forks: 6

ms0g/vml
SIMD-accelerated Vector math lib
Language: Assembly - Size: 29.3 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 7 - Forks: 1

gregyjames/tsunami
A High Performance C# wrapper that allows you to get the benefits of SIMD Intrinsics on List<T>.
Language: C# - Size: 363 KB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 0

sahmad98/vstring
Vectorized String Helper Functions
Language: C++ - Size: 61.5 KB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 6 - Forks: 0

artem0/benchmarking
System benchmarks over JVM with JMH - SIMD (superscalar processing), Branch prediction, False sharing.
Language: Java - Size: 70.3 KB - Last synced at: 2 days ago - Pushed at: almost 7 years ago - Stars: 5 - Forks: 2

sfegan/dft_simd
SIMD discrete Fourier transform tests and discussion
Language: C++ - Size: 740 KB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 4 - Forks: 1

n-roussos/Parallel-Programming-with-OpenMP
This repository lists 4 problems solved using C. Each problem has its own serial and parallel implementations. For the latter, the OpenMP API was utilized.
Language: C - Size: 852 KB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 2

whtcorpsinc/einsteindb-prod
EinsteinDB is a Hybrid memory system consisting of DRAM and Non-Volatile Memory configured to persist data fast.
Language: Rust - Size: 10.3 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 1

jeffamstutz/psimd
(experiments with) pragma-based SIMD C++ types
Language: C++ - Size: 231 KB - Last synced at: 3 months ago - Pushed at: over 7 years ago - Stars: 3 - Forks: 0

UCL-ARC/cluster_club_accelerated_python Fork of tkoskela/hpc_lecture_notes
Materials for ARC's cluster club session on accelerating scientific python codes
Language: Jupyter Notebook - Size: 3.69 MB - Last synced at: 6 months ago - Pushed at: 11 months ago - Stars: 2 - Forks: 0

mtantaoui/simdly
🚀 High-performance Rust library leveraging SIMD and Rayon for fast computations.
Language: Rust - Size: 79.1 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0

sunsided/mongo Fork of mongodb/mongo
The MongoDB database with SIMD-based dot_product aggregation on IEEE 754 single-precision vectors.
Language: C++ - Size: 326 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

kwanCCC/sorted-rs
Check whether a sequence is sorted, using SIMD
Language: Rust - Size: 17.6 KB - Last synced at: 11 days ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

tugrul512bit/InverseFX
Computing a function when only its inverse is known, using the Newton-Raphson method for 1D, 2D, and 3D arrays in parallel.
Language: C++ - Size: 85.9 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

frederik-hoeft/simd
A fast and simple C# hex-decode function using AVX2 and SSSE3 Intel intrinsics.
Language: C# - Size: 7.81 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

nahuelcastro/Digital-Image-Processing-SSE
Image filters using SSE Instructions (Streaming SIMD Extensions) of Intel® x86-64 Architecture.
Language: C - Size: 172 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 2

cuongvng/Optimizing-Convolution-with-NEON-Intrinsics
Optimizing convolution function using ARM's NEON Intrinsics
Language: C++ - Size: 135 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

ell-hol/simd-parallelized-haar-transform
8x speedup of 1D Haar transform using Intel SIMD intrinsics
Language: C - Size: 116 KB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

astrogeekdk/RISC-V-Basic-SIMD
A basic implementation of 8-lane vector SIMD in a RISC-V 5-stage pipeline, written in Chisel and Scala.
Language: Scala - Size: 12.7 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

alighanbari2002/Parallel-Programming-Course-Projects
Parallel Programming course projects demonstrating various parallelism techniques with SIMD SSE3, OMP, and POSIX threads, including Intel Parallel Studio for analysis and parallelization.
Language: C++ - Size: 7.32 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

t0re199/ARCHP_PROJECT
C & Assembly optimized version of the Stochastic Gradient Descent x SoftSVM x Polynomial Kernel Method algorithm
Language: C - Size: 32.2 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

ramesh-adhikari/HPC
High Performance Computing exercises
Language: C - Size: 10.7 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

kavindaperera/Distributed-Memory-Programming-with-MPI
Examples of Distributed-Memory Programming with MPI
Language: C - Size: 7.81 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

BNandor/x86-Assembly-Julia-Fractal-SIMD
AVX SIMD accelerated Julia fractal explorer, 7 beautiful sets
Language: Assembly - Size: 3.4 MB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0

WojciechMigda/TCO-CMap2
CMap2 Top Coder Data Science Marathon Match
Language: C++ - Size: 72.3 KB - Last synced at: about 1 year ago - Pushed at: over 8 years ago - Stars: 0 - Forks: 0
