Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: avx512
simdjson/simdjson
Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
Language: C++ - Size: 56 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 18,486 - Forks: 968
bgin/Radar-ElectroOptical-Simulation
(REOS) Radar and Electro-Optical Simulation Framework written in C++.
Language: C++ - Size: 28.3 MB - Last synced: about 3 hours ago - Pushed: about 15 hours ago - Stars: 51 - Forks: 16
Dioarya/mandelbrotset-image-generator
Rewrite of a personal project from back in December 2023.
Language: C++ - Size: 328 KB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 0 - Forks: 0
ashvardanian/SimSIMD
Up to 200x Faster Inner Products and Vector Similarity — for Python, JavaScript, Rust, and C, supporting f64, f32, f16 real & complex, i8, and binary vectors using SIMD for both x86 AVX2 & AVX-512 and Arm NEON & SVE 📐
Language: C - Size: 685 KB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 727 - Forks: 35
kfrlib/kfr
Fast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON)
Language: C++ - Size: 12 MB - Last synced: 4 days ago - Pushed: 5 days ago - Stars: 1,596 - Forks: 246
intel/x86-simd-sort
C++ template library for high performance SIMD based sorting algorithms
Language: C++ - Size: 1000 KB - Last synced: 6 days ago - Pushed: 6 days ago - Stars: 798 - Forks: 47
lssfau/ExaStencils
Mirror of the official ExaStencils Project repository. Please open pull requests on GitLab: https://i10git.cs.fau.de/exastencils/exastencils
Language: Scala - Size: 299 MB - Last synced: 6 days ago - Pushed: 6 days ago - Stars: 3 - Forks: 1
HugeONotation/AVEL
Another Vector Extensions Library
Language: C++ - Size: 1.21 MB - Last synced: 8 days ago - Pushed: 9 days ago - Stars: 0 - Forks: 0
ermig1979/Simd
C++ image processing and machine learning library using SIMD: SSE, AVX, AVX-512, and AMX for x86/x64; VMX (AltiVec) and VSX (Power7) for PowerPC; NEON for ARM.
Language: C++ - Size: 38.3 MB - Last synced: 6 days ago - Pushed: 9 days ago - Stars: 1,977 - Forks: 403
RRZE-HPC/OSACA
Open Source Architecture Code Analyzer
Language: Jupyter Notebook - Size: 8.19 MB - Last synced: 8 days ago - Pushed: 9 days ago - Stars: 274 - Forks: 15
libxsmm/libxsmm
Library for specialized dense and sparse matrix operations, and deep learning primitives.
Language: C - Size: 297 MB - Last synced: 25 days ago - Pushed: 26 days ago - Stars: 795 - Forks: 181
HJLebbink/asm-dude
Visual Studio extension for assembly syntax highlighting and code completion in assembly files and the disassembly window
Language: C# - Size: 80.2 MB - Last synced: 10 days ago - Pushed: about 1 month ago - Stars: 4,104 - Forks: 94
VcDevel/Vc
SIMD Vector Classes for C++
Language: C++ - Size: 11 MB - Last synced: 10 days ago - Pushed: 3 months ago - Stars: 1,420 - Forks: 150
shibatch/sleef
SIMD Library for Evaluating Elementary Functions, vectorized libm and DFT
Language: C - Size: 5.08 MB - Last synced: 11 days ago - Pushed: 16 days ago - Stars: 590 - Forks: 120
quasilyte/xedmap
Mappings between XED names and terms to other widespread forms.
Language: Go - Size: 1.95 KB - Last synced: 12 days ago - Pushed: almost 6 years ago - Stars: 1 - Forks: 0
quasilyte/avx512test
Utility that was used to generate initial Go AVX-512 encoder test suite.
Language: Assembly - Size: 1.46 MB - Last synced: 12 days ago - Pushed: about 5 years ago - Stars: 9 - Forks: 0
andyD123/DR3
DR3 enables users to write vectorised code using generic lambdas and filters. Switch instruction sets simply by changing the enclosing namespace.
Language: C++ - Size: 19.2 MB - Last synced: 14 days ago - Pushed: 14 days ago - Stars: 25 - Forks: 5
Auburn/FastSIMD
Low level generic SIMD wrapper for x86, ARM, WASM with dynamic dispatch
Language: C++ - Size: 228 KB - Last synced: 11 days ago - Pushed: 15 days ago - Stars: 23 - Forks: 2
matthewkolbe/LitMath
A collection of SIMD (AVX2 & AVX512) accelerated mathematical functions for .NET
Language: C# - Size: 175 KB - Last synced: 15 days ago - Pushed: about 1 month ago - Stars: 44 - Forks: 2
manodeep/Corrfunc
⚡️⚡️⚡️Blazing fast correlation functions on the CPU.
Language: C - Size: 150 MB - Last synced: 7 days ago - Pushed: about 1 month ago - Stars: 162 - Forks: 49
intel/yask
YASK (Yet Another Stencil Kit): a domain-specific language and framework for creating high-performance stencil code that implements finite-difference methods and similar applications.
Language: C++ - Size: 28.8 MB - Last synced: 28 days ago - Pushed: about 1 month ago - Stars: 102 - Forks: 34
kimwalisch/primesieve
🚀 Fast prime number generator
Language: C++ - Size: 18.4 MB - Last synced: 9 days ago - Pushed: 24 days ago - Stars: 900 - Forks: 117
simd-everywhere/simde
Implementations of SIMD instruction sets for systems which don't natively support them.
Language: C - Size: 35 MB - Last synced: 19 days ago - Pushed: 21 days ago - Stars: 2,168 - Forks: 225
IAKOBVS/jstring
C String Library
Language: C - Size: 4.88 MB - Last synced: 17 days ago - Pushed: 17 days ago - Stars: 1 - Forks: 0
VectorChief/QuadRay-engine
Realtime raytracer using SIMD on ARM, MIPS, PPC and x86
Language: C - Size: 14.6 MB - Last synced: 18 days ago - Pushed: 18 days ago - Stars: 25 - Forks: 4
VectorChief/UniSIMD-assembler
SIMD macro assembler unified for ARM, MIPS, PPC and x86
Language: C - Size: 9.11 MB - Last synced: 18 days ago - Pushed: 18 days ago - Stars: 85 - Forks: 7
WojciechMula/toys
Storage for my snippets, toy programs, etc.
Language: C++ - Size: 2.34 MB - Last synced: 18 days ago - Pushed: 18 days ago - Stars: 311 - Forks: 37
kimwalisch/primecount
🚀 Fast prime counting function implementations
Language: C++ - Size: 7.74 MB - Last synced: 9 days ago - Pushed: 24 days ago - Stars: 302 - Forks: 40
JohT/convolution-benchmarks
Benchmark convolution implementations in C++ with Catch2 visualized with Vega-Lite
Language: C++ - Size: 5.94 MB - Last synced: 26 days ago - Pushed: 27 days ago - Stars: 1 - Forks: 1
kimwalisch/libpopcnt
🚀 Fast C/C++ bit population count library
Language: C - Size: 170 KB - Last synced: 10 days ago - Pushed: about 2 months ago - Stars: 298 - Forks: 36
intel/qpl
Intel® Query Processing Library (Intel® QPL)
Language: C - Size: 29.6 MB - Last synced: 29 days ago - Pushed: 29 days ago - Stars: 81 - Forks: 18
WojciechMula/base64-avx512
Code for paper "Base64 encoding and decoding at almost the speed of a memory copy"
Language: C - Size: 7.59 MB - Last synced: 1 day ago - Pushed: over 4 years ago - Stars: 196 - Forks: 7
cdl-saarland/rv
RV: A Unified Region Vectorizer for LLVM
Language: C++ - Size: 8.36 MB - Last synced: 20 days ago - Pushed: 30 days ago - Stars: 94 - Forks: 13
oneapi-src/oneDNN
oneAPI Deep Neural Network Library (oneDNN)
Language: C++ - Size: 163 MB - Last synced: 28 days ago - Pushed: 29 days ago - Stars: 3,442 - Forks: 949
p12tic/libsimdpp
Portable header-only C++ low level SIMD library
Language: C++ - Size: 4.38 MB - Last synced: 26 days ago - Pushed: 5 months ago - Stars: 1,187 - Forks: 132
xtensor-stack/xsimd
C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)
Language: C++ - Size: 3.64 MB - Last synced: 29 days ago - Pushed: about 1 month ago - Stars: 2,018 - Forks: 245
google/highway
Performance-portable, length-agnostic SIMD with runtime dispatch
Language: C++ - Size: 22.5 MB - Last synced: 30 days ago - Pushed: 30 days ago - Stars: 3,609 - Forks: 291
jvdd/argminmax
Efficient argmin & argmax
Language: Rust - Size: 536 KB - Last synced: about 8 hours ago - Pushed: about 1 month ago - Stars: 52 - Forks: 5
minio/md5-simd
Accelerate aggregated MD5 hashing performance up to 8x for AVX512 and 4x for AVX2. Useful for server applications that need to compute many MD5 sums in parallel.
Language: Go - Size: 698 KB - Last synced: 27 days ago - Pushed: over 1 year ago - Stars: 161 - Forks: 18
intel/DML
Intel® Data Mover Library (Intel® DML)
Language: C++ - Size: 9.71 MB - Last synced: 28 days ago - Pushed: about 1 month ago - Stars: 73 - Forks: 17
omarathon/compression-geospatial
Fast In-Memory Geospatial Data Compression.
Language: C++ - Size: 991 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0
pre-eth/adam
ADAM is an actively developed CSPRNG inspired by ISAAC64
Language: C - Size: 889 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 15 - Forks: 0
Avereniect/AVEL
AVEL: Another Vector Extensions Library
Language: C++ - Size: 2.27 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 3 - Forks: 0
bluescarni/rakau
C++17 N-body Barnes-Hut on heterogeneous hardware architectures
Language: C++ - Size: 1.26 MB - Last synced: 10 days ago - Pushed: almost 4 years ago - Stars: 20 - Forks: 5
WojciechMula/sse-popcount
SIMD (SSE) population count --- http://0x80.pl/articles/sse-popcount.html
Language: C++ - Size: 299 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 305 - Forks: 47
WojciechMula/base64simd
Base64 coding and decoding with SIMD instructions (SSE/AVX2/AVX512F/AVX512BW/AVX512VBMI/ARM Neon)
Language: C++ - Size: 401 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 147 - Forks: 13
cristian-bicheru/detect-simd
Python library to detect CPU SIMD capabilities.
Language: C - Size: 31.3 KB - Last synced: 7 days ago - Pushed: about 3 years ago - Stars: 3 - Forks: 0
jonicho/simd-radix-sort
A generic and efficient SIMD implementation of MSB radix sort, written in C++, with separate key and payload data streams and support for arbitrary key and payload data types; accompanied by a bachelor's thesis.
Language: C++ - Size: 992 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 4 - Forks: 0
powturbo/Turbo-Base64
Turbo Base64 - Fastest Base64 SIMD: SSE/AVX2/AVX512/Neon/Altivec - Faster than memcpy!
Language: C - Size: 439 KB - Last synced: 2 months ago - Pushed: 9 months ago - Stars: 245 - Forks: 36
SnellerInc/sneller
World's fastest log analysis: λ + SQL + JSON + S3
Language: Go - Size: 23.9 MB - Last synced: 3 months ago - Pushed: 4 months ago - Stars: 956 - Forks: 39
nomonosound/numpy-minmax
A fast function for finding the minimum and maximum value in a NumPy array
Language: Python - Size: 54.7 KB - Last synced: 24 days ago - Pushed: 24 days ago - Stars: 9 - Forks: 0
minio/sha256-simd
Accelerate SHA256 computations in pure Go using AVX512, SHA Extensions for x86 and ARM64 for ARM. On AVX512 it provides an up to 8x improvement (over 3 GB/s per core). SHA Extensions give a performance boost of close to 4x over native.
Language: Go - Size: 171 KB - Last synced: 3 months ago - Pushed: 12 months ago - Stars: 919 - Forks: 118
RickWong/go-aoc
Advent of Code in Go. First make it work, then right, then fast, then simple. Going for all puzzles < 1s.
Language: Go - Size: 298 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0
dzaima/intrinsics-viewer
x86-64, ARM, and RVV intrinsics viewer
Language: JavaScript - Size: 727 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 16 - Forks: 1
WojciechMula/parsing-int-series
Parse multiple decimal integers separated by arbitrary number of delimiters
Language: C++ - Size: 280 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 28 - Forks: 5
dot-asm/cryptogams
CRYPTOGAMS distribution repository
Language: Assembly - Size: 873 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 54 - Forks: 20
ckswjd99-at-snu/SHPC-2023-2
SNU CSE Scalable High Performance Computing (M1522.006700) - 2023 Autumn
Language: C - Size: 41.3 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0
alainesp/simd-function
Python library to metaprogram C/C++ functions using SIMD instruction sets
Size: 145 KB - Last synced: about 1 month ago - Pushed: 3 months ago - Stars: 0 - Forks: 0
morian/leek
SSE/AVX2/AVX512 onion v2 address generator.
Language: C - Size: 215 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 4 - Forks: 0
stuarthayhurst/battleships
Battleships opponent and compute experiments, with AVX2 / AVX-512
Language: Python - Size: 86.9 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0
yzhaiustc/Optimizing-DGEMM-on-Intel-CPUs-with-AVX512F
Stepwise optimizations of DGEMM on the CPU, eventually outperforming Intel MKL, even with multithreading.
Language: C - Size: 3.33 MB - Last synced: 5 months ago - Pushed: over 2 years ago - Stars: 65 - Forks: 16
terrorgarten/AVS_P1
Mandelbrot set calculation using OpenMP vectorization. School project. Tested on barbora.it4i.cz; the batch calculator needs a fix.
Language: C++ - Size: 1.68 MB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 0 - Forks: 0
WojciechMula/sse4-strstr
SIMD (SWAR/SSE/SSE4/AVX2/AVX512F/ARM Neon) implementations of a modified Karp-Rabin algorithm
Language: C++ - Size: 112 KB - Last synced: 6 months ago - Pushed: over 2 years ago - Stars: 216 - Forks: 27
Steppenwolfe65/CEX
The CEX Cryptographic library in C++
Language: HTML - Size: 3.42 GB - Last synced: 7 months ago - Pushed: about 1 year ago - Stars: 55 - Forks: 25
altimesh/hybridizer-basic-samples
Examples of C# code compiled to GPU by hybridizer
Language: C# - Size: 81 MB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 220 - Forks: 34
nicholasferguson/Portable_SIMD
Testing a SIMD API from VecCore/VecGeom, using UMESIMD and Vc backends for AVX, AVX2, AVX512, SSE, and SSE2
Language: C++ - Size: 1.22 MB - Last synced: 7 months ago - Pushed: over 6 years ago - Stars: 2 - Forks: 0
shibatch/cpuburnavx2
Programs for producing as much heat as possible with AVX2 or AVX512 instructions
Language: C - Size: 1000 Bytes - Last synced: 7 months ago - Pushed: over 3 years ago - Stars: 3 - Forks: 0
WojciechMula/ternary-logic
Support for ternary logic in SSE, XOP, AVX2 and x86 programs
Language: C++ - Size: 132 KB - Last synced: 7 months ago - Pushed: over 2 years ago - Stars: 24 - Forks: 9
VcDevel/std-simd
std::experimental::simd for GCC [ISO/IEC TS 19570:2018]
Language: C++ - Size: 3.34 MB - Last synced: 7 months ago - Pushed: about 1 year ago - Stars: 510 - Forks: 38
YuriMyakotin/ChaCha20-SIMD
ChaCha20 C SIMD implementations - AVX512, AVX2, SSE2
Language: C - Size: 18.6 KB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 0 - Forks: 0
edanor/umesimd
UME::SIMD: a library for explicit SIMD vectorization.
Language: C++ - Size: 5.89 MB - Last synced: 7 months ago - Pushed: over 6 years ago - Stars: 84 - Forks: 18
agenium-scale/boost.simd
Boost SIMD
Size: 192 KB - Last synced: 7 months ago - Pushed: about 5 years ago - Stars: 233 - Forks: 50
rainerzufalldererste/hypersonic-rANS
Some of the fastest-decoding range-based Asymmetric Numeral Systems (rANS) codecs for x64
Language: C++ - Size: 2.93 MB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 7 - Forks: 1
zbjornson/node-bswap
Fast byte swapping for Node.js and browsers
Language: JavaScript - Size: 106 KB - Last synced: 24 days ago - Pushed: 8 months ago - Stars: 6 - Forks: 3
agenium-scale/nsimd
Agenium Scale vectorization library for CPUs and GPUs
Language: C - Size: 6.92 MB - Last synced: 7 months ago - Pushed: over 2 years ago - Stars: 303 - Forks: 31
dendisuhubdy/simple_vector_classes Fork of VcDevel/Vc
SIMD Vector Classes for C++
Language: C++ - Size: 9.15 MB - Last synced: 9 months ago - Pushed: over 4 years ago - Stars: 0 - Forks: 0
vxst/qrand
High-quality, quick random number generator that passes the BigCrush suite
Language: C++ - Size: 8.79 KB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 11 - Forks: 1
marshallward/optiflop
Optiflop measures the optimally achievable FLOPs for mathematical operations on various platforms.
Language: C - Size: 599 KB - Last synced: 10 days ago - Pushed: about 1 month ago - Stars: 11 - Forks: 2
misharash/Corrfunc Fork of manodeep/Corrfunc
⚡️⚡️⚡️Blazing fast correlation functions on the CPU.
Language: C - Size: 150 MB - Last synced: 9 months ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0
ntuhpc/sc17-mrbayes
MrBayes optimized with AVX512 and FMA
Language: C - Size: 2.16 MB - Last synced: 9 months ago - Pushed: over 6 years ago - Stars: 3 - Forks: 5
jviney/bilinear_filter_simd
Bilinear image filtering implemented with SSE4, AVX2 and AVX512.
Language: C++ - Size: 1.74 MB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 8 - Forks: 0
mklarqvist/positional-popcount
Fast C functions for computing the positional popcount (pospopcnt).
Language: C - Size: 545 KB - Last synced: 11 months ago - Pushed: over 4 years ago - Stars: 50 - Forks: 5
xusworld/tars
Tars is a cool deep learning framework.
Language: C++ - Size: 151 KB - Last synced: 12 months ago - Pushed: 12 months ago - Stars: 2 - Forks: 0
WOnder93/argon2
A multi-arch library implementing the Argon2 password hashing algorithm.
Language: C - Size: 152 KB - Last synced: 12 months ago - Pushed: almost 3 years ago - Stars: 13 - Forks: 8
tugrul512bit/VectorizedKernel
Running GPGPU-like kernels on CPU with auto-vectorization for SSE/AVX/AVX512 SIMD Architectures
Language: C++ - Size: 241 KB - Last synced: almost 1 year ago - Pushed: almost 1 year ago - Stars: 5 - Forks: 0
HJLebbink/x86doc Fork of fay59/x86doc
HTML representation of the Intel x86 instructions documentation (June 2016).
Language: HTML - Size: 3.42 MB - Last synced: 12 months ago - Pushed: almost 7 years ago - Stars: 61 - Forks: 14
mklarqvist/libalgebra
Fast C header-only library for popcnt, pospopcnt, and set algebraic operations
Language: C - Size: 92.8 KB - Last synced: 11 months ago - Pushed: over 4 years ago - Stars: 38 - Forks: 7
mklarqvist/StormBitmaps
Fast algorithms for computing XX^T for binary matrices
Language: C - Size: 657 KB - Last synced: 11 months ago - Pushed: over 4 years ago - Stars: 13 - Forks: 2
WojciechMula/simd-byte-lookup
SIMD-ized check of which bytes are in a set
Language: Python - Size: 23.4 KB - Last synced: about 1 year ago - Pushed: over 5 years ago - Stars: 23 - Forks: 2
WojciechMula/ternarylogiccli
CLI utility to work out the proper constants for the vpternlog (vpternlogd/vpternlogq) instructions
Language: Python - Size: 4.88 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 8 - Forks: 0
toksaitov/ips-arch-project
ips-arch-project is a project for the Computer Architecture course at AUCA.
Language: C - Size: 24.4 KB - Last synced: 28 days ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 3
jeffhammond/vpu-count
Information about AVX-512 support on recent Intel processors
Language: C - Size: 63.5 KB - Last synced: about 1 year ago - Pushed: about 2 years ago - Stars: 32 - Forks: 2
lemonjesus/avx512-polyline
An implementation of Google's Encoded Polyline algorithm in AVX512, because why not. Perhaps the fastest and least portable polyline encoder out there?
Language: C - Size: 33.2 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
animetosho/md5-optimisation
The fastest MD5 implementation using x86 assembly
Language: C++ - Size: 375 KB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 70 - Forks: 7
matthewkolbe/ThinkingInSimd
An essay comparing performance implications of ignoring AVX acceleration
Language: C++ - Size: 394 KB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 2 - Forks: 0
zbjornson/fastcode
A list of fast libraries, primarily x86/64 C++ and Node.js C++ extensions
Size: 2.93 KB - Last synced: about 1 year ago - Pushed: over 5 years ago - Stars: 16 - Forks: 2
sandialabs/p3a
Portably Performant Physical Algebra
Language: C++ - Size: 645 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 9 - Forks: 5
MahdiSafsafi/UnivDisasm
x86 Disassembler and Analyzer
Language: Pascal - Size: 6.37 MB - Last synced: about 1 year ago - Pushed: over 5 years ago - Stars: 83 - Forks: 31
itzmeanjan/blake3
SYCL accelerated BLAKE3 Hash Implementation
Language: C++ - Size: 104 KB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 10 - Forks: 2
PatwinchIR/ultra-sort
DSL for SIMD Sorting on AVX2 & AVX512
Language: C++ - Size: 6.43 MB - Last synced: about 1 year ago - Pushed: over 5 years ago - Stars: 30 - Forks: 2