Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: avx-512
nidud/asmc
MASM-compatible assembler
Language: Assembly - Size: 67.9 MB - Last synced: about 4 hours ago - Pushed: 1 day ago - Stars: 12 - Forks: 4
simdutf/is_utf8
Fast C++ function "is_utf8": checks if the input is valid UTF-8. Made of a single source file. Optimized for ARM NEON, x64 SSE, AVX2 and AVX-512.
Language: C++ - Size: 134 KB - Last synced: about 6 hours ago - Pushed: 1 day ago - Stars: 44 - Forks: 5
simdutf/simdutf
Unicode routines (UTF8, UTF16, UTF32) and Base64: billions of characters per second using SSE2, AVX2, NEON, AVX-512, RISC-V Vector Extension. Part of Node.js and Bun.
Language: C++ - Size: 5.22 MB - Last synced: about 21 hours ago - Pushed: 4 days ago - Stars: 965 - Forks: 60
RoaringBitmap/CRoaring
Roaring bitmaps in C (and C++), with SIMD (AVX2, AVX-512 and NEON) optimizations: used by Apache Doris, ClickHouse, and StarRocks
Language: C - Size: 47 MB - Last synced: about 21 hours ago - Pushed: 7 days ago - Stars: 1,460 - Forks: 259
aff3ct/MIPP
MIPP is a portable wrapper for SIMD instructions written in C++11. It supports NEON, SSE, AVX, AVX-512 and SVE (length specific).
Language: C++ - Size: 2.01 MB - Last synced: 9 days ago - Pushed: 9 days ago - Stars: 463 - Forks: 86
quasilyte/avx512test
Utility that was used to generate the initial Go AVX-512 encoder test suite.
Language: Assembly - Size: 1.46 MB - Last synced: 11 days ago - Pushed: about 5 years ago - Stars: 9 - Forks: 0
SESAME-Synchrotron/orbit-feedback
Design of the Fast-Orbit Feedback correction for SESAME's accelerator
Language: C - Size: 62.5 KB - Last synced: 21 days ago - Pushed: 6 months ago - Stars: 0 - Forks: 0
google/highway
Performance-portable, length-agnostic SIMD with runtime dispatch
Language: C++ - Size: 22.5 MB - Last synced: 29 days ago - Pushed: 29 days ago - Stars: 3,609 - Forks: 291
awesome-simd/awesome-simd
A curated list of awesome SIMD frameworks, libraries and software
Size: 38.1 KB - Last synced: about 1 month ago - Pushed: 3 months ago - Stars: 105 - Forks: 14
intel/hexl
Intel® Homomorphic Encryption Acceleration Library accelerates modular arithmetic operations used in homomorphic encryption
Language: C++ - Size: 5.44 MB - Last synced: 27 days ago - Pushed: 8 months ago - Stars: 207 - Forks: 47
jonicho/simd-radix-sort
A generic and efficient SIMD implementation of MSB radix sort with separate key and payload data streams, supporting arbitrary key and payload data types. Written in C++ and accompanied by a bachelor's thesis.
Language: C++ - Size: 992 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 4 - Forks: 0
antoinecarme/xeon-phi-data
Data for Intel Xeon-Phi server used in PyAF tests
Language: Python - Size: 43.8 MB - Last synced: about 13 hours ago - Pushed: over 3 years ago - Stars: 2 - Forks: 1
hubery-tao/fast_math
High-speed math functions based on AVX-512 intrinsics
Language: C++ - Size: 71.3 KB - Last synced: 3 months ago - Pushed: almost 2 years ago - Stars: 4 - Forks: 1
romz-pl/matrix-matrix-multiply
Algorithms for matrix matrix multiplication, dgemm, AVX-256, AVX-512
Language: C++ - Size: 55.7 KB - Last synced: 25 days ago - Pushed: almost 3 years ago - Stars: 10 - Forks: 2
swojtasiak/fcml-lib
A general purpose machine code manipulation library for x86-32 (IA-32) and x86-64 (AMD64) architectures (Assembler, Disassembler, Library).
Language: C - Size: 22.9 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 81 - Forks: 24
bgin/Radar_ElectroOptical_Simulation
(REOS) Radar and ElectroOptical Simulation Framework written in Fortran.
Language: Fortran - Size: 39.2 MB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 45 - Forks: 14
intel/document-level-sentiment-analysis
Document Level Sentiment Analysis is an end-to-end deep learning workflow that uses the Hugging Face Transformers API to perform a classification task at the document level, analyzing the sentiment of an input document containing English sentences or paragraphs.
Language: Python - Size: 239 KB - Last synced: 27 days ago - Pushed: 5 months ago - Stars: 6 - Forks: 3
twest820/AVX-512
AVX-512 documentation beyond what Intel provides
Size: 1.27 MB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 28 - Forks: 3
pcineverdies/FFT-AVX-512 📦
Fast Fourier Transform implementation through the x86 AVX-512 SIMD extension
Language: C++ - Size: 7.81 KB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 1 - Forks: 1
ammarfaizi2/memcpy_benchmark
Benchmark to show which is the fastest memcpy.
Language: Assembly - Size: 54.7 KB - Last synced: 10 days ago - Pushed: over 3 years ago - Stars: 10 - Forks: 2
rainerzufalldererste/hypersonic-rle-kit
The fastest Run-Length-Encoding on the Planet (for x64)
Language: C - Size: 1.69 MB - Last synced: 10 months ago - Pushed: 11 months ago - Stars: 18 - Forks: 4
falseywinchnet/tomatofft
The Tomato Patch FFT is the fastest FFT in the world, but it is by no means efficient.
Size: 8.53 MB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 0
nsomatilda/Matilda
Matilda is a library to repeatedly multiply a constant matrix with a variable vector
Language: C++ - Size: 38.1 KB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0
tugrul512bit/VectorizedKernel
Running GPGPU-like kernels on CPU with auto-vectorization for SSE/AVX/AVX512 SIMD Architectures
Language: C++ - Size: 241 KB - Last synced: 12 months ago - Pushed: 12 months ago - Stars: 5 - Forks: 0
gfurtadoalmeida/study-assembly-x64
Projects and annotations used to learn x64 assembly
Language: C++ - Size: 194 KB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 2 - Forks: 0
lemonjesus/avx512-polyline
An implementation of Google's Encoded Polyline algorithm in AVX512 because why not. Perhaps the fastest and least portable polyline encoder out there?
Language: C - Size: 33.2 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
DmitryYurov/BitsCount
Count set bits in an integer
Language: C++ - Size: 30.3 KB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0
venovako/VecKog
The vectorized (AVX-512) batched singular value decomposition algorithm for matrices of order two.
Language: C - Size: 107 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 2 - Forks: 0
MamarezaAlipour/AVX-Hole
AVX-Hole C++ SIMD Library
Language: C++ - Size: 133 KB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 15 - Forks: 0
harshapathuri86/parallel-codes
Language: C++ - Size: 2.47 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
zingaburga/SIMDflate
Experimental speed-oriented DEFLATE implementation, based on AVX-512
Language: C++ - Size: 130 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
kvr000/zbynek-cxx-exp
Zbynek's various C and C++ experiments
Language: C++ - Size: 40 KB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0
stefan-zobel/cramer
Some loose performance experiments with Agner Fog's VCL
Language: C++ - Size: 770 KB - Last synced: 26 days ago - Pushed: 27 days ago - Stars: 0 - Forks: 0
toshioendo/hoalgos
Implementation of Hierarchy Oblivious Algorithms
Language: C++ - Size: 731 KB - Last synced: about 1 year ago - Pushed: over 4 years ago - Stars: 3 - Forks: 0
PhuNH/hpc-lab
Scientific Computing - High-Performance Computing Practical Course in WS18-19 at TUM
Language: C++ - Size: 1.08 MB - Last synced: about 1 year ago - Pushed: over 5 years ago - Stars: 0 - Forks: 0