Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: avx512

simdjson/simdjson

Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks

Language: C++ - Size: 56 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 18,486 - Forks: 968

bgin/Radar-ElectroOptical-Simulation

(REOS) Radar and Electro-Optical Simulation Framework written in C++.

Language: C++ - Size: 28.3 MB - Last synced: about 3 hours ago - Pushed: about 15 hours ago - Stars: 51 - Forks: 16

Dioarya/mandelbrotset-image-generator

Rewrite of a personal project from back in December 2023.

Language: C++ - Size: 328 KB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 0 - Forks: 0

ashvardanian/SimSIMD

Up to 200x Faster Inner Products and Vector Similarity — for Python, JavaScript, Rust, and C, supporting f64, f32, f16 real & complex, i8, and binary vectors using SIMD for both x86 AVX2 & AVX-512 and Arm NEON & SVE 📐

Language: C - Size: 685 KB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 727 - Forks: 35

kfrlib/kfr

Fast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON)

Language: C++ - Size: 12 MB - Last synced: 4 days ago - Pushed: 5 days ago - Stars: 1,596 - Forks: 246

intel/x86-simd-sort

C++ template library for high performance SIMD based sorting algorithms

Language: C++ - Size: 1000 KB - Last synced: 6 days ago - Pushed: 6 days ago - Stars: 798 - Forks: 47

lssfau/ExaStencils

Mirror of the official ExaStencils Project repository. Please open pull requests on GitLab: https://i10git.cs.fau.de/exastencils/exastencils

Language: Scala - Size: 299 MB - Last synced: 6 days ago - Pushed: 6 days ago - Stars: 3 - Forks: 1

HugeONotation/AVEL

Another Vector Extensions Library

Language: C++ - Size: 1.21 MB - Last synced: 8 days ago - Pushed: 9 days ago - Stars: 0 - Forks: 0

ermig1979/Simd

C++ image processing and machine learning library using SIMD: SSE, AVX, AVX-512, AMX for x86/x64, VMX (Altivec) and VSX (Power7) for PowerPC, NEON for ARM.

Language: C++ - Size: 38.3 MB - Last synced: 6 days ago - Pushed: 9 days ago - Stars: 1,977 - Forks: 403

RRZE-HPC/OSACA

Open Source Architecture Code Analyzer

Language: Jupyter Notebook - Size: 8.19 MB - Last synced: 8 days ago - Pushed: 9 days ago - Stars: 274 - Forks: 15

libxsmm/libxsmm

Library for specialized dense and sparse matrix operations, and deep learning primitives.

Language: C - Size: 297 MB - Last synced: 25 days ago - Pushed: 26 days ago - Stars: 795 - Forks: 181

HJLebbink/asm-dude

Visual Studio extension for assembly syntax highlighting and code completion in assembly files and the disassembly window

Language: C# - Size: 80.2 MB - Last synced: 10 days ago - Pushed: about 1 month ago - Stars: 4,104 - Forks: 94

VcDevel/Vc

SIMD Vector Classes for C++

Language: C++ - Size: 11 MB - Last synced: 10 days ago - Pushed: 3 months ago - Stars: 1,420 - Forks: 150

shibatch/sleef

SIMD Library for Evaluating Elementary Functions, vectorized libm and DFT

Language: C - Size: 5.08 MB - Last synced: 11 days ago - Pushed: 16 days ago - Stars: 590 - Forks: 120

quasilyte/xedmap

Mappings between XED names and terms to other widespread forms.

Language: Go - Size: 1.95 KB - Last synced: 12 days ago - Pushed: almost 6 years ago - Stars: 1 - Forks: 0

quasilyte/avx512test

Utility that was used to generate initial Go AVX-512 encoder test suite.

Language: Assembly - Size: 1.46 MB - Last synced: 12 days ago - Pushed: about 5 years ago - Stars: 9 - Forks: 0

andyD123/DR3

DR3 enables users to write vectorised code using generic lambdas and filters. Switch instruction sets just by changing the enclosing namespace.

Language: C++ - Size: 19.2 MB - Last synced: 14 days ago - Pushed: 14 days ago - Stars: 25 - Forks: 5

Auburn/FastSIMD

Low level generic SIMD wrapper for x86, ARM, WASM with dynamic dispatch

Language: C++ - Size: 228 KB - Last synced: 11 days ago - Pushed: 15 days ago - Stars: 23 - Forks: 2

matthewkolbe/LitMath

A collection of SIMD (AVX2 & AVX512) accelerated mathematical functions for .NET

Language: C# - Size: 175 KB - Last synced: 15 days ago - Pushed: about 1 month ago - Stars: 44 - Forks: 2

manodeep/Corrfunc

⚡️⚡️⚡️Blazing fast correlation functions on the CPU.

Language: C - Size: 150 MB - Last synced: 7 days ago - Pushed: about 1 month ago - Stars: 162 - Forks: 49

intel/yask

YASK--Yet Another Stencil Kit: a domain-specific language and framework to create high-performance stencil code for implementing finite-difference methods and similar applications.

Language: C++ - Size: 28.8 MB - Last synced: 28 days ago - Pushed: about 1 month ago - Stars: 102 - Forks: 34

kimwalisch/primesieve

🚀 Fast prime number generator

Language: C++ - Size: 18.4 MB - Last synced: 9 days ago - Pushed: 24 days ago - Stars: 900 - Forks: 117

simd-everywhere/simde

Implementations of SIMD instruction sets for systems which don't natively support them.

Language: C - Size: 35 MB - Last synced: 19 days ago - Pushed: 21 days ago - Stars: 2,168 - Forks: 225

IAKOBVS/jstring

C String Library

Language: C - Size: 4.88 MB - Last synced: 17 days ago - Pushed: 17 days ago - Stars: 1 - Forks: 0

VectorChief/QuadRay-engine

Realtime raytracer using SIMD on ARM, MIPS, PPC and x86

Language: C - Size: 14.6 MB - Last synced: 18 days ago - Pushed: 18 days ago - Stars: 25 - Forks: 4

VectorChief/UniSIMD-assembler

SIMD macro assembler unified for ARM, MIPS, PPC and x86

Language: C - Size: 9.11 MB - Last synced: 18 days ago - Pushed: 18 days ago - Stars: 85 - Forks: 7

WojciechMula/toys

Storage for my snippets, toy programs, etc.

Language: C++ - Size: 2.34 MB - Last synced: 18 days ago - Pushed: 18 days ago - Stars: 311 - Forks: 37

kimwalisch/primecount

🚀 Fast prime counting function implementations

Language: C++ - Size: 7.74 MB - Last synced: 9 days ago - Pushed: 24 days ago - Stars: 302 - Forks: 40

JohT/convolution-benchmarks

Benchmark convolution implementations in C++ with Catch2 visualized with Vega-Lite

Language: C++ - Size: 5.94 MB - Last synced: 26 days ago - Pushed: 27 days ago - Stars: 1 - Forks: 1

kimwalisch/libpopcnt

🚀 Fast C/C++ bit population count library

Language: C - Size: 170 KB - Last synced: 10 days ago - Pushed: about 2 months ago - Stars: 298 - Forks: 36

intel/qpl

Intel® Query Processing Library (Intel® QPL)

Language: C - Size: 29.6 MB - Last synced: 29 days ago - Pushed: 29 days ago - Stars: 81 - Forks: 18

WojciechMula/base64-avx512

Code for paper "Base64 encoding and decoding at almost the speed of a memory copy"

Language: C - Size: 7.59 MB - Last synced: 1 day ago - Pushed: over 4 years ago - Stars: 196 - Forks: 7

cdl-saarland/rv

RV: A Unified Region Vectorizer for LLVM

Language: C++ - Size: 8.36 MB - Last synced: 20 days ago - Pushed: 30 days ago - Stars: 94 - Forks: 13

oneapi-src/oneDNN

oneAPI Deep Neural Network Library (oneDNN)

Language: C++ - Size: 163 MB - Last synced: 28 days ago - Pushed: 29 days ago - Stars: 3,442 - Forks: 949

p12tic/libsimdpp

Portable header-only C++ low level SIMD library

Language: C++ - Size: 4.38 MB - Last synced: 26 days ago - Pushed: 5 months ago - Stars: 1,187 - Forks: 132

xtensor-stack/xsimd

C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)

Language: C++ - Size: 3.64 MB - Last synced: 29 days ago - Pushed: about 1 month ago - Stars: 2,018 - Forks: 245

google/highway

Performance-portable, length-agnostic SIMD with runtime dispatch

Language: C++ - Size: 22.5 MB - Last synced: 30 days ago - Pushed: 30 days ago - Stars: 3,609 - Forks: 291

jvdd/argminmax

Efficient argmin & argmax

Language: Rust - Size: 536 KB - Last synced: about 8 hours ago - Pushed: about 1 month ago - Stars: 52 - Forks: 5

minio/md5-simd

Accelerate aggregated MD5 hashing performance up to 8x for AVX512 and 4x for AVX2. Useful for server applications that need to compute many MD5 sums in parallel.

Language: Go - Size: 698 KB - Last synced: 27 days ago - Pushed: over 1 year ago - Stars: 161 - Forks: 18

intel/DML

Intel® Data Mover Library (Intel® DML)

Language: C++ - Size: 9.71 MB - Last synced: 28 days ago - Pushed: about 1 month ago - Stars: 73 - Forks: 17

omarathon/compression-geospatial

Fast In-Memory Geospatial Data Compression.

Language: C++ - Size: 991 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0

pre-eth/adam

ADAM is an actively developed CSPRNG inspired by ISAAC64

Language: C - Size: 889 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 15 - Forks: 0

Avereniect/AVEL

AVEL: Another Vector Extensions Library

Language: C++ - Size: 2.27 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 3 - Forks: 0

bluescarni/rakau

C++17 N-body Barnes-Hut on heterogeneous hardware architectures

Language: C++ - Size: 1.26 MB - Last synced: 10 days ago - Pushed: almost 4 years ago - Stars: 20 - Forks: 5

WojciechMula/sse-popcount

SIMD (SSE) population count --- http://0x80.pl/articles/sse-popcount.html

Language: C++ - Size: 299 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 305 - Forks: 47

WojciechMula/base64simd

Base64 coding and decoding with SIMD instructions (SSE/AVX2/AVX512F/AVX512BW/AVX512VBMI/ARM Neon)

Language: C++ - Size: 401 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 147 - Forks: 13

cristian-bicheru/detect-simd

Python library to detect CPU SIMD capabilities.

Language: C - Size: 31.3 KB - Last synced: 7 days ago - Pushed: about 3 years ago - Stars: 3 - Forks: 0

jonicho/simd-radix-sort

A generic and efficient SIMD implementation of MSB Radix Sort with separate key and payload datastreams that supports arbitrary key and payload data types written in C++ accompanied by a bachelor's thesis.

Language: C++ - Size: 992 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 4 - Forks: 0

powturbo/Turbo-Base64

Turbo Base64 - Fastest Base64 SIMD: SSE/AVX2/AVX512/Neon/Altivec - Faster than memcpy!

Language: C - Size: 439 KB - Last synced: 2 months ago - Pushed: 9 months ago - Stars: 245 - Forks: 36

SnellerInc/sneller

World's fastest log analysis: λ + SQL + JSON + S3

Language: Go - Size: 23.9 MB - Last synced: 3 months ago - Pushed: 4 months ago - Stars: 956 - Forks: 39

nomonosound/numpy-minmax

A fast function for finding the minimum and maximum value in a NumPy array

Language: Python - Size: 54.7 KB - Last synced: 24 days ago - Pushed: 24 days ago - Stars: 9 - Forks: 0

minio/sha256-simd

Accelerate SHA256 computations in pure Go using AVX512, SHA Extensions for x86 and ARM64 for ARM. On AVX512 it provides an up to 8x improvement (over 3 GB/s per core). SHA Extensions give a performance boost of close to 4x over native.

Language: Go - Size: 171 KB - Last synced: 3 months ago - Pushed: 12 months ago - Stars: 919 - Forks: 118

RickWong/go-aoc

Advent of Code in Go. First make it work, then right, then fast, then simple. Going for all puzzles < 1s.

Language: Go - Size: 298 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0

dzaima/intrinsics-viewer

x86-64, ARM, and RVV intrinsics viewer

Language: JavaScript - Size: 727 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 16 - Forks: 1

WojciechMula/parsing-int-series

Parse multiple decimal integers separated by arbitrary number of delimiters

Language: C++ - Size: 280 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 28 - Forks: 5

dot-asm/cryptogams

CRYPTOGAMS distribution repository

Language: Assembly - Size: 873 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 54 - Forks: 20

ckswjd99-at-snu/SHPC-2023-2

SNU CSE Scalable High Performance Computing (M1522.006700) - 2023 Autumn

Language: C - Size: 41.3 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0

alainesp/simd-function

Python library to metaprogram C/C++ functions using SIMD instruction sets

Size: 145 KB - Last synced: about 1 month ago - Pushed: 3 months ago - Stars: 0 - Forks: 0

morian/leek

SSE/AVX2/AVX512 onion v2 address generator.

Language: C - Size: 215 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 4 - Forks: 0

stuarthayhurst/battleships

Battleships opponent and compute experiments, with AVX2 / AVX-512

Language: Python - Size: 86.9 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0

yzhaiustc/Optimizing-DGEMM-on-Intel-CPUs-with-AVX512F

Stepwise optimizations of DGEMM on CPU, reaching performance faster than Intel MKL eventually, even under multithreading.

Language: C - Size: 3.33 MB - Last synced: 5 months ago - Pushed: over 2 years ago - Stars: 65 - Forks: 16

terrorgarten/AVS_P1

Mandelbrot set calculation using OpenMP vectorization. School project. Tested on barbora.it4i.cz, batch calculator needs a fix.

Language: C++ - Size: 1.68 MB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 0 - Forks: 0

WojciechMula/sse4-strstr

SIMD (SWAR/SSE/SSE4/AVX2/AVX512F/ARM Neon) implementations of a modified Karp-Rabin algorithm

Language: C++ - Size: 112 KB - Last synced: 6 months ago - Pushed: over 2 years ago - Stars: 216 - Forks: 27

Steppenwolfe65/CEX

The CEX Cryptographic library in C++

Language: HTML - Size: 3.42 GB - Last synced: 7 months ago - Pushed: about 1 year ago - Stars: 55 - Forks: 25

altimesh/hybridizer-basic-samples

Examples of C# code compiled to GPU by hybridizer

Language: C# - Size: 81 MB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 220 - Forks: 34

nicholasferguson/Portable_SIMD

Testing a SIMD API from VecCore VecGeom, using UMESIMD and Vc backends for AVX, AVX2, AVX512, SSE, SSE2

Language: C++ - Size: 1.22 MB - Last synced: 7 months ago - Pushed: over 6 years ago - Stars: 2 - Forks: 0

shibatch/cpuburnavx2

Programs for producing as much heat as possible with AVX2 or AVX512 instructions

Language: C - Size: 1000 Bytes - Last synced: 7 months ago - Pushed: over 3 years ago - Stars: 3 - Forks: 0

WojciechMula/ternary-logic

Support for ternary logic in SSE, XOP, AVX2 and x86 programs

Language: C++ - Size: 132 KB - Last synced: 7 months ago - Pushed: over 2 years ago - Stars: 24 - Forks: 9

VcDevel/std-simd

std::experimental::simd for GCC [ISO/IEC TS 19570:2018]

Language: C++ - Size: 3.34 MB - Last synced: 7 months ago - Pushed: about 1 year ago - Stars: 510 - Forks: 38

YuriMyakotin/ChaCha20-SIMD

ChaCha20 C SIMD implementations - AVX512, AVX2, SSE2

Language: C - Size: 18.6 KB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 0 - Forks: 0

edanor/umesimd

UME::SIMD A library for explicit simd vectorization.

Language: C++ - Size: 5.89 MB - Last synced: 7 months ago - Pushed: over 6 years ago - Stars: 84 - Forks: 18

agenium-scale/boost.simd

Boost SIMD

Size: 192 KB - Last synced: 7 months ago - Pushed: about 5 years ago - Stars: 233 - Forks: 50

rainerzufalldererste/hypersonic-rANS

Some of the fastest decoding range-based Asymmetric Numeral Systems (rANS) codecs for x64

Language: C++ - Size: 2.93 MB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 7 - Forks: 1

zbjornson/node-bswap

Fast byte swapping for Node.js and browsers

Language: JavaScript - Size: 106 KB - Last synced: 24 days ago - Pushed: 8 months ago - Stars: 6 - Forks: 3

agenium-scale/nsimd

Agenium Scale vectorization library for CPUs and GPUs

Language: C - Size: 6.92 MB - Last synced: 7 months ago - Pushed: over 2 years ago - Stars: 303 - Forks: 31

dendisuhubdy/simple_vector_classes (fork of VcDevel/Vc)

SIMD Vector Classes for C++

Language: C++ - Size: 9.15 MB - Last synced: 9 months ago - Pushed: over 4 years ago - Stars: 0 - Forks: 0

vxst/qrand

High Quality Quick Random Number Generator that passes BigCrush suite

Language: C++ - Size: 8.79 KB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 11 - Forks: 1

marshallward/optiflop

Optiflop measures the optimally achievable FLOPs for mathematical operations on various platforms.

Language: C - Size: 599 KB - Last synced: 10 days ago - Pushed: about 1 month ago - Stars: 11 - Forks: 2

misharash/Corrfunc (fork of manodeep/Corrfunc)

⚡️⚡️⚡️Blazing fast correlation functions on the CPU.

Language: C - Size: 150 MB - Last synced: 9 months ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0

ntuhpc/sc17-mrbayes

MrBayes optimized with AVX512 and FMA

Language: C - Size: 2.16 MB - Last synced: 9 months ago - Pushed: over 6 years ago - Stars: 3 - Forks: 5

jviney/bilinear_filter_simd

Bilinear image filtering implemented with SSE4, AVX2 and AVX512.

Language: C++ - Size: 1.74 MB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 8 - Forks: 0

mklarqvist/positional-popcount

Fast C functions for computing the positional popcount (pospopcnt).

Language: C - Size: 545 KB - Last synced: 11 months ago - Pushed: over 4 years ago - Stars: 50 - Forks: 5

xusworld/tars

Tars is a cool deep learning framework.

Language: C++ - Size: 151 KB - Last synced: 12 months ago - Pushed: 12 months ago - Stars: 2 - Forks: 0

WOnder93/argon2

A multi-arch library implementing the Argon2 password hashing algorithm.

Language: C - Size: 152 KB - Last synced: 12 months ago - Pushed: almost 3 years ago - Stars: 13 - Forks: 8

tugrul512bit/VectorizedKernel

Running GPGPU-like kernels on CPU with auto-vectorization for SSE/AVX/AVX512 SIMD Architectures

Language: C++ - Size: 241 KB - Last synced: almost 1 year ago - Pushed: almost 1 year ago - Stars: 5 - Forks: 0

HJLebbink/x86doc (fork of fay59/x86doc)

HTML representation of the Intel x86 instructions documentation (June 2016).

Language: HTML - Size: 3.42 MB - Last synced: 12 months ago - Pushed: almost 7 years ago - Stars: 61 - Forks: 14

mklarqvist/libalgebra

Fast C header-only library for popcnt, pospopcnt, and set algebraic operations

Language: C - Size: 92.8 KB - Last synced: 11 months ago - Pushed: over 4 years ago - Stars: 38 - Forks: 7

mklarqvist/StormBitmaps

Fast algorithms for computing XX^T for binary matrices

Language: C - Size: 657 KB - Last synced: 11 months ago - Pushed: over 4 years ago - Stars: 13 - Forks: 2

WojciechMula/simd-byte-lookup

SIMDized check which bytes are in a set

Language: Python - Size: 23.4 KB - Last synced: about 1 year ago - Pushed: over 5 years ago - Stars: 23 - Forks: 2

WojciechMula/ternarylogiccli

CLI utility to work out proper constants for the vpternlogic instruction

Language: Python - Size: 4.88 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 8 - Forks: 0

toksaitov/ips-arch-project

ips-arch-project is a project for the Computer Architecture course at AUCA.

Language: C - Size: 24.4 KB - Last synced: 28 days ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 3

jeffhammond/vpu-count

Information about AVX-512 support on recent Intel processors

Language: C - Size: 63.5 KB - Last synced: about 1 year ago - Pushed: about 2 years ago - Stars: 32 - Forks: 2

lemonjesus/avx512-polyline

An implementation of Google's Encoded Polyline algorithm in AVX512 because why not. Perhaps the fastest and least portable polyline encoder out there?

Language: C - Size: 33.2 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

animetosho/md5-optimisation

The fastest MD5 implementation using x86 assembly

Language: C++ - Size: 375 KB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 70 - Forks: 7

matthewkolbe/ThinkingInSimd

An essay comparing performance implications of ignoring AVX acceleration

Language: C++ - Size: 394 KB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 2 - Forks: 0

zbjornson/fastcode

A list of fast libraries, primarily x86/64 C++ and Node.js C++ extensions

Size: 2.93 KB - Last synced: about 1 year ago - Pushed: over 5 years ago - Stars: 16 - Forks: 2

sandialabs/p3a

Portably Performant Physical Algebra

Language: C++ - Size: 645 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 9 - Forks: 5

MahdiSafsafi/UnivDisasm

x86 Disassembler and Analyzer

Language: Pascal - Size: 6.37 MB - Last synced: about 1 year ago - Pushed: over 5 years ago - Stars: 83 - Forks: 31

itzmeanjan/blake3

SYCL accelerated BLAKE3 Hash Implementation

Language: C++ - Size: 104 KB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 10 - Forks: 2

PatwinchIR/ultra-sort

DSL for SIMD Sorting on AVX2 & AVX512

Language: C++ - Size: 6.43 MB - Last synced: about 1 year ago - Pushed: over 5 years ago - Stars: 30 - Forks: 2