Topic: "fault-tolerance"
InterviewReady/system-design-resources
These are the best resources for System Design on the Internet
Size: 112 KB - Last synced at: 5 months ago - Pushed at: 10 months ago - Stars: 16,165 - Forks: 2,021

distribworks/dkron
Dkron - Distributed, fault tolerant job scheduling system https://dkron.io
Language: Go - Size: 132 MB - Last synced at: about 9 hours ago - Pushed at: about 10 hours ago - Stars: 4,477 - Forks: 397

mesos/chronos
Fault tolerant job scheduler for Mesos which handles dependencies and ISO8601 based schedules
Language: Scala - Size: 7.13 MB - Last synced at: 16 days ago - Pushed at: almost 3 years ago - Stars: 4,384 - Forks: 525

shunfei/cronsun
A Distributed, Fault-Tolerant Cron-Style Job System.
Language: Go - Size: 74.9 MB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 2,923 - Forks: 459

bastion-rs/bastion
Highly-available Distributed Fault-tolerant Runtime
Language: Rust - Size: 3.91 MB - Last synced at: 1 day ago - Pushed at: about 2 years ago - Stars: 2,846 - Forks: 102

heidihoward/distributed-consensus-reading-list
A list of papers about distributed consensus.
Size: 274 KB - Last synced at: 27 days ago - Pushed at: 10 months ago - Stars: 2,568 - Forks: 214

polarismesh/polaris
Service Discovery and Governance Platform for Microservice and Distributed Architecture
Language: Go - Size: 48.5 MB - Last synced at: 2 months ago - Pushed at: 6 months ago - Stars: 2,458 - Forks: 402

anthdm/hollywood
Blazingly fast and light-weight Actor engine written in Golang
Language: Go - Size: 410 KB - Last synced at: about 9 hours ago - Pushed at: about 10 hours ago - Stars: 1,937 - Forks: 138

sger/ElixirBooks
List of Elixir books
Size: 240 KB - Last synced at: 15 days ago - Pushed at: almost 3 years ago - Stars: 1,441 - Forks: 111

kraken-php/framework
Asynchronous & Fault-tolerant PHP Framework for Distributed Applications.
Language: PHP - Size: 1.55 MB - Last synced at: 5 days ago - Pushed at: almost 8 years ago - Stars: 1,108 - Forks: 59

lizardfs/lizardfs
LizardFS is an Open Source Distributed File System licensed under GPLv3.
Language: C++ - Size: 12.8 MB - Last synced at: 15 days ago - Pushed at: 10 months ago - Stars: 973 - Forks: 188

golemcloud/golem
Golem is an open source durable computing platform that makes it easy to build and deploy highly reliable distributed systems.
Language: Rust - Size: 264 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 786 - Forks: 124

bakwc/PySyncObj
A library for replicating your Python class between multiple servers, based on the Raft protocol
Language: Python - Size: 630 KB - Last synced at: 8 days ago - Pushed at: 4 months ago - Stars: 722 - Forks: 115

Tencent/TSeer
A highly available service discovery, registration, and fault-tolerance framework
Language: C++ - Size: 1.3 MB - Last synced at: 14 days ago - Pushed at: about 2 years ago - Stars: 676 - Forks: 146

ackintosh/ganesha
🐘 A Circuit Breaker pattern implementation for PHP applications.
Language: PHP - Size: 827 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 631 - Forks: 45

riot-ml/riot
An actor-model multi-core scheduler for OCaml 5 🐫
Language: OCaml - Size: 591 KB - Last synced at: 7 days ago - Pushed at: 7 months ago - Stars: 620 - Forks: 39

Polly-Contrib/Simmy
Simmy is a chaos-engineering and fault-injection tool, integrating with the Polly resilience project for .NET
Language: C# - Size: 396 KB - Last synced at: 14 days ago - Pushed at: about 3 years ago - Stars: 568 - Forks: 25

PlatformLab/RAMCloud
**No Longer Maintained** Official RAMCloud repo
Language: C++ - Size: 13.3 MB - Last synced at: 12 days ago - Pushed at: over 5 years ago - Stars: 494 - Forks: 145

valkey-io/valkey-glide
An open source Valkey client library that supports Valkey and Redis open source 6.2, 7.0, and 7.2. Valkey GLIDE is designed for reliability, optimized performance, and high availability for applications based on Valkey and Redis OSS. GLIDE is a multi-language client library written in Rust, with bindings for programming languages such as Java and Python.
Language: Java - Size: 299 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 474 - Forks: 92

ChrisWhealy/DistributedSystemNotes
Notes on Lindsey Kuper's lectures on Distributed Systems
Size: 35.2 MB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 460 - Forks: 88

CloudI/CloudI
A Cloud at the lowest level!
Language: Erlang - Size: 59.8 MB - Last synced at: 15 days ago - Pushed at: 20 days ago - Stars: 414 - Forks: 50

infinit/infinit
The Infinit policy-based software-defined storage platform.
Size: 1000 Bytes - Last synced at: 2 months ago - Pushed at: over 8 years ago - Stars: 366 - Forks: 13

thespianpy/Thespian
Python Actor concurrency library
Language: Python - Size: 19.2 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 321 - Forks: 67

pytorch/torchft
PyTorch per step fault tolerance (actively under development)
Language: Python - Size: 2.52 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 302 - Forks: 33

awolden/brakes
Hystrix compliant Node.js Circuit Breaker Library
Language: JavaScript - Size: 851 KB - Last synced at: 3 days ago - Pushed at: over 2 years ago - Stars: 301 - Forks: 35

artilleryio/chaos-lambda
Serverless chaos monkey for AWS (runs on AWS Lambda) ☁️ 💥
Language: JavaScript - Size: 46.9 KB - Last synced at: about 22 hours ago - Pushed at: over 1 year ago - Stars: 289 - Forks: 26

haraldng/omnipaxos
OmniPaxos is a distributed log implemented as a Rust library.
Language: Rust - Size: 6.08 MB - Last synced at: 7 days ago - Pushed at: 4 months ago - Stars: 199 - Forks: 31

kquick/Thespian (fork of thespianpy/Thespian)
Python Actor concurrency library
Language: Python - Size: 26.4 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 190 - Forks: 25

drsound/fault_tolerant_router
A daemon, running in background on a Linux router or firewall, monitoring the state of multiple internet uplinks/providers and changing the routing accordingly. LAN/DMZ internet traffic is load balanced between the uplinks.
Language: Ruby - Size: 90.8 KB - Last synced at: 18 days ago - Pushed at: about 4 years ago - Stars: 184 - Forks: 20

hegongshan/File-System-Paper
Must-read Papers for File System (FS)
Size: 18.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 176 - Forks: 18

hhblaze/Raft.Net
Implementation of RAFT distributed consensus algorithm among TCP Peers on .NET / .NETStandard / .NETCore / dotnet
Language: C# - Size: 10.5 MB - Last synced at: 1 day ago - Pushed at: almost 2 years ago - Stars: 174 - Forks: 27

sniffy/sniffy
Sniffy - interactive profiler, testing and chaos engineering tool for Java
Language: Java - Size: 10.4 MB - Last synced at: 6 months ago - Pushed at: about 2 years ago - Stars: 161 - Forks: 20

justin-db/JustinDB
⚛️ JustinDB is a highly available globally distributed key-value data store.
Language: Scala - Size: 5.13 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 159 - Forks: 19

svroonland/rezilience
ZIO-native utilities for making resilient distributed systems
Language: Scala - Size: 2.61 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 157 - Forks: 16

polarismesh/polaris-java
Lightweight Java SDK used as Proxyless Service Governance
Language: Java - Size: 4.08 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 149 - Forks: 85

Polly-Contrib/Polly.Contrib.WaitAndRetry
Polly.Contrib.WaitAndRetry is an extension library for Polly containing helper methods for a variety of wait-and-retry strategies.
Language: C# - Size: 186 KB - Last synced at: 30 days ago - Pushed at: over 2 years ago - Stars: 138 - Forks: 12

polarismesh/polaris-go
Lightweight Go SDK used as Proxyless Service Governance
Language: Go - Size: 96.5 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 137 - Forks: 63

ovh/metronome 📦
Metronome is a distributed and fault-tolerant event scheduler
Language: Go - Size: 180 KB - Last synced at: 12 months ago - Pushed at: over 6 years ago - Stars: 134 - Forks: 11

josephwilk/circuit-breaker
Circuit breaker for Clojure
Language: Clojure - Size: 30.3 KB - Last synced at: about 20 hours ago - Pushed at: over 7 years ago - Stars: 130 - Forks: 7

irrustible/async-backplane
Simple, Erlang-inspired fault-tolerance framework for Rust Futures.
Language: Rust - Size: 175 KB - Last synced at: 2 days ago - Pushed at: almost 4 years ago - Stars: 129 - Forks: 5

alejandro-du/vaadin-microservices-demo
A microservices example developed with Spring Cloud and Vaadin
Language: Java - Size: 943 KB - Last synced at: 28 days ago - Pushed at: over 3 years ago - Stars: 127 - Forks: 63

eBay/Gringofts
Gringofts makes it easy to build a replicated, fault-tolerant, high throughput and distributed event-sourced system.
Language: C++ - Size: 8.53 MB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 104 - Forks: 31

Cuju-ft/Cuju
Cuju: An Open Source Project for Virtualization-Based Fault Tolerance. Provides an active-passive fault tolerance service.
Language: C - Size: 14.7 MB - Last synced at: 9 months ago - Pushed at: almost 3 years ago - Stars: 88 - Forks: 33

leil-io/saunafs
SaunaFS is a free and open-source distributed POSIX file system inspired by Google File System.
Language: C++ - Size: 14.9 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 78 - Forks: 6

bastion-rs/artillery
Fire-forged cluster management & Distributed data protocol
Language: Rust - Size: 853 KB - Last synced at: 1 day ago - Pushed at: over 3 years ago - Stars: 75 - Forks: 10

k8snetworkplumbingwg/bond-cni
Bond-cni is for fail-over and high availability of networking in cloud-native orchestration
Language: Go - Size: 6.33 MB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 70 - Forks: 29

hungys/swimring
SwimRing - A Minimal Distributed Fault-Tolerant Key-Value Store built with SWIM Gossip Protocol and Consistent Hash Ring
Language: Go - Size: 53.7 KB - Last synced at: over 1 year ago - Pushed at: almost 9 years ago - Stars: 59 - Forks: 10

cornell-netlab/yates
YATES (Yet Another Traffic Engineering System)
Language: OCaml - Size: 104 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 57 - Forks: 19

Vinelab/http
A smart, simple and fault-tolerant HTTP client for sending and receiving JSON and XML
Language: PHP - Size: 57.6 KB - Last synced at: 11 days ago - Pushed at: over 5 years ago - Stars: 57 - Forks: 28

dhanushkamath/Burgernaut
A distributed message-based food ordering system developed with RabbitMQ, Node.js, Express and MongoDB
Language: JavaScript - Size: 379 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 54 - Forks: 26

theodesp/stable-systems-checklist
An opinionated list of attributes and policies that need to be met in order to establish a stable software system.
Size: 9.77 KB - Last synced at: 8 months ago - Pushed at: almost 8 years ago - Stars: 52 - Forks: 9

bastion-rs/fort
Proc macro attributes for Bastion runtime.
Language: Rust - Size: 37.1 KB - Last synced at: 4 days ago - Pushed at: over 3 years ago - Stars: 50 - Forks: 4

heidihoward/ios 📦
Reliable distributed agreement service for the cloud
Language: Go - Size: 44.8 MB - Last synced at: 12 months ago - Pushed at: about 8 years ago - Stars: 47 - Forks: 7

leondavi/NErlNet
Nerlnet is a framework for research and development of distributed machine learning models on IoT
Language: Python - Size: 66 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 46 - Forks: 8

dot-microservices/dot-rest
A minimalist toolkit for building scalable, fault-tolerant, and eventually consistent microservices
Language: JavaScript - Size: 1000 KB - Last synced at: 16 days ago - Pushed at: over 1 year ago - Stars: 45 - Forks: 2

imperial-qore/PreGAN
[Infocom'22] Preemptive Migration Prediction Network for Proactive Fault Tolerant Edge Computing
Language: Python - Size: 136 MB - Last synced at: 6 months ago - Pushed at: about 1 year ago - Stars: 42 - Forks: 6

reugn/kotlin-backoff
An exponential backoff library for Kotlin
Language: Kotlin - Size: 146 KB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 40 - Forks: 0

hyperion-cs/dhaf
Distributed high availability failover, written in cross-platform C# .NET (Linux, Windows and macOS supported).
Language: C# - Size: 573 KB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 37 - Forks: 1

haochenpan/rabia
Rabia: Simplifying State-Machine Replication Through Randomization (SOSP 2021)
Language: Go - Size: 58.8 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 37 - Forks: 12

daos-stack/cart 📦
DAOS Transport Layer
Language: C - Size: 3.57 MB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 33 - Forks: 14

IBM/kar
KAR: A Runtime for the Hybrid Cloud
Language: Go - Size: 10.8 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 30 - Forks: 12

replica-io/replica-io
Compose practical distributed replication mechanisms
Size: 24.4 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 28 - Forks: 1

polarismesh/polaris-cpp
Lightweight C++ SDK used as Proxyless Service Governance
Language: C++ - Size: 9.6 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 28 - Forks: 13

vany0114/chaos-injection-using-simmy
A microservice-based application that demonstrates how chaos engineering works with Simmy, using chaos policies in a distributed system.
Language: C# - Size: 866 KB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 28 - Forks: 2

NeoResearch/libbft
A lightweight and multi-language library for byzantine fault tolerance
Language: C++ - Size: 1.54 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 25 - Forks: 3

ldmtam/raft-auto-increment
Distributed, fault-tolerant, persistent, auto-increment ID generation service with Raft consensus
Language: Go - Size: 85 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 25 - Forks: 1

myntra/cortex
A fault-tolerant events/alerts correlation engine
Language: Go - Size: 4.74 MB - Last synced at: about 1 month ago - Pushed at: about 6 years ago - Stars: 25 - Forks: 14

ertgl/distributed 📦
Distributed is a wrapper module that helps developers to make distributed, scaled, replicated and fault-tolerant (with takeover ability) leader-follower systems.
Language: Elixir - Size: 10.7 KB - Last synced at: 5 days ago - Pushed at: over 7 years ago - Stars: 25 - Forks: 3

AndyObtiva/abstract_feature_branch
abstract_feature_branch is a Ruby gem that provides a variation on the Branch by Abstraction Pattern by Paul Hammant and the Feature Toggles Pattern by Martin Fowler (aka Feature Flags) to enable Continuous Integration and Trunk-Based Development.
Language: Ruby - Size: 225 KB - Last synced at: 3 days ago - Pushed at: 6 months ago - Stars: 23 - Forks: 5

medavox/MuTime
NTP time syncing library for Android
Language: Kotlin - Size: 448 KB - Last synced at: about 2 months ago - Pushed at: about 5 years ago - Stars: 22 - Forks: 5

gdanezis/pybft
Experiments with pBFT
Language: Python - Size: 27.3 KB - Last synced at: 2 days ago - Pushed at: over 7 years ago - Stars: 22 - Forks: 18

savariamir/Finity
Finity is a .NET Core resilience and fault-tolerance library that lets developers extend IHttpClientFactory with Retry, Circuit Breaker, Caching, Authentication, and Bulkhead Isolation.
Language: C# - Size: 213 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 20 - Forks: 1

kdally/fault-tolerant-flight-control-drl
Deep Reinforcement Learning for Flight Control
Language: Python - Size: 142 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 20 - Forks: 6

codefarm0/resilience4j
Resilience4j - Circuit breaker, bulkhead, rate limiter, retry, application monitoring with prometheus, grafana
Language: Java - Size: 85.9 KB - Last synced at: 5 months ago - Pushed at: over 5 years ago - Stars: 20 - Forks: 32

r3w0p/bobocep
A fault-tolerant Complex Event Processing engine designed for edge computing in Internet of Things systems.
Language: Python - Size: 1.99 MB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 19 - Forks: 4

konnov/bymc
Byzantine model checker
Language: C - Size: 8.13 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 19 - Forks: 5

clickyotomy/go-shuffle-shard
Implementation of Amazon's Shuffle Sharding in Go.
Language: Go - Size: 42 KB - Last synced at: 12 months ago - Pushed at: about 4 years ago - Stars: 19 - Forks: 3

gagan-iitb/ComputerSysDesign
Designing IT and ML Applications using Systems Thinking Approach at IIT Bhilai (CS559)
Size: 4 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 18 - Forks: 3

ks-amit/Distributed-Database
A travel agency app with a distributed database implemented from scratch!
Language: Python - Size: 776 KB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 17 - Forks: 6

YogenRaii/kraker-info
Microservices-based project to extract information from user data drawn from different sources.
Language: Java - Size: 436 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 17 - Forks: 10

rbrahul/retry
An essential retry-operation library for Golang for building fault-tolerant systems.
Language: Go - Size: 24.4 KB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 16 - Forks: 0

andrewlee302/MIT-6.824
Implementation of MIT 6.824: Distributed Systems
Language: Go - Size: 1.47 MB - Last synced at: about 1 month ago - Pushed at: over 7 years ago - Stars: 16 - Forks: 3

Clivern/Cluster
Golang Package for System Clustering.
Language: Go - Size: 238 KB - Last synced at: 10 days ago - Pushed at: 4 months ago - Stars: 15 - Forks: 1

portals-project/portals
Portals is a framework for flexible stateful serverless apps, unifying dataflow streaming with actors
Language: Scala - Size: 915 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 15 - Forks: 1

yaroslaff/okerr-dev
Okerr hybrid (host/network) monitoring system with remote network checks, email/Telegram alerts and DynDNS fault-tolerance feature.
Language: Python - Size: 1.26 MB - Last synced at: 18 days ago - Pushed at: 3 months ago - Stars: 14 - Forks: 0

jabolina/go-mcast
Golang based implementation of the Generic Multicast protocol.
Language: Go - Size: 1.73 MB - Last synced at: 2 months ago - Pushed at: almost 3 years ago - Stars: 14 - Forks: 4

selivan/inet-failover
Failover for Linux router with 2 uplinks.
Language: Shell - Size: 4.88 KB - Last synced at: 3 days ago - Pushed at: about 8 years ago - Stars: 14 - Forks: 5

Lambels/cronjob
Cron but with golang time specification. ⏰
Language: Go - Size: 34.2 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 13 - Forks: 0

konnov/fault-tolerant-benchmarks
Fault-tolerant distributed algorithms encoded in a formal language
Language: F* - Size: 1.58 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 13 - Forks: 4

pviotti/hybris 📦
Robust and strongly consistent hybrid cloud storage library
Language: C - Size: 1020 KB - Last synced at: about 1 year ago - Pushed at: over 8 years ago - Stars: 13 - Forks: 4

Pscheidl/FortEE
Jakarta EE / Java EE fault-tolerance guard leveraging the Optional pattern. Its power lies in its simplicity.
Language: Java - Size: 163 KB - Last synced at: 21 days ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 3

imperial-qore/DeepFT
Self-Supervised Deep Learning based Surrogate Models for Fault-Tolerant Edge Computing
Language: Python - Size: 152 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 12 - Forks: 2

sergey-melnychuk/distributed-algorithms
Implementation of classic distributed algorithms: membership, failure detection, quorum, replication etc.
Language: Java - Size: 64.5 KB - Last synced at: 12 months ago - Pushed at: over 5 years ago - Stars: 12 - Forks: 0

sger/elixir_dropbox
Simple Dropbox v2 client for Elixir
Language: Elixir - Size: 6.13 MB - Last synced at: 2 days ago - Pushed at: almost 6 years ago - Stars: 12 - Forks: 16

GaloisInc/LIMA
LIMA: Language for Integrated Modeling and Analysis
Language: Haskell - Size: 195 KB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 12 - Forks: 2

RuiHuangNUS/MARS-Reconfig
[ICRA 2025] Robust Self-Reconfiguration for Fault-Tolerant Control of Modular Aerial Robot Systems
Language: Python - Size: 219 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 11 - Forks: 0

ami-iit/paper_nava_2023_icra_fault-control-ironcub
Repository associated with the paper "Failure Detection and Fault Tolerant Control of a Jet-Powered Flying Humanoid Robot", published in IEEE ICRA 2023.
Language: MATLAB - Size: 345 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 11 - Forks: 3

xarc/harv
HARV - HArdened Risc-V
Language: VHDL - Size: 109 KB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 11 - Forks: 1

erickTornero/Model-based-Quadrotor
Model based RL for fault-rotor quadrotor
Language: Python - Size: 13.8 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 11 - Forks: 1

xlab-uiuc/slooo
Slooo: A Fail-slow Fault Injection Testing Framework
Language: Xonsh - Size: 224 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 0
