An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: bigdata

4ngelojr/AI-ML-Cheatsheets

🗂️ Access essential AI and ML concepts with quick-reference cheatsheets for effective learning and project implementation.

Size: 51.6 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

volcano-sh/volcano

A Cloud Native Batch System (Project under CNCF)

Language: Go - Size: 88.7 MB - Last synced at: 1 day ago - Pushed at: 3 days ago - Stars: 5,107 - Forks: 1,216

taosdata/TDengine

High-performance, scalable time-series database designed for Industrial IoT (IIoT) scenarios

Language: C - Size: 695 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 24,522 - Forks: 4,971

Keeleyundynamic132/ml-visualization

🎓 Create engaging ML concept animations from text with an automated pipeline, enhancing learning through clear visuals and AI-driven quality control.

Language: Python - Size: 236 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

ramarimoo/insert-tools

Simple and fast Python toolset for bulk data insertion into databases and CSVs. Ideal for ETL pipelines and data engineering tasks.

Language: Python - Size: 31.3 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0

vigneshSs-07/Cloud-AI-Analytics

This repo contains details related to Data Engineering tech stacks in GCP

Language: Jupyter Notebook - Size: 15.7 MB - Last synced at: about 6 hours ago - Pushed at: 2 days ago - Stars: 58 - Forks: 81

apache/airavata

A general purpose Distributed Systems Framework

Language: Java - Size: 169 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 132 - Forks: 139

mciwing/data-campus

Data Campus - Learn Data 🚀

Language: HTML - Size: 90.5 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

martymac/fpart

Sort files and pack them into partitions

Language: C - Size: 1.39 MB - Last synced at: about 23 hours ago - Pushed at: 3 days ago - Stars: 278 - Forks: 46

apconw/sanic-web

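A lightweight large-model application project (Large Model Data Assistant) that covers the full pipeline and is easy to extend. It supports large models such as DeepSeek and Qwen3. A one-stop large-model application development project built on Dify, LangChain/LangGraph, Ollama&Vllm, Sanic, and Text2SQL 📊, with a modern UI built using Vue3, TypeScript, and Vite 5. It supports LLM-based graphical data Q&A through ECharts 📈 and can handle table Q&A over CSV files 📂. It can also easily integrate with third-party open source RAG retrieval systems 🌐 to support broad general-knowledge Q&A.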

Language: JavaScript - Size: 149 MB - Last synced at: 2 days ago - Pushed at: 4 days ago - Stars: 1,629 - Forks: 294

hibuz/hadoop-docker

🐳 hadoop bigdata stack(ecosystems) docker image

Language: Dockerfile - Size: 256 KB - Last synced at: about 15 hours ago - Pushed at: 3 days ago - Stars: 2 - Forks: 0

nebulastream/nebulastream

Data Management for the Internet of Things

Language: C++ - Size: 542 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 66 - Forks: 18

Yimyaa/AI-ML-Cheatsheets

All Stanford Cheatsheets: Artificial Intelligence, Transformers, LLMs, Deep Learning, Machine Learning, Probabilities, Statistics, Algebra and Calculus.

Size: 50.3 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 7 - Forks: 0

100-rab/AMO

[RSS 2025] AMO: Adaptive Motion Optimization for Hyper-Dexterous Humanoid Whole-Body Control

Language: Python - Size: 44.5 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 4 - Forks: 0

apache/airavata-sandbox

Sandbox repository for exploratory Apache Airavata features

Language: JavaScript - Size: 27.6 MB - Last synced at: about 22 hours ago - Pushed at: 3 days ago - Stars: 1 - Forks: 24

legend-exp/legend-pydataobj

LEGEND Python Data Objects

Language: Python - Size: 1.3 MB - Last synced at: 1 day ago - Pushed at: 3 days ago - Stars: 1 - Forks: 11

brandmaier/semtree

Recursive Partitioning for Structural Equation Models

Language: R - Size: 32.8 MB - Last synced at: 1 day ago - Pushed at: 3 days ago - Stars: 20 - Forks: 13

joxnlxe0409/Python-for-bigdata

This repository is for 2-2 Dongyang Mirae University 'Python for Big Data' lecture.

Language: Python - Size: 158 KB - Last synced at: 1 day ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

open-metadata/openmetadata-site

Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.

Language: TypeScript - Size: 88.9 MB - Last synced at: 1 day ago - Pushed at: 3 days ago - Stars: 15 - Forks: 12

apache/celeborn

Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.

Language: Java - Size: 32.2 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 1,019 - Forks: 405

databendlabs/databend

AI-Native Data Warehouse. Blazing analytics, fast search, geo insights, vector AI. Built for multimodal analytics, Open-source Snowflake alternative. https://databend.com

Language: Rust - Size: 314 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 8,991 - Forks: 836

sderosiaux/every-single-day-i-tldr

A daily digest of the articles or videos I've found interesting, that I want to share with you.

Size: 9.43 MB - Last synced at: 2 days ago - Pushed at: 4 days ago - Stars: 325 - Forks: 21

NationalSecurityAgency/datawave

DataWave is an ingest/query framework that leverages Apache Accumulo to provide fast, secure data access.

Language: Java - Size: 115 MB - Last synced at: about 21 hours ago - Pushed at: 3 days ago - Stars: 645 - Forks: 271

AparajithKrishna/Mental-health-support

A simple and beginner-friendly web app built using HTML, CSS, and JavaScript to promote mental well-being.

Language: HTML - Size: 3.91 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2 - Forks: 1

k0c0r/improved-journey

Derek Simmons - Strategic Builder | Innovation Architect

Size: 1000 Bytes - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

transferia/transferia

Open Source Cloud Native Ingestion engine

Language: Go - Size: 23.7 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 166 - Forks: 20

reductstore/reductstore

High Performance Storage and Streaming Solution for Data Acquisition Systems

Language: Rust - Size: 3.3 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 257 - Forks: 20

yammdd/crypto-bigdata-project

A real-time and batch-based crypto analytics platform built with Lambda Architecture using Kafka, Spark, HDFS, HBase, MongoDB, OpenAI, and XGBoost - providing live short‑term predictions via Flask and long‑term insights with Streamlit dashboards, integrating Yahoo Finance and Binance data for 10 major cryptocurrencies.

Language: Python - Size: 100 MB - Last synced at: 1 day ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

oxnr/awesome-bigdata

A curated list of awesome big data frameworks, resources and other awesomeness.

Size: 845 KB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 14,049 - Forks: 2,582

xiaomeng79/learning_notes

Learning notes

Language: Jupyter Notebook - Size: 10.3 MB - Last synced at: 1 day ago - Pushed at: 4 days ago - Stars: 25 - Forks: 10

bigbio/quantms-utils

A Python library with scripts and helper classes for the quantms workflow

Language: Python - Size: 96.9 MB - Last synced at: 2 days ago - Pushed at: 4 days ago - Stars: 5 - Forks: 4

juicedata/juicefs

JuiceFS is a distributed POSIX file system built on top of Redis and S3.

Language: Go - Size: 63.1 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 12,459 - Forks: 1,110

elastic-io/mindb

An embedded object storage database for Go with S3-compatible API.

Language: Go - Size: 72.3 KB - Last synced at: 2 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

Fradhyle/Voo-ong

A personalized movie recommendation system using artificial intelligence

Language: Jupyter Notebook - Size: 57.8 MB - Last synced at: 2 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

arvados/arvados

An open source platform for managing and analyzing biomedical big data

Language: Go - Size: 83.5 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 412 - Forks: 125

vaexio/vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

Language: Python - Size: 133 MB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 8,453 - Forks: 603

gaetancorin/Datapipeline_compare_official_vs_gas_stations_oil_prices

This project processes 56 million records of oil price changes from French gas stations (2008 → today) and compares them with official government oil prices to analyze pricing differences.

Language: Python - Size: 54.9 MB - Last synced at: 2 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

aliyun/aliyun-maxcompute-data-collectors

Language: Java - Size: 93.8 MB - Last synced at: 2 days ago - Pushed at: 5 days ago - Stars: 133 - Forks: 65

ARUNAGIRINATHAN-K/Retail-Transaction-Analytics

Basket Analysis

Language: Jupyter Notebook - Size: 10.2 MB - Last synced at: 3 days ago - Pushed at: 6 days ago - Stars: 6 - Forks: 0

DTStack/dt-sql-parser

SQL Parsers for BigData, built with antlr4.

Language: TypeScript - Size: 52.7 MB - Last synced at: 2 days ago - Pushed at: 5 days ago - Stars: 362 - Forks: 106

microsoft/Mobius

C# and F# language binding and extensions to Apache Spark

Language: C# - Size: 6.44 MB - Last synced at: 4 days ago - Pushed at: almost 2 years ago - Stars: 943 - Forks: 209

scikit-hep/uproot5

ROOT I/O in pure Python and NumPy.

Language: Python - Size: 3.91 MB - Last synced at: 3 days ago - Pushed at: 6 days ago - Stars: 256 - Forks: 85

NewLifeX/NewLife.XCode

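A high-performance .NET data middleware evolved over 20 years, focused on extreme performance, massive data volumes, automatic modeling/migration, multi-level caching, and automatic table and database sharding. Supports MySQL/SQLite/SqlServer/Oracle/Postgresql/Dameng (达梦) and others.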

Language: C# - Size: 169 MB - Last synced at: 3 days ago - Pushed at: 6 days ago - Stars: 92 - Forks: 38

canimus/cuallee

Possibly the fastest DataFrame-agnostic quality check library in town.

Language: Python - Size: 2.38 MB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 227 - Forks: 21

ErginTIRAVOGLU/BigDataOrdersDashboard

Udemy - Big Data Analytics & Data Visualization and Forecasting by M&Y Yazılım Eğitim Akademi Danışmanlık

Language: JavaScript - Size: 149 MB - Last synced at: 4 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

groda/big_data

Big Data essentials: Hadoop, MapReduce, Spark. Explore tutorials and demos in Jupyter notebooks—most are self-contained and live, ready to run with a click.

Language: Jupyter Notebook - Size: 61.5 MB - Last synced at: 4 days ago - Pushed at: 7 days ago - Stars: 84 - Forks: 27

kartzum/d-space

Algorithms, BigData, Apache Spark...

Language: Java - Size: 941 KB - Last synced at: 4 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

rustfs/rustfs

🚀2.3x faster than MinIO for 4KB object payloads. RustFS is an open-source, S3-compatible high-performance object storage system supporting migration and coexistence with other S3-compatible platforms such as MinIO and Ceph.

Language: Rust - Size: 10.6 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 11,529 - Forks: 548

wet-discriminativestimulus220/Capstone_Google_Data_Analytics_Certificate

🧑🤝🧑 Explore the links between social inclusion, physical health, and mental well-being for older Europeans using comprehensive health survey data.

Size: 1.56 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

apache/avro

Apache Avro is a data serialization system.

Language: Java - Size: 78.4 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 3,180 - Forks: 1,712

SakCo/F1-WorldChampDataAnalysis

Advanced data project analyzing 70+ years of Formula 1 data with 100% prediction accuracy. 25K+ race records • 6 ML algorithms • Interactive maps • 67% accuracy improvement through feature scaling

Language: Jupyter Notebook - Size: 1.58 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

notcoininu/grid

🚀 Streamline your crypto trading with this powerful grid trading system, featuring multiple strategies and support for various exchanges.

Language: Python - Size: 1.63 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

jamesmudd/jhdf

A pure Java HDF5 library

Language: Java - Size: 4.47 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 164 - Forks: 41

CurvineIO/curvine

High-performance distributed multi-level cache system. Built in Rust.

Language: Rust - Size: 2.14 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 430 - Forks: 58

apache/hudi

Upserts, Deletes And Incremental Processing on Big Data.

Language: Java - Size: 2.24 GB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 6,025 - Forks: 2,442

Correia-jpv/fucking-awesome-bigdata

A curated list of awesome big data frameworks, resources and other awesomeness. With repository stars⭐ and forks🍴

Size: 655 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 11 - Forks: 1

apache/shardingsphere

Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across All Databases.

Language: Java - Size: 646 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 20,541 - Forks: 6,871

DonaldET/DemoDev

Examples: coding techniques, algorithm comparisons, interview coding tests, useful utilities

Language: Java - Size: 56.7 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 1 - Forks: 0

fahadkalil/bigdata_docker

Multi-container Docker setup for a Big Data pipeline

Language: Dockerfile - Size: 69 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 1 - Forks: 0

griddb/griddb

GridDB is a next-generation open source database that makes time series IoT and big data fast and easy.

Language: C++ - Size: 24.8 MB - Last synced at: 4 days ago - Pushed at: 6 months ago - Stars: 2,463 - Forks: 5,029

sparsh771/Power-BI-Data-Analysis

Power-BI-Data-Analysis offers an interactive Power BI dashboard with 2022–2023 retail sales data, showing trends in products, stores, and customer behavior 🐙.

Size: 735 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

dimajix/flowman

Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.

Language: Scala - Size: 18.8 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 97 - Forks: 19

jy1212686/eta-etl-spark

🚖 Ingest and analyze NYC yellow taxi data with a streamlined ETL pipeline, featuring data cleaning, analytics, and business-ready outputs.

Language: Jupyter Notebook - Size: 1.33 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

jfsanchez/bigdata

Language: Jupyter Notebook - Size: 30.3 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 3 - Forks: 1

FadilAdz/praktikumBigData

This repository contains a series of Big Data lab exercises covering distributed storage (HDFS, MongoDB, Cassandra), large-scale data processing with MapReduce and Apache Spark, and building ingestion pipelines with Sqoop, Flume, and Kafka.

Size: 238 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

electronic-pig/Yelp-Analysis-and-Reco_frontend

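Frontend repository for a Yelp (the US counterpart of Dianping) review data analysis and recommendation project. It is a web app that integrates big data analysis and visualization with big data application development.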

Language: Vue - Size: 139 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 7 - Forks: 1

HariSekhon/Knowledge-Base

Large Tech Knowledge Base from 20 years in DevOps, Linux, Cloud, Big Data, AWS, GCP and international Consulting including extensive Travel Tips around the world

Language: Shell - Size: 184 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 251 - Forks: 49

dmwm/CMSSpark

General purpose framework to run CMS experiment workflows on HDFS/Spark platform

Language: Python - Size: 3.63 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 12 - Forks: 21

DTStack/chunjun

A data integration framework

Language: Java - Size: 127 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 4,094 - Forks: 1,694

fdv/running-elasticsearch-fun-profit

A book about running Elasticsearch

Size: 19.4 MB - Last synced at: 14 days ago - Pushed at: over 4 years ago - Stars: 811 - Forks: 223

lovnishverma/Slidespptspdfs

PDFs for learning Python, DBMS, Big Data, Data Science, AI/ML, and much more...

Size: 65.8 MB - Last synced at: 14 days ago - Pushed at: 15 days ago - Stars: 18 - Forks: 0

ictchenbo/SmartETL

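SmartETL: a simple, flexible, configurable, out-of-the-box Python ETL framework with domain-specific features that avoids reinventing the wheel. It provides processing pipelines for open data sources such as Wikidata, Wikipedia, and GDELT; supports file formats such as txt/json/csv/excel and databases such as MySQL/PostgreSQL/MongoDB/ClickHouse/ElasticSearch as input and output; and offers processing operators for large models, Web APIs, and more.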

Language: Python - Size: 5.22 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 26 - Forks: 5

hi-primus/optimus

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Language: Python - Size: 110 MB - Last synced at: 10 days ago - Pushed at: 12 months ago - Stars: 1,527 - Forks: 233

zhaoyachao/zdh_web

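Big data collection and extraction platform. zdh_web is the visual management platform for the zdh family of services, with modules for data collection, scheduling, permissions, approval workflows, private-domain marketing, and more.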

Language: Java - Size: 142 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 530 - Forks: 184

dustin-ww/City-Mood

This big data project aims to analyze and visualize the mood of a city based on various data sources including traffic patterns, social media activity, and environmental factors.

Size: 4.88 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

apache/avro-rs

Rust SDK for Apache Avro - a data serialization system.

Language: Rust - Size: 2.18 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 88 - Forks: 42

DTStack/monaco-sql-languages

SQL languages for monaco-editor

Language: TypeScript - Size: 66.6 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 276 - Forks: 49

AbsaOSS/spline

Data Lineage Tracking And Visualization Solution

Language: Scala - Size: 8.6 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 647 - Forks: 158

josemarialuna/ing-datos-big-data-US

Docker-based Big Data environment with Hadoop, Hive, Trino, Kafka, and Airflow. Includes configurations, scripts, and MapReduce examples for distributed data analysis.

Language: Python - Size: 121 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

emeraldpay/dshackle-archive

ETL for Bitcoin and Ethereum data

Language: Rust - Size: 67.3 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 6 - Forks: 0

dsevilla/bdge

Resources for the BDGE course of the Big Data Master's program at UM/USC.

Language: Jupyter Notebook - Size: 194 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 11 - Forks: 6

BaseMax/LaravelBigDataTest

PHP Laravel: Develop a test environment in Laravel with more than 20 million user rows (one project in Laravel Blade and another SPA in Vue.js with infinite scroll).

Language: PHP - Size: 1.83 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 7 - Forks: 1

dotnet/spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

Language: C# - Size: 4.88 MB - Last synced at: 20 days ago - Pushed at: 2 months ago - Stars: 2,084 - Forks: 330

Bigdata-com/bigdata-cookbook

End-to-end financial text-analysis using Bigdata API and the Bigdata-Research-Tools library. Ready-to-use notebooks with RAG & GenAI enabling thematic and risk screening, trend tracking, and automated report generation, extracting insights at scale.

Language: HTML - Size: 45.5 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 17 - Forks: 5

apache/incubator-livy

Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

Language: Scala - Size: 3.56 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 931 - Forks: 614

sakkiii/apache-nifi-helm

Helm chart that deploys the latest Apache NiFi in a Kubernetes cluster

Language: Smarty - Size: 1.21 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 9 - Forks: 5

turboFei/turbofei.github.com

turboFei's Blog

Language: JavaScript - Size: 49.8 MB - Last synced at: 22 days ago - Pushed at: 23 days ago - Stars: 0 - Forks: 0

DataExpert-io/data-engineer-handbook

This is a repo with links to everything you'd ever want to learn about data engineering

Language: Jupyter Notebook - Size: 59.5 MB - Last synced at: 23 days ago - Pushed at: about 1 month ago - Stars: 38,528 - Forks: 7,412

eypigu/go-mallorca-graphql-api 📦

🌴Go Mallorca GraphQL API (with static data)

Language: TypeScript - Size: 98.6 KB - Last synced at: 16 days ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 0

atengk/ops

A technical repository for operations (ops) topics

Language: Shell - Size: 21.9 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 9 - Forks: 7

brunocampos01/data-engineering

Language: Python - Size: 166 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 11 - Forks: 2

bigbio/hvantk

Hail variant annotation toolkit

Language: Python - Size: 35.4 MB - Last synced at: 1 day ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

v7v-575/xcware

Free Platform for Cloud Computing, Sky Computing, Big Data, Serverless, Virtualization, Container, Cluster, Private, Hybrid, Multi Cloud, and VMware/Citrix alternative.

Size: 25.4 KB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 1 - Forks: 0

NewLifeX/AntJob

High-throughput .NET distributed job and real-time data scheduling platform with fine-grained slicing (time/data/message/Cron/SQL/script), automatic retries, elastic scaling, backfill recomputation, and a web console.

Language: C# - Size: 10.3 MB - Last synced at: 23 days ago - Pushed at: 3 months ago - Stars: 427 - Forks: 99

Netflix/genie

Distributed Big Data Orchestration Service

Language: Java - Size: 217 MB - Last synced at: 16 days ago - Pushed at: about 1 month ago - Stars: 1,758 - Forks: 373

raystack/meteor

Meteor is an easy-to-use, plugin-driven metadata collection framework to extract data from different sources and sink to any data catalog.

Language: Go - Size: 11.7 MB - Last synced at: 11 days ago - Pushed at: 2 months ago - Stars: 218 - Forks: 46

doitintl/banias

Opinionated serverless event analytics pipeline

Language: Go - Size: 416 KB - Last synced at: about 11 hours ago - Pushed at: over 2 years ago - Stars: 42 - Forks: 15

scray/scray

Lambda Architecture Framework for Big Data, Spark, Versioned Data, NoSQL and SQL-Stores.

Language: Scala - Size: 48 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 12 - Forks: 5

timebusker/timebusker.github.io

timebusker.github.io

Language: HTML - Size: 242 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 1 - Forks: 0

EDRN/P5

EDRN Production Program for the Public/Private Portal (P5)

Language: Python - Size: 15.5 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0