GitHub topics: bigdata
reductstore/reductstore
High Performance Storage and Streaming Solution for Data Acquisition Systems
Language: Rust - Size: 2.62 MB - Last synced at: about 9 hours ago - Pushed at: about 11 hours ago - Stars: 221 - Forks: 15

jotstolu/Azure-Data-Engineering-End-to-End-Project---NYC-taxi-dataset
An end‑to‑end data engineering pipeline for NYC Green Taxi trip records, built on Microsoft Azure. This project ingests Jan–Dec 2024 Parquet files from the NYC Taxi API into a Bronze Delta Lake layer, cleans and enriches the data in a Silver layer with PySpark on Azure Databricks, then saves the transformed data to the Gold layer in Delta format.
Size: 1.69 MB - Last synced at: about 10 hours ago - Pushed at: about 11 hours ago - Stars: 0 - Forks: 0

CurvineIO/curvine
High-performance distributed cache system. Built in Rust.
Language: Rust - Size: 729 KB - Last synced at: about 16 hours ago - Pushed at: about 17 hours ago - Stars: 42 - Forks: 11

fahadkalil/bigdata_docker
Multi-container Docker setup for a Big Data pipeline
Language: Dockerfile - Size: 13.2 MB - Last synced at: about 18 hours ago - Pushed at: about 20 hours ago - Stars: 1 - Forks: 0

apconw/sanic-web
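A lightweight large-model application project (Large Model Data Assistant) that covers the full pipeline and is easy to extend. Supports large models such as DeepSeek and Qwen2.5. A one-stop large-model application development project built on Dify, Ollama & vLLM, Sanic, and Text2SQL 📊, with a modern UI built using Vue3, TypeScript, and Vite 5. It supports LLM-driven graphical data Q&A through ECharts 📈 and can handle table Q&A over CSV files 📂. It can also easily integrate with third-party open-source RAG retrieval systems 🌐 to support broad general-knowledge Q&A.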
Language: TypeScript - Size: 141 MB - Last synced at: about 18 hours ago - Pushed at: about 19 hours ago - Stars: 837 - Forks: 160

rustfs/rustfs
🚀 High-performance distributed object storage, an alternative to MinIO.
Language: Rust - Size: 6.38 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1,658 - Forks: 71

kartzum/d-space
Algorithms, BigData, Apache Spark...
Language: Java - Size: 922 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

apache/celeborn
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
Language: Java - Size: 31.4 MB - Last synced at: about 23 hours ago - Pushed at: 1 day ago - Stars: 957 - Forks: 411

sukbeta/sukbeta.github.io
舒克贝塔
Language: HTML - Size: 5.9 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 1

dromara/dataCompare
Big data comparison and data profiling platform: low code, data comparison, and data profiling
Language: Java - Size: 10.9 MB - Last synced at: about 13 hours ago - Pushed at: about 1 year ago - Stars: 269 - Forks: 62

open-metadata/openmetadata-site
Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
Language: TypeScript - Size: 55.6 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 14 - Forks: 11

Fradhyle/Voo-ong
Personalized movie recommendation system using artificial intelligence
Language: Jupyter Notebook - Size: 57.7 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

juicedata/juicefs
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
Language: Go - Size: 62.5 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 11,837 - Forks: 1,052

zyra121/advertising-sales-prediction
This repository showcases a linear regression analysis using the Advertising dataset, demonstrating both simple and multiple regression techniques in Python. It also features a custom implementation of Gradient Descent for a deeper understanding of the concepts. 🐱💻📊
Language: Python - Size: 1.33 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

sderosiaux/every-single-day-i-tldr
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Size: 12.4 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 320 - Forks: 21

mjakubowski84/parquet4s
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Language: Scala - Size: 2.34 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 291 - Forks: 67

andrewsuadnya/YouTube-Live-Chat-Sentiment-Analysis
A near real-time sentiment analysis system for YouTube live chat using a big data stack. Built with Kafka, Spark Structured Streaming, Elasticsearch, Kibana, Flask, and React.js to collect, process, and visualize live messages.
Language: Python - Size: 23.1 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2 - Forks: 0

jnidzwetzki/bboxdb
BBoxDB is a scalable, highly available, and distributed data store for multi-dimensional big data. The software supports operations like multi-dimensional range queries and spatial joins. In addition, data streams are supported.
Language: Java - Size: 32.8 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 56 - Forks: 10

soro8/Recommender-System-MovieLens
This repository contains a machine learning project that builds a movie recommendation system using Content-Based Filtering and Collaborative Filtering approaches. 🛠️ It aims to help users navigate large film catalogs and discover new titles they are likely to enjoy. 📽️
Language: Jupyter Notebook - Size: 1.39 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

legend-exp/legend-pydataobj
LEGEND Python Data Objects
Language: Python - Size: 1.35 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 11

zhaoyachao/zdh_magic_mirror
zdh series: a Java-based business risk control engine
Language: Java - Size: 1.59 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 12 - Forks: 5

zhaoyachao/zdh_web
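Big data collection and extraction platform. zdh_web is the visual management platform for the zdh series of services, with modules for data collection, scheduling, permissions, approval workflows, private-domain marketing, and more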
Language: Java - Size: 141 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 519 - Forks: 182

binghe001/binghe001.github.io
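📚 This static blog is a technical summary of what the author, Binghe, has learned over many years of development and architecture work at major internet companies. It aims to provide a clear and detailed learning tutorial, with a focus on core Java content, underlying principles, architecture knowledge, and penetration testing techniques. If this repository helps you, please show your support (follow, like, share)!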
Language: HTML - Size: 1.85 GB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 34 - Forks: 3

binghe001/BingheGuide
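🔥🔥🔥 📚 This code repository is a technical summary of what the author, Binghe, has learned over many years of development and architecture work at major internet companies. It aims to provide a clear and detailed learning tutorial, with a focus on core Java content, underlying principles, architecture knowledge, and penetration testing techniques. If this repository helps you, please show your support (follow, like, share)!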
Language: Shell - Size: 692 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 356 - Forks: 147

HariSekhon/Knowledge-Base
Large Tech Knowledge Base from 20 years in DevOps, Linux, Cloud, Big Data, AWS, GCP etc - gradually porting my large private knowledge base to public
Language: Shell - Size: 183 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 188 - Forks: 36

PriyankaBhatta/60DaysofLearning2025
Welcome to my #60DaysOfLearning2025 challenge! This repository is a personal learning log where I document my daily progress as I explore and upskill in various areas of technology.
Size: 90.8 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

grouzen/zio-apache-parquet
Scala ZIO-powered Apache Parquet library
Language: Scala - Size: 387 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 25 - Forks: 4

ramarimoo/insert-tools
Simple and fast Python toolset for bulk data insertion into databases and CSVs. Ideal for ETL pipelines and data engineering tasks.
Language: Python - Size: 31.3 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

apache/airavata
A general purpose Distributed Systems Framework
Language: Java - Size: 168 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 122 - Forks: 135

Yimyaa/AI-ML-Cheatsheets
All Stanford cheatsheets: Artificial Intelligence, Transformers, LLMs, Deep Learning, Machine Learning, Probabilities, Statistics, Algebra and Calculus.
Size: 50.3 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2 - Forks: 0

arvados/arvados
An open source platform for managing and analyzing biomedical big data
Language: Go - Size: 75.2 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 404 - Forks: 123

100-rab/AMO
[RSS 2025] AMO: Adaptive Motion Optimization for Hyper-Dexterous Humanoid Whole-Body Control
Language: Python - Size: 44.5 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

vaexio/vaex
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
Language: Python - Size: 133 MB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 8,403 - Forks: 600

k0c0r/improved-journey
Derek Simmons - Strategic Builder | Innovation Architect
Size: 1000 Bytes - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

Correia-jpv/fucking-awesome-bigdata
A curated list of awesome big data frameworks, resources and other awesomeness. With repository stars⭐ and forks🍴
Size: 655 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 11 - Forks: 1

AparajithKrishna/Mental-health-support
A simple and beginner-friendly web app built using HTML, CSS, and JavaScript to promote mental well-being.
Language: HTML - Size: 3.91 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

NationalSecurityAgency/datawave
DataWave is an ingest/query framework that leverages Apache Accumulo to provide fast, secure data access.
Language: Java - Size: 112 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 619 - Forks: 259

oxnr/awesome-bigdata
A curated list of awesome big data frameworks, resources and other awesomeness.
Size: 843 KB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 13,690 - Forks: 2,575

byzer-org/byzer-lang
Byzer (formerly MLSQL): A low-code open-source programming language for data pipelines, analytics and AI.
Language: Scala - Size: 54.7 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 1,844 - Forks: 546

apache/hudi
Upserts, Deletes And Incremental Processing on Big Data.
Language: Java - Size: 1.84 GB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 5,858 - Forks: 2,423

nebulastream/nebulastream
Data Management for the Internet of Things
Language: C++ - Size: 518 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 33 - Forks: 3

ganweisoft/TOMs
TOMs is a fully open-source, high-performance, systematic, plugin-oriented, and scenario-agnostic general-purpose development framework.
Language: Batchfile - Size: 31.8 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 6 - Forks: 2

apache/amoro
Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
Language: Java - Size: 68 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 996 - Forks: 342

brandmaier/semtree
Recursive Partitioning for Structural Equation Models
Language: R - Size: 30.4 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 18 - Forks: 13

databendlabs/databend
𝗗𝗮𝘁𝗮, 𝗔𝗻𝗮𝗹𝘆𝘁𝗶𝗰𝘀 & 𝗔𝗜. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
Language: Rust - Size: 297 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 8,522 - Forks: 783

apache/incubator-livy
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
Language: Scala - Size: 3.55 MB - Last synced at: about 23 hours ago - Pushed at: 25 days ago - Stars: 915 - Forks: 612

ganweisoft/Gateway
Gateway is a high-performance, centralized communication and scheduling module for various device plugins. It uniformly converts heterogeneous data into standardized models and delivers core functionalities such as real-time data storage, alarm triggering, linkage control, and task planning.
Language: C# - Size: 158 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2 - Forks: 1

apache/shardingsphere
Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across All Databases.
Language: Java - Size: 633 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 20,328 - Forks: 6,827

transferia/transferia
Open Source Cloud Native Ingestion engine
Language: Go - Size: 21.7 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 132 - Forks: 15

JosingCai/data4test Fork of tongdun/data4test
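Data4Test (盾测): a system that makes testing easier, suitable for functional, concurrency, exception, fuzz, scenario, long-duration, internationalization, big data, and performance testing, among others.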
Language: Go - Size: 124 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

apache/avro
Apache Avro is a data serialization system.
Language: Java - Size: 74.3 MB - Last synced at: about 23 hours ago - Pushed at: 2 days ago - Stars: 3,104 - Forks: 1,680

apache/avro-rs
Rust SDK for Apache Avro - a data serialization system.
Language: Rust - Size: 1.46 MB - Last synced at: about 23 hours ago - Pushed at: 7 days ago - Stars: 70 - Forks: 27

rapiddweller/rapiddweller-benerator-ce
BENERATOR is a leading software solution to generate, obfuscate, pseudonymize and migrate data for development, testing, and training purposes with a model-driven approach.
Language: Java - Size: 35.3 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 150 - Forks: 27

basedt/dms
Open-source, free, AI-powered intelligent data management system, compatible with multiple databases including MySQL, Oracle, PostgreSQL, Doris, etc.
Language: Java - Size: 892 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 32 - Forks: 8

ffmmmss/batch
Batch compiler in Rust for faster execution than Windows cmd.exe. Contribute to enhance features and fix limitations. 🦀🚀
Language: Rust - Size: 81.1 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

taosdata/TDengine
High-performance, scalable time-series database designed for Industrial IoT (IIoT) scenarios
Language: C - Size: 637 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 24,063 - Forks: 4,930

gurre/s3streamer
Byte-stream objects from S3 using Golang io.Reader.
Language: Go - Size: 21.5 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

AbsaOSS/spline
Data Lineage Tracking And Visualization Solution
Language: Scala - Size: 8.47 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 635 - Forks: 159

Esri/aggregation-viewer-client-feature-layer
Sample JavaScript Aggregation Viewer using aggregation (lod) queries and rendering aggregation bins client side
Language: CSS - Size: 7.46 MB - Last synced at: 1 day ago - Pushed at: 7 days ago - Stars: 12 - Forks: 5

ictchenbo/SmartETL
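SmartETL: a simple, flexible, configurable, out-of-the-box Python ETL framework with domain-specific features that avoids reinventing the wheel! Provides processing pipelines for open data sources such as Wikidata, Wikipedia, and GDELT; supports file formats such as txt/json/csv/excel and databases such as MySQL/PostgreSQL/MongoDB/ClickHouse/ElasticSearch as input and output; provides processing operators for large models, Web APIs, and more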
Language: Python - Size: 4.78 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 19 - Forks: 3

volcano-sh/volcano
A Cloud Native Batch System (Project under CNCF)
Language: Go - Size: 85.5 MB - Last synced at: 8 days ago - Pushed at: 9 days ago - Stars: 4,790 - Forks: 1,146

xuf-95/next-blog Fork of CaliCastle/cali.so
My personal website
Language: TypeScript - Size: 29.6 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 4 - Forks: 0

matheus-asilva/Data-Science-Micromasters
My progress studying this MicroMasters from edX
Language: Jupyter Notebook - Size: 129 MB - Last synced at: 2 days ago - Pushed at: almost 7 years ago - Stars: 9 - Forks: 2

robinhood-jim/JavaFramework
Spring-based simple Java framework for rapid development of Spring Boot or Spring config applications. Supports and integrates with HDFS/Local/FTP/AWS/COS/OSS file system accessors. Custom ORM solution that can merge JPA with MyBatis or MyBatis-Plus. Weka and Spark MLlib based data mining project.
Language: Java - Size: 7.09 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 25 - Forks: 3

hi-primus/optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Language: Python - Size: 110 MB - Last synced at: 4 days ago - Pushed at: 7 months ago - Stars: 1,514 - Forks: 232

TAHIR0110/ThereForYou
ThereForYou: Your mental health ally. Kai, our AI assistant, offers compassionate support. Track your mood trends, find solace in a secure community, and access crisis resources swiftly. We're here to empower your journey towards improved well-being, leveraging technology for a brighter tomorrow.
Language: Python - Size: 658 MB - Last synced at: 9 days ago - Pushed at: 5 months ago - Stars: 89 - Forks: 91

scikit-hep/uproot5
ROOT I/O in pure Python and NumPy.
Language: Python - Size: 4.18 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 250 - Forks: 84

dotnet/spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Language: C# - Size: 4.87 MB - Last synced at: 7 days ago - Pushed at: about 2 months ago - Stars: 2,069 - Forks: 328

aitor-medrano/aitor-medrano.github.io
Notes and exercises on Big Data and NoSQL
Language: HTML - Size: 23.4 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 14 - Forks: 3

minio/sidekick
High Performance HTTP Sidecar Load Balancer
Language: Go - Size: 1.86 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 569 - Forks: 87

xiaomeng79/learning_notes
Study notes
Language: Jupyter Notebook - Size: 10.3 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 25 - Forks: 10

DTStack/monaco-sql-languages
SQL languages for monaco-editor
Language: TypeScript - Size: 72.2 MB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 265 - Forks: 47

atengk/ops
Technical repository for operations (ops) topics
Language: Shell - Size: 21.7 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 5 - Forks: 4

Esri/aggregation-viewer-server-map-service
Sample JavaScript Aggregation Viewer using Map Service queries with an Aggregation Renderer and rendering images server side
Language: CSS - Size: 15 MB - Last synced at: 1 day ago - Pushed at: 11 days ago - Stars: 13 - Forks: 5

lovnishverma/Slidespptspdfs
PDFs for learning Python, DBMS, Big Data, Data Science, AI/ML, and much more...
Size: 57.1 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 11 - Forks: 0

GoogleCloudPlatform/data-analytics-golden-demo
An end-to-end demo of Google Cloud's data and analytics stack.
Language: Jupyter Notebook - Size: 14.8 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 253 - Forks: 86

brunocampos01/data-engineering
Language: Python - Size: 165 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 11 - Forks: 2

anovos/anovos
Anovos - An Open Source Library for Scalable feature engineering Using Apache-Spark
Language: Jupyter Notebook - Size: 88.5 MB - Last synced at: 6 days ago - Pushed at: about 2 years ago - Stars: 75 - Forks: 25

dimajix/flowman
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
Language: Scala - Size: 18.7 MB - Last synced at: 5 days ago - Pushed at: 20 days ago - Stars: 94 - Forks: 19

xiongshengxiao/CloudEon
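CloudEon uses Kubernetes to install and deploy open-source big data components, making it possible to run open-source big data platforms in containers. This lets you spend less attention on managing and maintaining the underlying resources.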
Language: FreeMarker - Size: 80.4 MB - Last synced at: 6 days ago - Pushed at: 13 days ago - Stars: 1 - Forks: 1

visualpython/visualpython
GUI-based Python code generator for data science, extension to Jupyter Lab, Jupyter Notebook and Google Colab.
Language: JavaScript - Size: 57.2 MB - Last synced at: 2 days ago - Pushed at: about 1 year ago - Stars: 901 - Forks: 118

alercebroker/ztf_explorer
🌚 🔭 💻 ZTF Explorer for the ALeRCE broker
Language: Vue - Size: 31.6 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 6 - Forks: 0

MrXujiang/v6.dooring.public
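A large-screen data visualization solution that provides a visual editing engine to help individuals or enterprises easily customize their own large-screen visualization applications.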
Language: TypeScript - Size: 36 MB - Last synced at: 8 days ago - Pushed at: 7 months ago - Stars: 664 - Forks: 152

water8394/BigData-Interview
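🎯 🌟 [Big data interview questions] Big data interview questions I collected online, together with my own summarized answers. Currently covers interview knowledge summaries for the Hadoop/Hive/Spark/Flink/HBase/Kafka/ZooKeeper frameworks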
Size: 6.59 MB - Last synced at: 13 days ago - Pushed at: almost 4 years ago - Stars: 1,622 - Forks: 447

DataExpert-io/data-engineer-handbook
This is a repo with links to everything you'd ever want to learn about data engineering
Language: Jupyter Notebook - Size: 55.6 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 33,971 - Forks: 6,515

Azure/azure-event-hubs-spark
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Language: Scala - Size: 19.6 MB - Last synced at: 1 day ago - Pushed at: 5 months ago - Stars: 237 - Forks: 178

ShifuML/shifu
An end-to-end machine learning and data mining framework on Hadoop
Language: Java - Size: 16.1 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 253 - Forks: 108

jamesmudd/jhdf
A pure Java HDF5 library
Language: Java - Size: 4.81 MB - Last synced at: 12 days ago - Pushed at: 14 days ago - Stars: 155 - Forks: 41

BaseMax/LaravelBigDataTest
PHP Laravel: Develop a test environment in Laravel with more than 20 Million user rows. (A project in blade laravel and another SPA in vue js infinite scroll)
Language: PHP - Size: 1.29 MB - Last synced at: 4 days ago - Pushed at: 15 days ago - Stars: 7 - Forks: 1

rdkmaster/jigsaw
Jigsaw (七巧板) provides a set of web components based on Angular 5/8/9+. Its main purpose is to help application developers build complex, highly interactive, user-friendly web pages. Jigsaw supports the development of all applications of ZTE's Big Data products.
Language: HTML - Size: 72 MB - Last synced at: 9 days ago - Pushed at: 2 months ago - Stars: 487 - Forks: 72

DTStack/dt-sql-parser
SQL Parsers for BigData, built with antlr4.
Language: TypeScript - Size: 52.7 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 342 - Forks: 101

tidb-incubator/TiBigData
TiDB connectors for Flink/Hive/Presto
Language: Java - Size: 3.04 MB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 219 - Forks: 57

chatnoir-eu/chatnoir-resiliparse
A robust web archive analytics toolkit
Language: Cython - Size: 1.89 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 111 - Forks: 15

mozilla/telemetry-batch-view 📦
A Scala framework to build derived datasets, aka batch views, of Telemetry data.
Language: Scala - Size: 12 MB - Last synced at: 4 days ago - Pushed at: about 3 years ago - Stars: 35 - Forks: 46

mozilla/telemetry-analysis-service 📦
Telemetry Analysis Service
Language: Python - Size: 4.5 MB - Last synced at: 8 days ago - Pushed at: over 5 years ago - Stars: 37 - Forks: 20

leesf/hudi-resources
A collection of Apache Hudi resources
Size: 23.8 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 554 - Forks: 160

3uno1a/Weather-Based_WindowController Fork of lullu303/SmartWindow
Auto window system based on weather conditions (rain, dust, temperature) with mobile app & voice control
Language: Jupyter Notebook - Size: 190 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

anqorithm/RealTime-StockStream
RealTime StockStream is a streamlined simulation system for processing live stock market data. It uses Apache Kafka for data input, Apache Spark for data handling, and Apache Cassandra for data storage, making it a powerful yet easy-to-use tool for financial data analysis.
Language: Python - Size: 5.36 MB - Last synced at: 1 day ago - Pushed at: 5 months ago - Stars: 27 - Forks: 3

mfuu/ngx-virtual-sortable
A virtual scrolling list component that can be sorted by dragging
Language: TypeScript - Size: 674 KB - Last synced at: 9 days ago - Pushed at: 18 days ago - Stars: 2 - Forks: 0

mfuu/vue3-virtual-sortable
A virtual scrolling list component that can be sorted by dragging, for vue3
Language: TypeScript - Size: 1.94 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 46 - Forks: 11
