An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: bigdata

reductstore/reductstore

High Performance Storage and Streaming Solution for Data Acquisition Systems

Language: Rust - Size: 2.62 MB - Last synced at: about 9 hours ago - Pushed at: about 11 hours ago - Stars: 221 - Forks: 15

jotstolu/Azure-Data-Engineering-End-to-End-Project---NYC-taxi-dataset

An end-to-end data engineering pipeline for NYC Green Taxi trip records, built on Microsoft Azure. This project ingests January–December 2024 Parquet files from the NYC Taxi API into a Bronze Delta Lake layer, cleans and enriches the data in a Silver layer with PySpark on Azure Databricks, and then saves the transformed data to the Gold layer in Delta format.

Size: 1.69 MB - Last synced at: about 10 hours ago - Pushed at: about 11 hours ago - Stars: 0 - Forks: 0

CurvineIO/curvine

High-performance distributed cache system. Built with Rust.

Language: Rust - Size: 729 KB - Last synced at: about 16 hours ago - Pushed at: about 17 hours ago - Stars: 42 - Forks: 11

fahadkalil/bigdata_docker

Multi-container Docker setup for a Big Data pipeline

Language: Dockerfile - Size: 13.2 MB - Last synced at: about 18 hours ago - Pushed at: about 20 hours ago - Stars: 1 - Forks: 0

apconw/sanic-web

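A lightweight, full-pipeline, easy-to-extend large-model application project (Large Model Data Assistant) that supports models such as DeepSeek/Qwen2.5. It is a one-stop large-model application development project built on Dify, Ollama & vLLM, Sanic, and Text2SQL 📊, with a modern UI built with Vue3, TypeScript, and Vite 5. It supports LLM-based graphical data Q&A through ECharts 📈 and can answer questions about tabular data in CSV files 📂. It can also easily connect to third-party open-source RAG and retrieval systems 🌐 to support a wide range of general-knowledge Q&A.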

Language: TypeScript - Size: 141 MB - Last synced at: about 18 hours ago - Pushed at: about 19 hours ago - Stars: 837 - Forks: 160

rustfs/rustfs

🚀 High-performance distributed object storage; an alternative to MinIO.

Language: Rust - Size: 6.38 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1,658 - Forks: 71

kartzum/d-space

Algorithms, BigData, Apache Spark...

Language: Java - Size: 922 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

apache/celeborn

Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.

Language: Java - Size: 31.4 MB - Last synced at: about 23 hours ago - Pushed at: 1 day ago - Stars: 957 - Forks: 411

sukbeta/sukbeta.github.io

舒克贝塔 (Shuke Beita)

Language: HTML - Size: 5.9 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 1

dromara/dataCompare

Big data comparison and data profiling platform: low-code data comparison and data profiling.

Language: Java - Size: 10.9 MB - Last synced at: about 13 hours ago - Pushed at: about 1 year ago - Stars: 269 - Forks: 62

open-metadata/openmetadata-site

Open Standard for Metadata. A single place to discover, collaborate, and get your data right.

Language: TypeScript - Size: 55.6 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 14 - Forks: 11

Fradhyle/Voo-ong

Personalized movie recommendation system using artificial intelligence

Language: Jupyter Notebook - Size: 57.7 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

juicedata/juicefs

JuiceFS is a distributed POSIX file system built on top of Redis and S3.

Language: Go - Size: 62.5 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 11,837 - Forks: 1,052

zyra121/advertising-sales-prediction

This repository showcases a linear regression analysis using the Advertising dataset, demonstrating both simple and multiple regression techniques in Python. It also features a custom implementation of Gradient Descent for a deeper understanding of the concepts. 🐱💻📊

Language: Python - Size: 1.33 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0
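As a rough illustration of the "custom implementation of Gradient Descent" mentioned above, here is a minimal sketch of batch gradient descent fitting a simple linear regression y ≈ w·x + b by minimizing mean squared error. It is not taken from the repository: the function name `fit_simple_linear_regression`, the learning rate, the epoch count, and the toy data (points on y = 2x + 1 rather than the Advertising dataset) are all hypothetical choices for illustration.

```python
def fit_simple_linear_regression(xs, ys, lr=0.01, epochs=5000):
    """Fit y ~ w*x + b with batch gradient descent on mean squared error.

    Hypothetical helper for illustration; lr and epochs are assumed values.
    """
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # Residuals of the current model on every training point
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(err^2) w.r.t. w and b
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical toy data lying exactly on y = 2x + 1 (not the Advertising dataset)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_simple_linear_regression(xs, ys)
print(round(w, 3), round(b, 3))
```

Because the toy points lie on a line, the fitted slope and intercept should approach 2 and 1. With real features on different scales, standardizing inputs first usually lets a single learning rate work for every parameter.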

sderosiaux/every-single-day-i-tldr

A daily digest of articles and videos I've found interesting and want to share with you.

Size: 12.4 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 320 - Forks: 21

mjakubowski84/parquet4s

Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.

Language: Scala - Size: 2.34 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 291 - Forks: 67

andrewsuadnya/YouTube-Live-Chat-Sentiment-Analysis

A near real-time sentiment analysis system for YouTube live chat using a big data stack. Built with Kafka, Spark Structured Streaming, Elasticsearch, Kibana, Flask, and React.js to collect, process, and visualize live messages.

Language: Python - Size: 23.1 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2 - Forks: 0

jnidzwetzki/bboxdb

BBoxDB is a scalable, highly available, and distributed data store for multi-dimensional big data. The software supports operations like multi-dimensional range queries and spatial joins. In addition, data streams are supported.

Language: Java - Size: 32.8 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 56 - Forks: 10

soro8/Recommender-System-MovieLens

This repository contains a machine learning project that builds a movie recommendation system using content-based filtering and collaborative filtering approaches. 🛠️ It aims to help users navigate large film catalogs and discover new titles they are likely to enjoy. 📽️

Language: Jupyter Notebook - Size: 1.39 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0
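For readers unfamiliar with the collaborative filtering approach named above, the following is a minimal sketch of one common variant, item-based collaborative filtering with cosine similarity, in plain Python. It is not the repository's implementation and does not cover content-based filtering; the `ratings` data, user and movie names, and the helper functions `cosine`, `predict`, and `recommend` are hypothetical.

```python
from math import sqrt

# Hypothetical user -> {movie: rating} data (not MovieLens)
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 2, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
    "dave": {"B": 4, "C": 2, "D": 1},
}


def item_vector(item):
    """Ratings given to one item, keyed by user."""
    return {u: r[item] for u, r in ratings.items() if item in r}


def cosine(i, j):
    """Cosine similarity of two items over the users who rated both."""
    vi, vj = item_vector(i), item_vector(j)
    common = set(vi) & set(vj)
    if not common:
        return 0.0
    dot = sum(vi[u] * vj[u] for u in common)
    norm_i = sqrt(sum(vi[u] ** 2 for u in common))
    norm_j = sqrt(sum(vj[u] ** 2 for u in common))
    return dot / (norm_i * norm_j)


def predict(user, item):
    """Similarity-weighted average of the user's ratings on other items."""
    num = den = 0.0
    for other, rating in ratings[user].items():
        sim = cosine(item, other)
        if sim > 0:
            num += sim * rating
            den += sim
    return num / den if den else None


def recommend(user, k=1):
    """Top-k unseen items for a user, ranked by predicted rating."""
    seen = set(ratings[user])
    candidates = {i for r in ratings.values() for i in r} - seen
    scored = [(predict(user, i), i) for i in candidates]
    scored = [(p, i) for p, i in scored if p is not None]
    return [i for _, i in sorted(scored, reverse=True)[:k]]


print(recommend("alice"), predict("alice", "D"))
```

Item-based similarity is often preferred over user-based similarity for catalogs where item relationships are more stable than individual users' tastes; the predicted score always stays within the range of the user's own ratings because it is a weighted average.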

legend-exp/legend-pydataobj

LEGEND Python Data Objects

Language: Python - Size: 1.35 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 11

zhaoyachao/zdh_magic_mirror

zdh series: a Java-based business risk-control engine

Language: Java - Size: 1.59 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 12 - Forks: 5

zhaoyachao/zdh_web

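Big data collection and extraction platform. zdh_web is the visual management platform for the zdh series of services, with modules for data collection, scheduling, permissions, approval workflows, private-domain marketing, and more.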

Language: Java - Size: 141 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 519 - Forks: 182

binghe001/binghe001.github.io

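📚 This static blog is a technical summary of the author Binghe's (冰河) learning journey over many years of development and architecture work at major internet companies. It aims to provide clear, detailed tutorials, focusing on core Java, underlying principles, architecture, and penetration testing. If this repository helps you, please show your support (follow, like, share)!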

Language: HTML - Size: 1.85 GB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 34 - Forks: 3

binghe001/BingheGuide

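🔥🔥🔥 📚 This repository is a technical summary of the author Binghe's (冰河) learning journey over many years of development and architecture work at major internet companies. It aims to provide clear, detailed tutorials, focusing on core Java, underlying principles, architecture, and penetration testing. If this repository helps you, please show your support (follow, like, share)!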

Language: Shell - Size: 692 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 356 - Forks: 147

HariSekhon/Knowledge-Base

Large Tech Knowledge Base from 20 years in DevOps, Linux, Cloud, Big Data, AWS, GCP etc - gradually porting my large private knowledge base to public

Language: Shell - Size: 183 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 188 - Forks: 36

PriyankaBhatta/60DaysofLearning2025

Welcome to my #60DaysOfLearning2025 challenge! This repository is a personal learning log where I document my daily progress as I explore and upskill in various areas of technology.

Size: 90.8 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

grouzen/zio-apache-parquet

Scala ZIO-powered Apache Parquet library

Language: Scala - Size: 387 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 25 - Forks: 4

ramarimoo/insert-tools

Simple and fast Python toolset for bulk data insertion into databases and CSVs. Ideal for ETL pipelines and data engineering tasks.

Language: Python - Size: 31.3 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

apache/airavata

A general-purpose distributed systems framework

Language: Java - Size: 168 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 122 - Forks: 135

Yimyaa/AI-ML-Cheatsheets

All Stanford cheatsheets: Artificial Intelligence, Transformers, LLMs, Deep Learning, Machine Learning, Probabilities, Statistics, Algebra and Calculus.

Size: 50.3 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2 - Forks: 0

arvados/arvados

An open source platform for managing and analyzing biomedical big data

Language: Go - Size: 75.2 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 404 - Forks: 123

100-rab/AMO

[RSS 2025] AMO: Adaptive Motion Optimization for Hyper-Dexterous Humanoid Whole-Body Control

Language: Python - Size: 44.5 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

vaexio/vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

Language: Python - Size: 133 MB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 8,403 - Forks: 600

k0c0r/improved-journey

Derek Simmons - Strategic Builder | Innovation Architect

Size: 1000 Bytes - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

Correia-jpv/fucking-awesome-bigdata

A curated list of awesome big data frameworks, resources and other awesomeness. With repository stars⭐ and forks🍴

Size: 655 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 11 - Forks: 1

AparajithKrishna/Mental-health-support

A simple and beginner-friendly web app built using HTML, CSS, and JavaScript to promote mental well-being.

Language: HTML - Size: 3.91 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

NationalSecurityAgency/datawave

DataWave is an ingest/query framework that leverages Apache Accumulo to provide fast, secure data access.

Language: Java - Size: 112 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 619 - Forks: 259

oxnr/awesome-bigdata

A curated list of awesome big data frameworks, resources and other awesomeness.

Size: 843 KB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 13,690 - Forks: 2,575

byzer-org/byzer-lang

Byzer (formerly MLSQL): A low-code, open-source programming language for data pipelines, analytics, and AI.

Language: Scala - Size: 54.7 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 1,844 - Forks: 546

apache/hudi

Upserts, Deletes And Incremental Processing on Big Data.

Language: Java - Size: 1.84 GB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 5,858 - Forks: 2,423

nebulastream/nebulastream

Data Management for the Internet of Things

Language: C++ - Size: 518 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 33 - Forks: 3

ganweisoft/TOMs

TOMs is a fully open-source, high-performance, systematic, plugin-oriented, and scenario-agnostic general-purpose development framework.

Language: Batchfile - Size: 31.8 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 6 - Forks: 2

apache/amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.

Language: Java - Size: 68 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 996 - Forks: 342

brandmaier/semtree

Recursive Partitioning for Structural Equation Models

Language: R - Size: 30.4 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 18 - Forks: 13

databendlabs/databend

Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com

Language: Rust - Size: 297 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 8,522 - Forks: 783

apache/incubator-livy

Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

Language: Scala - Size: 3.55 MB - Last synced at: about 23 hours ago - Pushed at: 25 days ago - Stars: 915 - Forks: 612

ganweisoft/Gateway

Gateway is a high-performance, centralized communication and scheduling module for various device plugins. It uniformly converts heterogeneous data into standardized models and delivers core functionalities such as real-time data storage, alarm triggering, linkage control, and task planning.

Language: C# - Size: 158 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2 - Forks: 1

apache/shardingsphere

Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across All Databases.

Language: Java - Size: 633 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 20,328 - Forks: 6,827

transferia/transferia

Open-source, cloud-native ingestion engine

Language: Go - Size: 21.7 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 132 - Forks: 15

JosingCai/data4test Fork of tongdun/data4test

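Data4Test (盾测): a system that makes testing easier, suited to functional testing, concurrency testing, exception testing, fuzz testing, scenario testing, long-running testing, internationalization testing, big data testing, performance testing, and more.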

Language: Go - Size: 124 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

apache/avro

Apache Avro is a data serialization system.

Language: Java - Size: 74.3 MB - Last synced at: about 23 hours ago - Pushed at: 2 days ago - Stars: 3,104 - Forks: 1,680

apache/avro-rs

Rust SDK for Apache Avro - a data serialization system.

Language: Rust - Size: 1.46 MB - Last synced at: about 23 hours ago - Pushed at: 7 days ago - Stars: 70 - Forks: 27

rapiddweller/rapiddweller-benerator-ce

BENERATOR is a leading software solution to generate, obfuscate, pseudonymize and migrate data for development, testing, and training purposes with a model-driven approach.

Language: Java - Size: 35.3 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 150 - Forks: 27

basedt/dms

Open-source, free, AI-powered intelligent data management system, compatible with multiple databases including MySQL, Oracle, PostgreSQL, Doris, etc.

Language: Java - Size: 892 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 32 - Forks: 8

ffmmmss/batch

Batch compiler in Rust for faster execution than Windows cmd.exe. Contribute to enhance features and fix limitations. 🦀🚀

Language: Rust - Size: 81.1 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

taosdata/TDengine

High-performance, scalable time-series database designed for Industrial IoT (IIoT) scenarios

Language: C - Size: 637 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 24,063 - Forks: 4,930

gurre/s3streamer

Byte-stream objects from S3 using Golang io.Reader.

Language: Go - Size: 21.5 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

AbsaOSS/spline

Data Lineage Tracking And Visualization Solution

Language: Scala - Size: 8.47 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 635 - Forks: 159

Esri/aggregation-viewer-client-feature-layer

Sample JavaScript Aggregation Viewer using aggregation (LOD) queries and rendering aggregation bins client-side

Language: CSS - Size: 7.46 MB - Last synced at: 1 day ago - Pushed at: 7 days ago - Stars: 12 - Forks: 5

ictchenbo/SmartETL

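SmartETL: a simple, flexible, configurable, out-of-the-box Python ETL framework with domain-specific features; no reinventing the wheel! It provides processing pipelines for many open datasets such as Wikidata / Wikipedia / GDELT; supports file formats such as txt/json/csv/excel and databases such as MySQL/PostgreSQL/MongoDB/ClickHouse/ElasticSearch as input and output; and provides a variety of processing operators, including large models and Web APIs.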

Language: Python - Size: 4.78 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 19 - Forks: 3

volcano-sh/volcano

A Cloud Native Batch System (Project under CNCF)

Language: Go - Size: 85.5 MB - Last synced at: 8 days ago - Pushed at: 9 days ago - Stars: 4,790 - Forks: 1,146

xuf-95/next-blog Fork of CaliCastle/cali.so

My personal website

Language: TypeScript - Size: 29.6 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 4 - Forks: 0

matheus-asilva/Data-Science-Micromasters

My progress studying this MicroMasters program from edX

Language: Jupyter Notebook - Size: 129 MB - Last synced at: 2 days ago - Pushed at: almost 7 years ago - Stars: 9 - Forks: 2

robinhood-jim/JavaFramework

Simple Spring-based Java framework for rapid development of Spring Boot or Spring config applications. Supports and integrates HDFS/Local/FTP/AWS/COS/OSS file system accessors. Custom ORM solution that can merge JPA with MyBatis or MyBatis-Plus. Weka- and Spark MLlib-based data mining project.

Language: Java - Size: 7.09 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 25 - Forks: 3

hi-primus/optimus

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Language: Python - Size: 110 MB - Last synced at: 4 days ago - Pushed at: 7 months ago - Stars: 1,514 - Forks: 232

TAHIR0110/ThereForYou

ThereForYou: Your mental health ally. Kai, our AI assistant, offers compassionate support. Track your mood trends, find solace in a secure community, and access crisis resources swiftly. We're here to empower your journey towards improved well-being, leveraging technology for a brighter tomorrow.

Language: Python - Size: 658 MB - Last synced at: 9 days ago - Pushed at: 5 months ago - Stars: 89 - Forks: 91

scikit-hep/uproot5

ROOT I/O in pure Python and NumPy.

Language: Python - Size: 4.18 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 250 - Forks: 84

dotnet/spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

Language: C# - Size: 4.87 MB - Last synced at: 7 days ago - Pushed at: about 2 months ago - Stars: 2,069 - Forks: 328

aitor-medrano/aitor-medrano.github.io

Materials with notes and exercises on Big Data and NoSQL

Language: HTML - Size: 23.4 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 14 - Forks: 3

minio/sidekick

High Performance HTTP Sidecar Load Balancer

Language: Go - Size: 1.86 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 569 - Forks: 87

xiaomeng79/learning_notes

Study notes

Language: Jupyter Notebook - Size: 10.3 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 25 - Forks: 10

DTStack/monaco-sql-languages

SQL languages for monaco-editor

Language: TypeScript - Size: 72.2 MB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 265 - Forks: 47

atengk/ops

A technical repository on IT operations (ops)

Language: Shell - Size: 21.7 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 5 - Forks: 4

Esri/aggregation-viewer-server-map-service

Sample JavaScript Aggregation Viewer using Map Service queries with an Aggregation Renderer and rendering images server-side

Language: CSS - Size: 15 MB - Last synced at: 1 day ago - Pushed at: 11 days ago - Stars: 13 - Forks: 5

lovnishverma/Slidespptspdfs

PDFs for learning Python, DBMS, Big Data, Data Science, AI/ML, and much more...

Size: 57.1 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 11 - Forks: 0

GoogleCloudPlatform/data-analytics-golden-demo

An end to end demo of Google's Cloud data and analytic stack.

Language: Jupyter Notebook - Size: 14.8 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 253 - Forks: 86

brunocampos01/data-engineering

Language: Python - Size: 165 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 11 - Forks: 2

anovos/anovos

Anovos: an open-source library for scalable feature engineering using Apache Spark

Language: Jupyter Notebook - Size: 88.5 MB - Last synced at: 6 days ago - Pushed at: about 2 years ago - Stars: 75 - Forks: 25

dimajix/flowman

Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.

Language: Scala - Size: 18.7 MB - Last synced at: 5 days ago - Pushed at: 20 days ago - Stars: 94 - Forks: 19

xiongshengxiao/CloudEon

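CloudEon uses Kubernetes to install and deploy open-source big data components, making it possible to run an open-source big data platform in containers. This lets you spend less effort on managing and maintaining the underlying resources.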

Language: FreeMarker - Size: 80.4 MB - Last synced at: 6 days ago - Pushed at: 13 days ago - Stars: 1 - Forks: 1

visualpython/visualpython

GUI-based Python code generator for data science, extension to Jupyter Lab, Jupyter Notebook and Google Colab.

Language: JavaScript - Size: 57.2 MB - Last synced at: 2 days ago - Pushed at: about 1 year ago - Stars: 901 - Forks: 118

alercebroker/ztf_explorer

🌚 🔭 💻 ZTF Explorer for the ALeRCE broker

Language: Vue - Size: 31.6 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 6 - Forks: 0

MrXujiang/v6.dooring.public

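A large-screen data visualization solution that provides a visual editing engine, helping individuals and businesses easily build their own large-screen visualization applications.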

Language: TypeScript - Size: 36 MB - Last synced at: 8 days ago - Pushed at: 7 months ago - Stars: 664 - Forks: 152

water8394/BigData-Interview

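🎯 🌟 [Big Data Interview Questions] A collection of big data interview questions I gathered online, along with summaries of my own answers. It currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/HBase/Kafka/Zookeeper frameworks.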

Size: 6.59 MB - Last synced at: 13 days ago - Pushed at: almost 4 years ago - Stars: 1,622 - Forks: 447

DataExpert-io/data-engineer-handbook

This is a repo with links to everything you'd ever want to learn about data engineering

Language: Jupyter Notebook - Size: 55.6 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 33,971 - Forks: 6,515

Azure/azure-event-hubs-spark

Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs

Language: Scala - Size: 19.6 MB - Last synced at: 1 day ago - Pushed at: 5 months ago - Stars: 237 - Forks: 178

ShifuML/shifu

An end-to-end machine learning and data mining framework on Hadoop

Language: Java - Size: 16.1 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 253 - Forks: 108

jamesmudd/jhdf

A pure Java HDF5 library

Language: Java - Size: 4.81 MB - Last synced at: 12 days ago - Pushed at: 14 days ago - Stars: 155 - Forks: 41

BaseMax/LaravelBigDataTest

PHP Laravel: a test environment in Laravel with more than 20 million user rows (one project in Laravel Blade and another SPA in Vue.js with infinite scroll).

Language: PHP - Size: 1.29 MB - Last synced at: 4 days ago - Pushed at: 15 days ago - Stars: 7 - Forks: 1

rdkmaster/jigsaw

Jigsaw (七巧板, "tangram") provides a set of web components based on Angular 5/8/9+. Its main purpose is to help application developers build complex, highly interactive, user-friendly web pages. Jigsaw supports the development of all applications in ZTE's Big Data product.

Language: HTML - Size: 72 MB - Last synced at: 9 days ago - Pushed at: 2 months ago - Stars: 487 - Forks: 72

DTStack/dt-sql-parser

SQL Parsers for BigData, built with antlr4.

Language: TypeScript - Size: 52.7 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 342 - Forks: 101

tidb-incubator/TiBigData

TiDB connectors for Flink/Hive/Presto

Language: Java - Size: 3.04 MB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 219 - Forks: 57

chatnoir-eu/chatnoir-resiliparse

A robust web archive analytics toolkit

Language: Cython - Size: 1.89 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 111 - Forks: 15

mozilla/telemetry-batch-view 📦

A Scala framework to build derived datasets, aka batch views, of Telemetry data.

Language: Scala - Size: 12 MB - Last synced at: 4 days ago - Pushed at: about 3 years ago - Stars: 35 - Forks: 46

mozilla/telemetry-analysis-service 📦

Telemetry Analysis Service

Language: Python - Size: 4.5 MB - Last synced at: 8 days ago - Pushed at: over 5 years ago - Stars: 37 - Forks: 20

leesf/hudi-resources

A collection of Apache Hudi resources

Size: 23.8 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 554 - Forks: 160

3uno1a/Weather-Based_WindowController Fork of lullu303/SmartWindow

Auto window system based on weather conditions (rain, dust, temperature) with mobile app & voice control

Language: Jupyter Notebook - Size: 190 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

anqorithm/RealTime-StockStream

RealTime StockStream is a streamlined simulation system for processing live stock market data. It uses Apache Kafka for data ingestion, Apache Spark for data processing, and Apache Cassandra for data storage, making it a powerful yet easy-to-use tool for financial data analysis.

Language: Python - Size: 5.36 MB - Last synced at: 1 day ago - Pushed at: 5 months ago - Stars: 27 - Forks: 3

mfuu/ngx-virtual-sortable

A virtual scrolling list component that can be sorted by dragging

Language: TypeScript - Size: 674 KB - Last synced at: 9 days ago - Pushed at: 18 days ago - Stars: 2 - Forks: 0

mfuu/vue3-virtual-sortable

A virtual scrolling list component that can be sorted by dragging, for vue3

Language: TypeScript - Size: 1.94 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 46 - Forks: 11