GitHub topics: bigdata
andrewsuadnya/YouTube-Live-Chat-Sentiment-Analysis
📊 Near Real-time YouTube Live Chat Sentiment Analysis using Big Data Stack — Built with Kafka, Spark Structured Streaming, Elasticsearch, Kibana, Flask, and React.js to collect, process, and analyze live chat.
Language: Python - Size: 23 MB - Last synced at: about 8 hours ago - Pushed at: about 8 hours ago - Stars: 2 - Forks: 0

aliyun/aliyun-odps-java-sdk
ODPS SDK for Java Developers
Language: Java - Size: 29.3 MB - Last synced at: about 8 hours ago - Pushed at: about 9 hours ago - Stars: 90 - Forks: 50

basedt/dms
Open-source, free, AI-powered intelligent data management system that supports AI and is compatible with multiple databases, including MySQL, Oracle, PostgreSQL, Doris, etc.
Language: Java - Size: 869 KB - Last synced at: about 12 hours ago - Pushed at: about 12 hours ago - Stars: 31 - Forks: 8

ramarimoo/insert-tools
Simple and fast Python toolset for bulk data insertion into databases and CSVs. Ideal for ETL pipelines and data engineering tasks.
Language: Python - Size: 31.3 KB - Last synced at: about 12 hours ago - Pushed at: about 13 hours ago - Stars: 0 - Forks: 0

adamalton/django-mass-migration
Django app for managing long-running data operations on large and/or schemaless databases
Language: Python - Size: 131 KB - Last synced at: about 23 hours ago - Pushed at: about 24 hours ago - Stars: 4 - Forks: 2

jfsanchez/bigdata
Language: Jupyter Notebook - Size: 30.1 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

ictchenbo/SmartETL
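SmartETL: a simple, flexible, configurable, out-of-the-box Python ETL framework with domain-specific features that refuses to reinvent the wheel! Provides processing pipelines for a variety of open datasets such as Wikidata / Wikipedia / GDELT; supports file formats such as txt/json/csv/excel and databases such as MySQL/PostgreSQL/MongoDB/ClickHouse/ElasticSearch as input and output; provides a variety of processing operators, including large models and Web APIs.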
Language: Python - Size: 4.74 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 17 - Forks: 3

125ade/AIS_Data_Analysis
Project on AIS data analysis to extract valuable maritime insights and improve vessel monitoring and navigation
Language: Python - Size: 271 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

100-rab/AMO
[RSS 2025] AMO: Adaptive Motion Optimization for Hyper-Dexterous Humanoid Whole-Body Control
Language: Python - Size: 44.5 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

juicedata/juicefs
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
Language: Go - Size: 62.2 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 11,702 - Forks: 1,038

open-metadata/openmetadata-site
Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
Language: TypeScript - Size: 54.6 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 14 - Forks: 11

k0c0r/improved-journey
Derek Simmons - Strategic Builder | Innovation Architect
Size: 1000 Bytes - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

Yimyaa/AI-ML-Cheatsheets
All Stanford cheatsheets: Artificial Intelligence, Transformers, LLMs, Deep Learning, Machine Learning, Probabilities, Statistics, Algebra and Calculus.
Size: 50.3 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 2 - Forks: 0

GoogleCloudPlatform/data-analytics-golden-demo
An end to end demo of Google's Cloud data and analytic stack.
Language: Jupyter Notebook - Size: 11.7 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 253 - Forks: 84

apache/celeborn
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
Language: Java - Size: 31.3 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 953 - Forks: 392

AparajithKrishna/Mental-health-support
A simple and beginner-friendly web app built using HTML, CSS, and JavaScript to promote mental well-being.
Language: HTML - Size: 3.91 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

timebusker/timebusker.github.io
timebusker.github.io
Language: HTML - Size: 235 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0

NationalSecurityAgency/datawave
DataWave is an ingest/query framework that leverages Apache Accumulo to provide fast, secure data access.
Language: Java - Size: 111 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 615 - Forks: 259

microsoft/Mobius
C# and F# language binding and extensions to Apache Spark
Language: C# - Size: 6.44 MB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 939 - Forks: 211

reductstore/reductstore
High Performance Storage and Streaming Solution for Data Acquisition Systems
Language: Rust - Size: 2.63 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 213 - Forks: 13

databendlabs/databend
Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
Language: Rust - Size: 295 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 8,473 - Forks: 779

apache/hudi
Upserts, Deletes And Incremental Processing on Big Data.
Language: Java - Size: 1.82 GB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 5,833 - Forks: 2,411

apparebit/shantay
Trying to make sense of the EU's DSA Transparency DB
Language: HTML - Size: 5.78 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

jotstolu/Azure-Data-Engineering-End-to-End-Project---NYC-taxi-dataset
An end‑to‑end data engineering pipeline for NYC Green Taxi trip records, built on Microsoft Azure. This project ingests Jan–Dec 2024 Parquet files from the NYC Taxi API into a Bronze Delta Lake layer, cleans and enriches the data in a Silver layer with PySpark on Azure Databricks, then saves the transformed data to the Gold layer in Delta format.
Size: 881 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

mfuu/vue3-virtual-sortable
A virtual scrolling list component that can be sorted by dragging, for vue3
Language: TypeScript - Size: 1.89 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 46 - Forks: 11

DragonKingpin/Hydra
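Hydra (Nine-Headed Dragon): built for PB-scale knowledge-base data retrieval, intelligence systems, data platforms, and large-scale control and scheduling systems. It provides capabilities for cloud computing resource management, unified task/service scheduling, data warehousing, microservice architecture, and systematic middle-platform infrastructure, using the implementation of a large-scale distributed crawler search engine as its example.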
Language: Java - Size: 20 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 299 - Forks: 21

ganweisoft/TOMs
TOMs is a fully open-source, systematic, plugin-based, high-performance, out-of-the-box, and production-ready development framework for IoT industry applications.
Size: 34.2 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

apache/shardingsphere
Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across All Databases.
Language: Java - Size: 634 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 20,284 - Forks: 6,818

taosdata/TDengine
High-performance, scalable time-series database designed for Industrial IoT (IIoT) scenarios
Language: C - Size: 631 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 23,976 - Forks: 4,920

transferia/transferia
Open Source Cloud Native Ingestion engine
Language: Go - Size: 22.6 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 122 - Forks: 14

Correia-jpv/fucking-awesome-bigdata
A curated list of awesome big data frameworks, resources and other awesomeness. With repository stars⭐ and forks🍴
Size: 655 KB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 10 - Forks: 1

Srihariharasudhan-Balakannan/Trends-in-Data-jobs
The Trends in Data Jobs project is a web scraping and data visualization tool designed to track and analyze trends in data-related job postings.
Language: Jupyter Notebook - Size: 40.9 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2 - Forks: 1

Netflix/genie
Distributed Big Data Orchestration Service
Language: Java - Size: 206 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,734 - Forks: 372

viniciusvdias/pdm
DCC/UFLA course "Big-Data: Massive Data Processing"
Language: Jupyter Notebook - Size: 209 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 3

oxnr/awesome-bigdata
A curated list of awesome big data frameworks, resources and other awesomeness.
Size: 843 KB - Last synced at: 4 days ago - Pushed at: 4 months ago - Stars: 13,645 - Forks: 2,572

delhoume/BigMars
Make your own terapixel interactive image of the surface of Mars
Language: C - Size: 3.37 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 2 - Forks: 0

sderosiaux/every-single-day-i-tldr
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Size: 6.6 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 320 - Forks: 21

apache/amoro
Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
Language: Java - Size: 67.9 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 978 - Forks: 335

xuf-95/next-blog Fork of CaliCastle/cali.so
My personal website
Language: TypeScript - Size: 29.3 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4 - Forks: 0

apconw/sanic-web
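A lightweight, full-pipeline large-model application project (Large Model Data Assistant) that is easy to extend and supports large models such as DeepSeek/Qwen2.5. It is a one-stop large-model application development project built on Dify, Ollama & vLLM, Sanic, and Text2SQL 📊, with a modern UI built using Vue3, TypeScript, and Vite 5. It supports LLM-based graphical data Q&A through ECharts 📈 and can answer questions over tables in CSV files 📂. It can also be easily connected to third-party open-source RAG and retrieval systems 🌐 to support broad general-knowledge Q&A.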
Language: JavaScript - Size: 145 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 739 - Forks: 139

xavigs/breaks-analyzer
Web scraper that extracts all daily tennis matches and analyses them to predict probabilities for the "First Set Player To Break Serve" market.
Language: Python - Size: 21 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

zyra121/advertising-sales-prediction
This repository showcases a linear regression analysis using the Advertising dataset, demonstrating both simple and multiple regression techniques in Python. It also features a custom implementation of Gradient Descent for a deeper understanding of the concepts. 🐱💻📊
Language: Python - Size: 1.33 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

apache/airavata
A general purpose Distributed Systems Framework
Language: Java - Size: 167 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 120 - Forks: 132

HariSekhon/Knowledge-Base
Large Tech Knowledge Base from 20 years in DevOps, Linux, Cloud, Big Data, AWS, GCP etc - gradually porting my large private knowledge base to public
Language: Shell - Size: 183 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 174 - Forks: 31

zeromicro/cds
Data syncing in golang for ClickHouse.
Language: Go - Size: 7.01 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 970 - Forks: 140

mfuu/vue-virtual-sortable
A virtual scrolling list component that can be sorted by dragging
Language: TypeScript - Size: 3.19 MB - Last synced at: 2 days ago - Pushed at: 11 days ago - Stars: 42 - Forks: 12

Azure/azure-event-hubs-spark
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Language: Scala - Size: 19.6 MB - Last synced at: 6 days ago - Pushed at: 4 months ago - Stars: 236 - Forks: 178

apache/avro
Apache Avro is a data serialization system.
Language: Java - Size: 74.5 MB - Last synced at: 6 days ago - Pushed at: 11 days ago - Stars: 3,082 - Forks: 1,675

arvados/arvados
An open source platform for managing and analyzing biomedical big data
Language: Go - Size: 75.5 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 403 - Forks: 122

apache/avro-rs
Rust SDK for Apache Avro - a data serialization system.
Language: Rust - Size: 1.5 MB - Last synced at: 6 days ago - Pushed at: 8 days ago - Stars: 62 - Forks: 27

devinrsmith/deephaven-parquet-viewer
A browser-based Parquet file viewer
Language: Shell - Size: 194 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 46 - Forks: 3

moderatedan/DataBrokerOptOut
Data Broker Opt Out is a Python script that provides a convenient way to access opt-out pages of various data brokers on the web. Data brokers are companies that collect, analyze, and sell personal information, and opting out from their services can enhance your privacy.
Size: 167 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 2 - Forks: 0

MoRan1607/BigDataGuide
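Big data learning: learn big data from scratch, with study videos and interview materials for every stage of the big data learning path.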
Size: 154 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 2,958 - Forks: 903

DTStack/monaco-sql-languages
SQL languages for monaco-editor
Language: TypeScript - Size: 72.2 MB - Last synced at: 4 days ago - Pushed at: 9 days ago - Stars: 263 - Forks: 49

gilberto-009199/bigdata
BigData workspaces
Language: Java - Size: 60.4 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

lovnishverma/Slidespptspdfs
PDFs for learning Python, DBMS, Big Data, Data Science, AI/ML, and much more.
Size: 51.5 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 9 - Forks: 0

AbsaOSS/spline
Data Lineage Tracking And Visualization Solution
Language: Scala - Size: 8.38 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 627 - Forks: 159

visualpython/visualpython
GUI-based Python code generator for data science, extension to Jupyter Lab, Jupyter Notebook and Google Colab.
Language: JavaScript - Size: 57.2 MB - Last synced at: 5 days ago - Pushed at: 12 months ago - Stars: 895 - Forks: 118

martymac/fpart
Sort files and pack them into partitions
Language: C - Size: 1.31 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 262 - Forks: 43

DTStack/dt-sql-parser
SQL Parsers for BigData, built with antlr4.
Language: TypeScript - Size: 51 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 343 - Forks: 101

MrXujiang/v6.dooring.public
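A large-screen data visualization solution that provides a visual editing engine, helping individuals and enterprises easily customize their own large-screen visualization applications.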
Language: TypeScript - Size: 36 MB - Last synced at: 2 days ago - Pushed at: 6 months ago - Stars: 659 - Forks: 152

raystack/meteor
Meteor is an easy-to-use, plugin-driven metadata collection framework to extract data from different sources and sink to any data catalog.
Language: Go - Size: 14.5 MB - Last synced at: about 13 hours ago - Pushed at: 8 months ago - Stars: 208 - Forks: 42

canimus/cuallee
Possibly the fastest DataFrame-agnostic quality check library in town.
Language: Python - Size: 2.29 MB - Last synced at: 10 days ago - Pushed at: 12 days ago - Stars: 191 - Forks: 20

CyprienKelma/Projet-M1
Enterprise-grade, scalable and resilient architecture for data management and processing.
Language: Jupyter Notebook - Size: 43.9 MB - Last synced at: 3 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

legend-exp/legend-pydataobj
LEGEND Python Data Objects
Language: Python - Size: 1.22 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 11

rustfs/rustfs
🚀 High-performance distributed object storage; an alternative to MinIO.
Size: 1.64 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 104 - Forks: 5

hi-primus/optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Language: Python - Size: 110 MB - Last synced at: 2 days ago - Pushed at: 6 months ago - Stars: 1,512 - Forks: 232

xiongshengxiao/CloudEon
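CloudEon uses Kubernetes to install and deploy open-source big data components, making containerized operation of open-source big data platforms possible. This lets you spend less effort on managing and maintaining the underlying resources.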
Language: FreeMarker - Size: 80.6 MB - Last synced at: 10 days ago - Pushed at: 11 days ago - Stars: 1 - Forks: 0

volcano-sh/volcano
A Cloud Native Batch System (Project under CNCF)
Language: Go - Size: 84 MB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 4,705 - Forks: 1,087

alercebroker/ztf_explorer
🌚 🔭 💻 ZTF Explorer for the ALeRCE broker
Language: Vue - Size: 31.6 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 6 - Forks: 0

mfuu/ngx-virtual-sortable
A virtual scrolling list component that can be sorted by dragging
Language: TypeScript - Size: 526 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 2 - Forks: 0

mfuu/react-virtual-sortable
A virtual scrolling list component that can be sorted by dragging
Language: TypeScript - Size: 1.87 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 5 - Forks: 0

divithraju/divith-aju-Hadoop-Pyspark-pipeline
This project demonstrates the creation of a scalable data processing pipeline for handling and analyzing log data from a hypothetical e-commerce platform. Leveraging Hadoop and PySpark, the pipeline is designed to process large volumes of log files, providing meaningful insights into user behavior, system performance, and sales metrics.
Language: Python - Size: 4.88 KB - Last synced at: 10 days ago - Pushed at: 10 months ago - Stars: 2 - Forks: 0

mjakubowski84/parquet4s
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Language: Scala - Size: 2.33 MB - Last synced at: 7 days ago - Pushed at: 20 days ago - Stars: 291 - Forks: 67

scikit-hep/uproot5
ROOT I/O in pure Python and NumPy.
Language: Python - Size: 4.24 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 249 - Forks: 84

gearpump/gearpump
Lightweight real-time big data streaming engine over Akka
Language: Scala - Size: 26.2 MB - Last synced at: 3 days ago - Pushed at: over 3 years ago - Stars: 761 - Forks: 152

apache/mnemonic 📦
Apache Mnemonic - A non-volatile hybrid memory storage oriented library
Language: Java - Size: 3.09 MB - Last synced at: 6 days ago - Pushed at: 4 months ago - Stars: 118 - Forks: 63

DataExpert-io/data-engineer-handbook
This is a repo with links to everything you'd ever want to learn about data engineering
Language: Jupyter Notebook - Size: 55.6 MB - Last synced at: 12 days ago - Pushed at: 13 days ago - Stars: 28,430 - Forks: 5,750

vaexio/vaex
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
Language: Python - Size: 133 MB - Last synced at: 11 days ago - Pushed at: 8 months ago - Stars: 8,387 - Forks: 598

dotnet/spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Language: C# - Size: 4.87 MB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 2,058 - Forks: 327

jfsanchez/jfsanchez.github.io
Website supporting the rest of the projects
Language: JavaScript - Size: 7.6 MB - Last synced at: 13 days ago - Pushed at: 14 days ago - Stars: 1 - Forks: 0

apache/incubator-livy
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
Language: Scala - Size: 3.54 MB - Last synced at: 6 days ago - Pushed at: 25 days ago - Stars: 913 - Forks: 610

jnidzwetzki/bboxdb
BBoxDB is a scalable, highly available, and distributed data store for multi-dimensional big data. The software supports operations like multi-dimensional range queries and spatial joins. In addition, data streams are supported.
Language: Java - Size: 32.8 MB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 55 - Forks: 9

kubernetes-retired/kube-batch 📦
A batch scheduler of kubernetes for high performance workload, e.g. AI/ML, BigData, HPC
Language: Go - Size: 44.1 MB - Last synced at: 1 day ago - Pushed at: about 2 years ago - Stars: 1,090 - Forks: 264

atlas555/atlas555.github.io
a personal blog
Language: HTML - Size: 28.6 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

dromara/dataCompare
Big data comparison and data profiling platform: low-code data comparison and data profiling.
Language: Java - Size: 10.9 MB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 265 - Forks: 62

rdkmaster/jigsaw
Jigsaw (七巧板, "Tangram") provides a set of web components based on Angular 5/8/9+. The main purpose of Jigsaw is to help application developers build complex, highly interactive, and user-friendly web pages. Jigsaw supports the development of all applications in ZTE's Big Data product line.
Language: HTML - Size: 72 MB - Last synced at: 10 days ago - Pushed at: about 2 months ago - Stars: 486 - Forks: 72

benedekh/bigdata-projects
Student projects in Big Data field.
Language: Java - Size: 198 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 19 - Forks: 12

keanteng/wqd7009
🗃️ Data & Working Files for Big Data Pipeline on Google Cloud
Language: Jupyter Notebook - Size: 12.8 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

atengk/ops
Technical repository for operations (DevOps/O&M)
Language: Shell - Size: 21.7 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 3 - Forks: 3

iGaoWei/BigDataView
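100+ sets of eye-catching HTML5 templates for large-screen big data visualization, covering industries such as communities, property management, government services, transportation, and finance/banking. The newest, largest, most complete, and flashiest collection of big data visualization templates on the web. Updates are ongoing.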
Language: JavaScript - Size: 825 MB - Last synced at: 17 days ago - Pushed at: 19 days ago - Stars: 4,054 - Forks: 1,176

unum-cloud/ustore
Multi-Modal Database replacing MongoDB, Neo4J, and Elastic with one faster ACID solution, with NetworkX and Pandas interfaces, and bindings for C99, C++17, Python 3, Java, GoLang 🗄️
Language: C++ - Size: 6.56 MB - Last synced at: 15 days ago - Pushed at: almost 2 years ago - Stars: 600 - Forks: 34

oldBuho/Python
Backup and testing
Language: Jupyter Notebook - Size: 3.48 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 1 - Forks: 0

jamesmudd/jhdf
A pure Java HDF5 library
Language: Java - Size: 4.84 MB - Last synced at: 16 days ago - Pushed at: 18 days ago - Stars: 154 - Forks: 41

DTStack/flinkStreamSQL
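Built on open-source Flink, this project extends its real-time SQL; it mainly implements joins between streams and dimension tables and supports all syntax of native Flink SQL.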
Language: Java - Size: 6.75 MB - Last synced at: 17 days ago - Pushed at: over 1 year ago - Stars: 2,047 - Forks: 927

YoongiKim/AutoCrawler
Google, Naver multiprocess image web crawler (Selenium)
Language: Python - Size: 168 MB - Last synced at: 16 days ago - Pushed at: about 1 year ago - Stars: 1,663 - Forks: 423

jadianes/spark-py-notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Language: Jupyter Notebook - Size: 2.2 MB - Last synced at: 16 days ago - Pushed at: about 1 year ago - Stars: 1,652 - Forks: 917

leesf/hudi-resources
A collection of Apache Hudi-related resources
Size: 23.7 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 552 - Forks: 160

zhaoyachao/zdh_web
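Big data collection and extraction platform. zdh_web is the visual management platform for the zdh series of services and includes modules for data collection, scheduling, permissions, approval workflows, private-domain marketing, and more.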
Language: Java - Size: 141 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 510 - Forks: 179

binghe001/binghe001.github.io
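📚 This static blog is a technical summary of the learning journey of the author, Binghe (冰河), over many years of development and architecture work at major internet companies. It aims to provide clear, detailed tutorials, focusing mainly on core Java, underlying principles, architecture, and penetration techniques. If this repository helps you, please show your support (follow, like, share)!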
Language: HTML - Size: 1.82 GB - Last synced at: 20 days ago - Pushed at: 21 days ago - Stars: 34 - Forks: 3
