An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: bigdata

andrewsuadnya/YouTube-Live-Chat-Sentiment-Analysis

📊 Near Real-time YouTube Live Chat Sentiment Analysis using Big Data Stack — Built with Kafka, Spark Structured Streaming, Elasticsearch, Kibana, Flask, and React.js to collect, process, and analyze live chat.

Language: Python - Size: 23 MB - Last synced at: about 8 hours ago - Pushed at: about 8 hours ago - Stars: 2 - Forks: 0

aliyun/aliyun-odps-java-sdk

ODPS SDK for Java Developers

Language: Java - Size: 29.3 MB - Last synced at: about 8 hours ago - Pushed at: about 9 hours ago - Stars: 90 - Forks: 50

basedt/dms

Open-source, free, AI-powered intelligent data management system that supports AI and is compatible with multiple databases, including MySQL, Oracle, PostgreSQL, Doris, etc.

Language: Java - Size: 869 KB - Last synced at: about 12 hours ago - Pushed at: about 12 hours ago - Stars: 31 - Forks: 8

ramarimoo/insert-tools

Simple and fast Python toolset for bulk data insertion into databases and CSVs. Ideal for ETL pipelines and data engineering tasks.

Language: Python - Size: 31.3 KB - Last synced at: about 12 hours ago - Pushed at: about 13 hours ago - Stars: 0 - Forks: 0

adamalton/django-mass-migration

Django app for managing long-running data operations on large and/or schemaless databases

Language: Python - Size: 131 KB - Last synced at: about 23 hours ago - Pushed at: about 24 hours ago - Stars: 4 - Forks: 2

jfsanchez/bigdata

Language: Jupyter Notebook - Size: 30.1 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

ictchenbo/SmartETL

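SmartETL: a simple, flexible, configurable, out-of-the-box Python ETL framework with domain-specific features that avoids reinventing the wheel! Provides processing pipelines for many open datasets such as Wikidata / Wikipedia / GDELT; supports file formats such as txt/json/csv/excel and databases such as MySQL/PostgreSQL/MongoDB/ClickHouse/ElasticSearch as input and output; provides a variety of processing operators, including large language models and Web APIs.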

Language: Python - Size: 4.74 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 17 - Forks: 3

125ade/AIS_Data_Analysis

Project on AIS data analysis to extract valuable maritime insights and improve vessel monitoring and navigation

Language: Python - Size: 271 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

100-rab/AMO

[RSS 2025] AMO: Adaptive Motion Optimization for Hyper-Dexterous Humanoid Whole-Body Control

Language: Python - Size: 44.5 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

juicedata/juicefs

JuiceFS is a distributed POSIX file system built on top of Redis and S3.

Language: Go - Size: 62.2 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 11,702 - Forks: 1,038

open-metadata/openmetadata-site

Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.

Language: TypeScript - Size: 54.6 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 14 - Forks: 11

k0c0r/improved-journey

Derek Simmons - Strategic Builder | Innovation Architect

Size: 1000 Bytes - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

Yimyaa/AI-ML-Cheatsheets

All Stanford cheatsheets: Artificial Intelligence, Transformers, LLMs, Deep Learning, Machine Learning, Probabilities, Statistics, Algebra and Calculus.

Size: 50.3 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 2 - Forks: 0

GoogleCloudPlatform/data-analytics-golden-demo

An end to end demo of Google's Cloud data and analytic stack.

Language: Jupyter Notebook - Size: 11.7 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 253 - Forks: 84

apache/celeborn

Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.

Language: Java - Size: 31.3 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 953 - Forks: 392

AparajithKrishna/Mental-health-support

A simple and beginner-friendly web app built using HTML, CSS, and JavaScript to promote mental well-being.

Language: HTML - Size: 3.91 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

timebusker/timebusker.github.io

timebusker.github.io

Language: HTML - Size: 235 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0

NationalSecurityAgency/datawave

DataWave is an ingest/query framework that leverages Apache Accumulo to provide fast, secure data access.

Language: Java - Size: 111 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 615 - Forks: 259

microsoft/Mobius

C# and F# language binding and extensions to Apache Spark

Language: C# - Size: 6.44 MB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 939 - Forks: 211

reductstore/reductstore

High Performance Storage and Streaming Solution for Data Acquisition Systems

Language: Rust - Size: 2.63 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 213 - Forks: 13

databendlabs/databend

Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com

Language: Rust - Size: 295 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 8,473 - Forks: 779

apache/hudi

Upserts, Deletes And Incremental Processing on Big Data.

Language: Java - Size: 1.82 GB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 5,833 - Forks: 2,411

apparebit/shantay

Trying to make sense of the EU's DSA Transparency DB

Language: HTML - Size: 5.78 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

jotstolu/Azure-Data-Engineering-End-to-End-Project---NYC-taxi-dataset

An end‑to‑end data engineering pipeline for NYC Green Taxi trip records, built on Microsoft Azure. This project ingests Jan–Dec 2024 Parquet files from the NYC Taxi API into a Bronze Delta Lake layer, cleans and enriches the data in a Silver layer with PySpark on Azure Databricks, then saves the transformed data to the Gold layer in Delta format.

Size: 881 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

mfuu/vue3-virtual-sortable

A virtual scrolling list component that can be sorted by dragging, for vue3

Language: TypeScript - Size: 1.89 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 46 - Forks: 11

DragonKingpin/Hydra

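Hydra (九头龙) is built for petabyte-scale knowledge-base data retrieval, intelligence systems, data platforms, and large-scale control and scheduling systems. It builds systematic capabilities for cloud computing resource management, unified task/service scheduling, data warehousing, microservices, and middle-platform infrastructure, using a large-scale distributed crawler search engine as its example.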

Language: Java - Size: 20 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 299 - Forks: 21

ganweisoft/TOMs

TOMs is a fully open-source, systematic, plugin-based, high-performance, out-of-the-box, and production-ready development framework for IoT industry applications.

Size: 34.2 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

apache/shardingsphere

Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across All Databases.

Language: Java - Size: 634 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 20,284 - Forks: 6,818

taosdata/TDengine

High-performance, scalable time-series database designed for Industrial IoT (IIoT) scenarios

Language: C - Size: 631 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 23,976 - Forks: 4,920

transferia/transferia

Open Source Cloud Native Ingestion engine

Language: Go - Size: 22.6 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 122 - Forks: 14

Correia-jpv/fucking-awesome-bigdata

A curated list of awesome big data frameworks, resources and other awesomeness. With repository stars⭐ and forks🍴

Size: 655 KB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 10 - Forks: 1

Srihariharasudhan-Balakannan/Trends-in-Data-jobs

The Trends in Data Jobs project is a web scraping and data visualization tool designed to track and analyze trends in data-related job postings.

Language: Jupyter Notebook - Size: 40.9 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2 - Forks: 1

Netflix/genie

Distributed Big Data Orchestration Service

Language: Java - Size: 206 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,734 - Forks: 372

viniciusvdias/pdm

DCC/UFLA course "Big-Data: Massive Data Processing"

Language: Jupyter Notebook - Size: 209 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 3

oxnr/awesome-bigdata

A curated list of awesome big data frameworks, resources and other awesomeness.

Size: 843 KB - Last synced at: 4 days ago - Pushed at: 4 months ago - Stars: 13,645 - Forks: 2,572

delhoume/BigMars

Make your own terapixel interactive image of the surface of Mars

Language: C - Size: 3.37 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 2 - Forks: 0

sderosiaux/every-single-day-i-tldr

A daily digest of the articles or videos I've found interesting, that I want to share with you.

Size: 6.6 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 320 - Forks: 21

apache/amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.

Language: Java - Size: 67.9 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 978 - Forks: 335

xuf-95/next-blog Fork of CaliCastle/cali.so

My personal website

Language: TypeScript - Size: 29.3 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4 - Forks: 0

apconw/sanic-web

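A lightweight, full-pipeline large-model application project that is easy to extend (Large Model Data Assistant), supporting large models such as DeepSeek/Qwen2.5. It is a one-stop large-model application development project built on Dify, Ollama & Vllm, Sanic, and Text2SQL 📊, with a modern UI built with Vue3, TypeScript, and Vite 5. It supports LLM-based graphical data Q&A through ECharts 📈 and can answer questions over tabular CSV files 📂. It can also easily connect to third-party open-source RAG and retrieval systems 🌐 to support broad general-knowledge Q&A.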

Language: JavaScript - Size: 145 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 739 - Forks: 139

xavigs/breaks-analyzer

Web scraper that extracts all daily tennis matches and analyses them to predict probabilities for the "First Set Player To Break Serve" market.

Language: Python - Size: 21 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

zyra121/advertising-sales-prediction

This repository showcases a linear regression analysis using the Advertising dataset, demonstrating both simple and multiple regression techniques in Python. It also features a custom implementation of Gradient Descent for a deeper understanding of the concepts. 🐱💻📊

Language: Python - Size: 1.33 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

apache/airavata

A general purpose Distributed Systems Framework

Language: Java - Size: 167 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 120 - Forks: 132

HariSekhon/Knowledge-Base

Large Tech Knowledge Base from 20 years in DevOps, Linux, Cloud, Big Data, AWS, GCP etc - gradually porting my large private knowledge base to public

Language: Shell - Size: 183 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 174 - Forks: 31

zeromicro/cds

Data syncing in golang for ClickHouse.

Language: Go - Size: 7.01 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 970 - Forks: 140

mfuu/vue-virtual-sortable

A virtual scrolling list component that can be sorted by dragging

Language: TypeScript - Size: 3.19 MB - Last synced at: 2 days ago - Pushed at: 11 days ago - Stars: 42 - Forks: 12

Azure/azure-event-hubs-spark

Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs

Language: Scala - Size: 19.6 MB - Last synced at: 6 days ago - Pushed at: 4 months ago - Stars: 236 - Forks: 178

apache/avro

Apache Avro is a data serialization system.

Language: Java - Size: 74.5 MB - Last synced at: 6 days ago - Pushed at: 11 days ago - Stars: 3,082 - Forks: 1,675

arvados/arvados

An open source platform for managing and analyzing biomedical big data

Language: Go - Size: 75.5 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 403 - Forks: 122

apache/avro-rs

Rust SDK for Apache Avro - a data serialization system.

Language: Rust - Size: 1.5 MB - Last synced at: 6 days ago - Pushed at: 8 days ago - Stars: 62 - Forks: 27

devinrsmith/deephaven-parquet-viewer

A browser-based Parquet file viewer

Language: Shell - Size: 194 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 46 - Forks: 3

moderatedan/DataBrokerOptOut

Data Broker Opt Out is a Python script that provides a convenient way to access opt-out pages of various data brokers on the web. Data brokers are companies that collect, analyze, and sell personal information, and opting out from their services can enhance your privacy.

Size: 167 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 2 - Forks: 0

MoRan1607/BigDataGuide

Big data learning: learn big data from scratch, with study videos for every stage of learning and interview materials.

Size: 154 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 2,958 - Forks: 903

DTStack/monaco-sql-languages

SQL languages for monaco-editor

Language: TypeScript - Size: 72.2 MB - Last synced at: 4 days ago - Pushed at: 9 days ago - Stars: 263 - Forks: 49

gilberto-009199/bigdata

BigData workspaces

Language: Java - Size: 60.4 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

lovnishverma/Slidespptspdfs

PDFs for learning Python, DBMS, Big Data, Data Science, AI/ML and much more...

Size: 51.5 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 9 - Forks: 0

AbsaOSS/spline

Data Lineage Tracking And Visualization Solution

Language: Scala - Size: 8.38 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 627 - Forks: 159

visualpython/visualpython

GUI-based Python code generator for data science, extension to Jupyter Lab, Jupyter Notebook and Google Colab.

Language: JavaScript - Size: 57.2 MB - Last synced at: 5 days ago - Pushed at: 12 months ago - Stars: 895 - Forks: 118

martymac/fpart

Sort files and pack them into partitions

Language: C - Size: 1.31 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 262 - Forks: 43

DTStack/dt-sql-parser

SQL Parsers for BigData, built with antlr4.

Language: TypeScript - Size: 51 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 343 - Forks: 101

MrXujiang/v6.dooring.public

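A large-screen data visualization solution that provides a visual editing engine, helping individuals or enterprises easily customize their own large-screen visualization applications.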

Language: TypeScript - Size: 36 MB - Last synced at: 2 days ago - Pushed at: 6 months ago - Stars: 659 - Forks: 152

raystack/meteor

Meteor is an easy-to-use, plugin-driven metadata collection framework to extract data from different sources and sink to any data catalog.

Language: Go - Size: 14.5 MB - Last synced at: about 13 hours ago - Pushed at: 8 months ago - Stars: 208 - Forks: 42

canimus/cuallee

Possibly the fastest DataFrame-agnostic quality check library in town.

Language: Python - Size: 2.29 MB - Last synced at: 10 days ago - Pushed at: 12 days ago - Stars: 191 - Forks: 20

CyprienKelma/Projet-M1

Enterprise-grade, scalable and resilient architecture for data management and processing.

Language: Jupyter Notebook - Size: 43.9 MB - Last synced at: 3 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

legend-exp/legend-pydataobj

LEGEND Python Data Objects

Language: Python - Size: 1.22 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 11

rustfs/rustfs

🚀 High-performance distributed object storage, an alternative to MinIO.

Size: 1.64 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 104 - Forks: 5

hi-primus/optimus

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Language: Python - Size: 110 MB - Last synced at: 2 days ago - Pushed at: 6 months ago - Stars: 1,512 - Forks: 232

xiongshengxiao/CloudEon

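CloudEon uses Kubernetes to install and deploy open-source big data components, making it possible to run an open-source big data platform in containers. This lets you pay less attention to managing and maintaining the underlying resources.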

Language: FreeMarker - Size: 80.6 MB - Last synced at: 10 days ago - Pushed at: 11 days ago - Stars: 1 - Forks: 0

volcano-sh/volcano

A Cloud Native Batch System (Project under CNCF)

Language: Go - Size: 84 MB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 4,705 - Forks: 1,087

alercebroker/ztf_explorer

🌚 🔭 💻 ZTF Explorer for the ALeRCE broker

Language: Vue - Size: 31.6 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 6 - Forks: 0

mfuu/ngx-virtual-sortable

A virtual scrolling list component that can be sorted by dragging

Language: TypeScript - Size: 526 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 2 - Forks: 0

mfuu/react-virtual-sortable

A virtual scrolling list component that can be sorted by dragging

Language: TypeScript - Size: 1.87 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 5 - Forks: 0

divithraju/divith-aju-Hadoop-Pyspark-pipeline

This project demonstrates the creation of a scalable data processing pipeline for handling and analyzing log data from a hypothetical e-commerce platform. Leveraging Hadoop and PySpark, the pipeline is designed to process large volumes of log files, providing meaningful insights into user behavior, system performance, and sales metrics.

Language: Python - Size: 4.88 KB - Last synced at: 10 days ago - Pushed at: 10 months ago - Stars: 2 - Forks: 0

mjakubowski84/parquet4s

Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.

Language: Scala - Size: 2.33 MB - Last synced at: 7 days ago - Pushed at: 20 days ago - Stars: 291 - Forks: 67

scikit-hep/uproot5

ROOT I/O in pure Python and NumPy.

Language: Python - Size: 4.24 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 249 - Forks: 84

gearpump/gearpump

Lightweight real-time big data streaming engine over Akka

Language: Scala - Size: 26.2 MB - Last synced at: 3 days ago - Pushed at: over 3 years ago - Stars: 761 - Forks: 152

apache/mnemonic 📦

Apache Mnemonic - A non-volatile hybrid memory storage oriented library

Language: Java - Size: 3.09 MB - Last synced at: 6 days ago - Pushed at: 4 months ago - Stars: 118 - Forks: 63

DataExpert-io/data-engineer-handbook

This is a repo with links to everything you'd ever want to learn about data engineering

Language: Jupyter Notebook - Size: 55.6 MB - Last synced at: 12 days ago - Pushed at: 13 days ago - Stars: 28,430 - Forks: 5,750

vaexio/vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

Language: Python - Size: 133 MB - Last synced at: 11 days ago - Pushed at: 8 months ago - Stars: 8,387 - Forks: 598

dotnet/spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

Language: C# - Size: 4.87 MB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 2,058 - Forks: 327

jfsanchez/jfsanchez.github.io

Website supporting the rest of the projects

Language: JavaScript - Size: 7.6 MB - Last synced at: 13 days ago - Pushed at: 14 days ago - Stars: 1 - Forks: 0

apache/incubator-livy

Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

Language: Scala - Size: 3.54 MB - Last synced at: 6 days ago - Pushed at: 25 days ago - Stars: 913 - Forks: 610

jnidzwetzki/bboxdb

BBoxDB is a scalable, highly available, and distributed data store for multi-dimensional big data. The software supports operations like multi-dimensional range queries and spatial joins. In addition, data streams are supported.

Language: Java - Size: 32.8 MB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 55 - Forks: 9

kubernetes-retired/kube-batch 📦

A batch scheduler for Kubernetes for high-performance workloads, e.g. AI/ML, BigData, HPC

Language: Go - Size: 44.1 MB - Last synced at: 1 day ago - Pushed at: about 2 years ago - Stars: 1,090 - Forks: 264

atlas555/atlas555.github.io

a personal blog

Language: HTML - Size: 28.6 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

dromara/dataCompare

Big data comparison and data profiling platform: low-code data comparison and data profiling

Language: Java - Size: 10.9 MB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 265 - Forks: 62

rdkmaster/jigsaw

Jigsaw七巧板 provides a set of web components based on Angular 5/8/9+. Its main purpose is to help application developers build complex, highly interactive, user-friendly web pages. Jigsaw supports the development of all applications in ZTE's Big Data product line.

Language: HTML - Size: 72 MB - Last synced at: 10 days ago - Pushed at: about 2 months ago - Stars: 486 - Forks: 72

benedekh/bigdata-projects

Student projects in Big Data field.

Language: Java - Size: 198 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 19 - Forks: 12

keanteng/wqd7009

🗃️ Data & Working Files for Big Data Pipeline on Google Cloud

Language: Jupyter Notebook - Size: 12.8 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

atengk/ops

A technical repository related to operations (Ops)

Language: Shell - Size: 21.7 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 3 - Forks: 3

iGaoWei/BigDataView

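100+ eye-catching large-screen HTML5 templates for big data visualization, covering industries such as communities, property management, government affairs, transportation, and finance/banking. The newest, largest, most complete, and coolest collection of big data visualization templates on the web. Updated continuously.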

Language: JavaScript - Size: 825 MB - Last synced at: 17 days ago - Pushed at: 19 days ago - Stars: 4,054 - Forks: 1,176

unum-cloud/ustore

Multi-Modal Database replacing MongoDB, Neo4J, and Elastic with 1 faster ACID solution, with NetworkX and Pandas interfaces, and bindings for C 99, C++ 17, Python 3, Java, GoLang 🗄️

Language: C++ - Size: 6.56 MB - Last synced at: 15 days ago - Pushed at: almost 2 years ago - Stars: 600 - Forks: 34

oldBuho/Python

Backup and testing

Language: Jupyter Notebook - Size: 3.48 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 1 - Forks: 0

jamesmudd/jhdf

A pure Java HDF5 library

Language: Java - Size: 4.84 MB - Last synced at: 16 days ago - Pushed at: 18 days ago - Stars: 154 - Forks: 41

DTStack/flinkStreamSQL

Extends the real-time SQL of open-source Flink. It mainly implements joins between streams and dimension tables and supports all native Flink SQL syntax.

Language: Java - Size: 6.75 MB - Last synced at: 17 days ago - Pushed at: over 1 year ago - Stars: 2,047 - Forks: 927

YoongiKim/AutoCrawler

Google, Naver multiprocess image web crawler (Selenium)

Language: Python - Size: 168 MB - Last synced at: 16 days ago - Pushed at: about 1 year ago - Stars: 1,663 - Forks: 423

jadianes/spark-py-notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Language: Jupyter Notebook - Size: 2.2 MB - Last synced at: 16 days ago - Pushed at: about 1 year ago - Stars: 1,652 - Forks: 917

leesf/hudi-resources

A collection of materials related to Apache Hudi

Size: 23.7 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 552 - Forks: 160

zhaoyachao/zdh_web

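Big data collection and extraction platform. zdh_web is the visual management platform for the zdh family of services, with modules for data collection, scheduling, permissions, approval workflows, private-domain marketing, and more.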

Language: Java - Size: 141 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 510 - Forks: 179

binghe001/binghe001.github.io

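📚 This static blog is a technical summary of the author Binghe's (冰河) learning journey over many years of development and architecture work at major internet companies. It aims to provide clear, detailed tutorials, focusing on core Java content, underlying principles, architecture knowledge, and penetration techniques. If this repository helps you, please show your support (follow, like, share)!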

Language: HTML - Size: 1.82 GB - Last synced at: 20 days ago - Pushed at: 21 days ago - Stars: 34 - Forks: 3