Topic: "hdfs"
seaweedfs/seaweedfs
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.
Language: Go - Size: 68.5 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 24,268 - Forks: 2,389

heibaiying/BigData-Notes
A beginner's guide to big data :star:
Language: Java - Size: 22.9 MB - Last synced at: 16 days ago - Pushed at: over 1 year ago - Stars: 16,368 - Forks: 4,277

ceph/ceph
Ceph is a distributed object, block, and file storage platform
Language: C++ - Size: 794 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 14,965 - Forks: 6,053

juicedata/juicefs
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
Language: Go - Size: 62 MB - Last synced at: 6 days ago - Pushed at: 13 days ago - Stars: 11,547 - Forks: 1,022

wangzhiwubigdata/God-Of-BigData
Focused on big data learning and interview preparation; start your journey to big data mastery. Flink/Spark/Hadoop/Hbase/Hive...
Size: 66.3 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 10,043 - Forks: 3,201

piskvorky/smart_open
Utils for streaming large files (S3, HDFS, gzip, bz2...)
Language: Python - Size: 1.55 MB - Last synced at: 6 days ago - Pushed at: about 2 months ago - Stars: 3,306 - Forks: 383

TileDB-Inc/TileDB
The Universal Storage Engine
Language: C++ - Size: 101 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1,930 - Forks: 192

water8394/BigData-Interview
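:dart: :star2: [Big data interview questions] A collection of big data interview questions gathered from the web, together with the author's own answer summaries. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.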
Size: 6.59 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 1,610 - Forks: 446

collabH/bigdata-growth
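A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.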
Language: Shell - Size: 221 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 1,579 - Forks: 377

colinmarc/hdfs
A native go client for HDFS
Language: Go - Size: 2.11 MB - Last synced at: 18 days ago - Pushed at: 4 months ago - Stars: 1,388 - Forks: 350

wgzhao/Addax
A fast and versatile ETL tool that can transfer data between RDBMS and NoSQL seamlessly
Language: Java - Size: 44.4 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 1,262 - Forks: 309

spotify/snakebite 📦
A pure python HDFS client
Language: Python - Size: 4.55 MB - Last synced at: 8 days ago - Pushed at: about 3 years ago - Stars: 857 - Forks: 217

HariSekhon/DevOps-Python-tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Language: Python - Size: 3.11 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 794 - Forks: 347

sunnyandgood/BigData
💎🔥 Big data study notes
Language: Java - Size: 316 MB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 647 - Forks: 222

Stratio/sparta
Real Time Analytics and Data Pipelines based on Spark Streaming
Language: Scala - Size: 123 MB - Last synced at: 3 days ago - Pushed at: over 5 years ago - Stars: 526 - Forks: 196

lensesio/kafka-connect-ui
Web tool for Kafka Connect
Language: JavaScript - Size: 1.17 MB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 510 - Forks: 132

dromara/CloudEon
CloudEon uses Kubernetes to install and deploy open-source big data components, enabling the containerized operation of an open-source big data platform. This allows you to reduce your focus on underlying resource management and maintenance.
Language: FreeMarker - Size: 55.4 MB - Last synced at: 27 days ago - Pushed at: about 2 months ago - Stars: 455 - Forks: 111

fabiogjardim/bigdata_docker
Big Data Ecosystem Docker
Language: VBA - Size: 126 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 407 - Forks: 319

uber/storagetapper
StorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service
Language: Go - Size: 794 KB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 351 - Forks: 66

tirthajyoti/Spark-with-Python
Fundamentals of Spark with Python (using PySpark), code examples
Language: Jupyter Notebook - Size: 8.97 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 343 - Forks: 272

datawhalechina/juicy-bigdata
🎉🎉🐳 Datawhale Introduction to Big Data Processing tutorial | The opening course of the big data technology track 🎉🎉
Language: Python - Size: 27.4 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 312 - Forks: 43

Eugene-Mark/bigdata-file-viewer
A cross-platform (Windows, Mac, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Language: Java - Size: 38.8 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 291 - Forks: 54

wradlib/wradlib
weather radar data processing - python package
Language: Python - Size: 45.1 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 282 - Forks: 84

divolte/divolte-collector 📦
Divolte Collector
Language: Java - Size: 8.98 MB - Last synced at: 29 days ago - Pushed at: over 3 years ago - Stars: 281 - Forks: 76

mtth/hdfs
API and command line interface for HDFS
Language: Python - Size: 611 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 272 - Forks: 103

RumbleDB/rumble
⛈️ RumbleDB 1.23.0 "Mountain Ash" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Language: Java - Size: 376 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 223 - Forks: 83

breuner/elbencho
A distributed storage benchmark for file systems, object stores & block devices with support for GPUs
Language: C++ - Size: 1.35 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 201 - Forks: 26

TileDB-Inc/TileDB-Py
Python interface to the TileDB storage engine
Language: Python - Size: 5.29 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 193 - Forks: 35

helyim/helyim
seaweedfs implemented in pure Rust
Language: Rust - Size: 377 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 183 - Forks: 20

PaddlePaddle/ElasticCTR
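ElasticCTR, the PaddlePaddle elastic computing recommendation system, is an open-source, enterprise-grade recommendation system solution based on Kubernetes. It combines high-accuracy CTR models refined in Baidu's business scenarios, the large-scale distributed training capability of the open-source PaddlePaddle framework, and an industrial-grade elastic scheduling service for sparse parameters, helping users deploy a recommendation system on Kubernetes in one step. It offers high performance, industrial-grade deployment, and an end-to-end experience, and as an open-source suite it supports in-depth secondary development.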
Language: Python - Size: 1.36 MB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 182 - Forks: 44

d2iq-archive/dcos-commons 📦
DC/OS SDK is a collection of tools, libraries, and documentation for easy integration of technologies such as Kafka, Cassandra, HDFS, Spark, and TensorFlow with DC/OS.
Language: Java - Size: 103 MB - Last synced at: 4 days ago - Pushed at: 7 months ago - Stars: 156 - Forks: 170

marcelmay/hadoop-hdfs-fsimage-exporter
Exports Hadoop HDFS content statistics to Prometheus
Language: Java - Size: 537 KB - Last synced at: about 15 hours ago - Pushed at: about 16 hours ago - Stars: 154 - Forks: 47

avast/hdfs-shell
HDFS Shell is an HDFS manipulation tool for working with the functions integrated in Hadoop DFS
Language: Java - Size: 384 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 151 - Forks: 33

jcrist/skein
A tool and library for easily deploying applications on Apache YARN
Language: Python - Size: 5.94 MB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 143 - Forks: 39

megvii-research/megfile
Megvii FILE Library - Working with Files in Python same as the standard library
Language: Python - Size: 15.6 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 139 - Forks: 18

mullerhai/HsunTzu
HDFS compress tar zip snappy gzip uncompress untar codec hadoop spark
Language: Scala - Size: 74.2 KB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 136 - Forks: 38

linkedin/dynamometer
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Language: Java - Size: 297 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 129 - Forks: 36

paypal/NNAnalytics
NameNodeAnalytics is a self-help utility for scouting and maintaining the namespace of an HDFS instance.
Language: Java - Size: 2.64 MB - Last synced at: 3 days ago - Pushed at: 10 months ago - Stars: 116 - Forks: 72

mmolimar/kafka-connect-fs
Kafka Connect FileSystem Connector
Language: Java - Size: 524 KB - Last synced at: about 7 hours ago - Pushed at: over 2 years ago - Stars: 111 - Forks: 77

snowlift/trino-storage
Storage connector for Trino
Language: Java - Size: 2.68 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 110 - Forks: 36

TileDB-Inc/TileDB-R
R interface to TileDB: The Modern Database
Language: R - Size: 14.1 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 109 - Forks: 21

starlake-ai/starlake
Declarative text based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines.
Language: Scala - Size: 170 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 107 - Forks: 23

gglinux/wifi
A big data query and analysis system based on information captured over Wi-Fi
Language: Java - Size: 113 MB - Last synced at: over 1 year ago - Pushed at: almost 8 years ago - Stars: 105 - Forks: 64

AhmetFurkanDEMIR/Data-Engineering-Project-with-HDFS-and-Kafka
Data Engineering Project with Hadoop HDFS and Kafka
Language: Python - Size: 3.46 MB - Last synced at: 26 days ago - Pushed at: over 1 year ago - Stars: 102 - Forks: 25

LuckyZXL2016/Cloud-Note
A distributed cloud note-taking application (modeled on an existing cloud note product), with data stored in Redis and HBase
Language: Java - Size: 3.23 MB - Last synced at: about 1 month ago - Pushed at: over 7 years ago - Stars: 98 - Forks: 44

autovia/ros_hadoop
Hadoop splittable InputFormat for ROS. Process rosbag with Hadoop Spark and other HDFS compatible systems.
Language: Scala - Size: 34.1 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 97 - Forks: 42

HariSekhon/DevOps-Perl-tools
25+ DevOps CLI Tools - Anonymizer, SQL ReCaser (MySQL, PostgreSQL, AWS Redshift, Snowflake, Apache Drill, Hive, Impala, Cassandra CQL, Microsoft SQL Server, Oracle, Couchbase N1QL, Dockerfiles), Hadoop HDFS & Hive tools, Solr/SolrCloud CLI, Nginx stats & HTTP(S) URL watchers for load-balanced web farms, Linux tools etc.
Language: Perl - Size: 2.13 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 94 - Forks: 43

jingw/pyhdfs
Python HDFS client
Language: Python - Size: 118 KB - Last synced at: about 11 hours ago - Pushed at: 2 months ago - Stars: 94 - Forks: 23

maxis42/Big-Data-Engineering-Coursera-Yandex
Big Data for Data Engineers Coursera Specialization from Yandex
Language: Jupyter Notebook - Size: 66.2 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 92 - Forks: 74

TencentBlueKing/blueking-dbm
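DBM (database management) integrates full-lifecycle management for a range of database components, including MySQL, Redis, ES, Kafka, HDFS, InfluxDB, and Pulsar. It provides batch management for very large numbers of clusters, a cluster management toolbox for each DB component, and accompanying DB-specific services such as custom configuration, high-availability switchover, and domain name management. Comprehensive monitoring, alerting, and observability let database administrators, operations staff, and developers carry out database management work easily, managing databases more efficiently, more securely, and more thoroughly. The database management platform integrates a variety of database components such as MySQL...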
Language: Python - Size: 68.9 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 89 - Forks: 61

seznam/euphoria
Euphoria is an open source Java API for creating unified big-data processing flows. It provides an engine independent programming model which can express both batch and stream transformations.
Language: Java - Size: 3.9 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 81 - Forks: 11

greenplum-db/pxf
Platform Extension Framework: Federated Query Engine
Language: Java - Size: 27.6 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 79 - Forks: 60

dbiir/rainbow
A data layout optimization framework for wide tables stored on HDFS. See rainbow's webpage
Language: Java - Size: 65.2 MB - Last synced at: about 1 month ago - Pushed at: almost 7 years ago - Stars: 72 - Forks: 36

longshilin/HDFS-Netdisc
A distributed cloud storage system based on Hadoop :palm_tree:
Language: Java - Size: 3.93 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 71 - Forks: 20

geodocker/geodocker
Central repository for the GeoDocker project
Size: 9.77 KB - Last synced at: over 1 year ago - Pushed at: almost 8 years ago - Stars: 66 - Forks: 16

ait-aecid/anomaly-detection-log-datasets
Analysis scripts for log data sets used in anomaly detection.
Language: Python - Size: 108 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 63 - Forks: 7

fluent/fluent-plugin-webhdfs
Hadoop WebHDFS output plugin for Fluentd
Language: Ruby - Size: 189 KB - Last synced at: about 15 hours ago - Pushed at: 3 months ago - Stars: 60 - Forks: 37

monix/monix-connect
A set of connectors for Monix. 🔛
Language: Scala - Size: 8.56 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 60 - Forks: 17

marcelmay/hfsa
Hadoop FSImage Analyzer (HFSA)
Language: Java - Size: 3.41 MB - Last synced at: about 15 hours ago - Pushed at: about 16 hours ago - Stars: 59 - Forks: 24

damiencarol/jsr203-hadoop
A Java NIO file system provider for HDFS
Language: Java - Size: 28.1 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 57 - Forks: 38

ascrus/getl
A tool for developing and testing ETL and ELT processes for automating the capture, delivery and processing of information in data warehouses on the MicroFocus Vertica platform.
Language: Groovy - Size: 232 MB - Last synced at: 27 days ago - Pushed at: over 1 year ago - Stars: 57 - Forks: 10

flokkr/docker-hadoop
Docker image for main Apache Hadoop components (Yarn/Hdfs)
Language: Shell - Size: 12.9 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 57 - Forks: 24

lackhurt/flume-canal-source
Flume NG Canal source
Language: Java - Size: 39.1 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 57 - Forks: 29

zdkzdk/aaocp
A big data project for analyzing user behavior logs
Language: PLpgSQL - Size: 74.8 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 56 - Forks: 20

TileDB-Inc/TileDB-Go
Go Interface to the TileDB storage manager
Language: Go - Size: 1.41 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 53 - Forks: 9

microsoft/flink-on-azure
Examples of Flink on Azure
Language: Java - Size: 921 KB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 53 - Forks: 8

liumingmusic/HadoopLearning
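A complete set of foundational big data tutorials, including the basics of centos and maven. The big data portion mainly covers hdfs, mr, yarn, hbase, kafka, scala, sparkcore, sparkstreaming, and sparksql. The tutorials include full source code demos and online documentation.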
Language: Scala - Size: 5.95 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 52 - Forks: 24

terascope/teraslice
Scalable data processing pipelines in JavaScript
Language: TypeScript - Size: 111 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 51 - Forks: 13

aikuyun/bigdata-doc
Big data study notes, learning roadmaps, and a collection of technical case studies.
Language: Shell - Size: 2.38 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 47 - Forks: 19

dunwu/bigdata-tutorial
Language: Java - Size: 8.81 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 46 - Forks: 16

criteo/cluster-pack
A library on top of either pex or conda-pack to make your Python code easily available on a cluster
Language: Python - Size: 436 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 45 - Forks: 21

kuda-io/kuda
A Kubernetes-native data delivery platform
Language: Go - Size: 7.34 MB - Last synced at: 11 months ago - Pushed at: over 2 years ago - Stars: 45 - Forks: 13

dogukannulu/streaming_data_processing
Create streaming data, transfer it to Kafka, modify it with PySpark, and load it into Elasticsearch and MinIO
Language: Python - Size: 1.81 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 44 - Forks: 17

Wittline/apache-spark-docker
Dockerizing an Apache Spark Standalone Cluster
Language: VBA - Size: 63.7 MB - Last synced at: 29 days ago - Pushed at: almost 3 years ago - Stars: 43 - Forks: 27

mrugankray/Big-Data-Cluster
The goal of this project is to build a docker cluster that gives access to Hadoop, HDFS, Hive, PySpark, Sqoop, Airflow, Kafka, Flume, Postgres, Cassandra, Hue, Zeppelin, Kadmin, Kafka Control Center and pgAdmin. This cluster is solely intended for usage in a development environment. Do not use it to run any production workloads.
Language: Shell - Size: 118 KB - Last synced at: 12 months ago - Pushed at: about 2 years ago - Stars: 41 - Forks: 15

IBMStreams/samples
This repository contains open-source sample applications for IBM Streams.
Language: Java - Size: 251 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 41 - Forks: 73

DarkPhoenixs/hbase-meta-repair
Repair hbase metadata table from hdfs.
Language: Java - Size: 17.6 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 39 - Forks: 31

canelmas/kafka-connect-field-and-time-partitioner
Kafka Connect Store Partitioner by custom fields and time
Language: Java - Size: 24.4 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 39 - Forks: 29

kailanyue/SZ-Metro
Shenzhen Metro big data passenger flow analysis system
Language: Java - Size: 20.6 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 39 - Forks: 11

adform/stream-loader
Components for building stream loaders from Kafka to arbitrary storages
Language: Scala - Size: 3.03 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 37 - Forks: 8

jacobstanley/hadoop-tools
Tools for working with Hadoop, written with performance in mind.
Language: Haskell - Size: 265 KB - Last synced at: 5 days ago - Pushed at: over 7 years ago - Stars: 37 - Forks: 15

winstonelei/BigDataTools
tools for bigData
Language: Java - Size: 235 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 36 - Forks: 24

ZongXR/BigData-Competition
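Third-prize solution in the national big data competition and second-prize solution in the provincial round. One-click scripts install the big data environment and automatically deploy the cluster, including zookeeper, hadoop, mysql, hive, spark, and some base components. Tested on real servers with excellent results; only minimal manual intervention, such as entering passwords, is needed. This frees up the effort otherwise required for installation, deployment, and configuration. Several scala examples are also included, used with spark for data preparation.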
Language: Scala - Size: 9.03 MB - Last synced at: 28 days ago - Pushed at: 8 months ago - Stars: 35 - Forks: 11

mincloud1501/BigData
Coding practice and research on the component technologies of big data pipelines
Language: Java - Size: 26.4 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 33 - Forks: 13

maniram-yadav/Big_DataHadoop_Projects
Big data projects implemented by Maniram yadav
Language: PigLatin - Size: 2.79 MB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 33 - Forks: 33

agile-lab-dev/wasp
WASP is a framework to build complex real time big data applications. It relies on a kind of Kappa/Lambda architecture mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.
Language: Scala - Size: 7.66 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 31 - Forks: 12

oracle/oci-hdfs-connector
HDFS Connector for Oracle Cloud Infrastructure
Language: Java - Size: 689 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 31 - Forks: 26

kmgowda/SBK
Storage Benchmark Kit
Language: Java - Size: 25.6 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 31 - Forks: 65

gchq/gaffer-docker
Gaffer Docker images and associated Helm charts for deploying on Kubernetes
Language: Shell - Size: 2.82 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 31 - Forks: 37

sdadas/fsbrowser
Fast desktop client for Hadoop Distributed File System
Language: Java - Size: 288 KB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 31 - Forks: 16

sergio11/document_search_engine_architecture
📄🚀 Unleash a powerful Document Search Engine with Apache NiFi for lightning-fast, comprehensive text indexing and search.
Language: Java - Size: 13.5 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 30 - Forks: 11

intenthq/pucket 📦
Bucketing and partitioning system for Parquet
Language: Scala - Size: 868 KB - Last synced at: 2 months ago - Pushed at: almost 7 years ago - Stars: 30 - Forks: 2

OrangeDrk/JavaNotes
Java backend study notes, covering Linux, maven, git, internet architecture, the big data ecosystem, and more
Size: 149 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 29 - Forks: 9

SANSA-Stack/SANSA-Notebooks
Interactive Spark Notebooks for running SANSA examples.
Language: Makefile - Size: 8.97 MB - Last synced at: 28 days ago - Pushed at: over 4 years ago - Stars: 29 - Forks: 11

wujun728/jun_bigdata
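jun_bigdata: common services for a big data platform. Includes commonly used features and demos for big data components such as hadoop, hbase, hive, and kafka, and implements Spark SQL reads and writes against non-relational databases such as Redis and MongoDB, among other things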
Language: JavaScript - Size: 10.7 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 27 - Forks: 18

mbajer42/ucz-dfs
A distributed file system written in Rust.
Language: Rust - Size: 76.2 KB - Last synced at: 4 months ago - Pushed at: over 4 years ago - Stars: 27 - Forks: 2

DICL/VeloxDFS
DHT-based Distributed File System for MapReduce Jobs
Language: C++ - Size: 16.3 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 26 - Forks: 4

stormsinbrewing/Real_Time_Social_Media_Mining
DevOps pipeline for Real Time Social/Web Mining
Language: HTML - Size: 30.5 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 25 - Forks: 10

BeardedManZhao/dataTear
Splits data into blocks. This format enables efficient reading and avoids unnecessary data read operations.
Language: Java - Size: 2.07 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 25 - Forks: 1

onanypoint/yandex-big-data-engineering 📦
Language: Jupyter Notebook - Size: 458 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 25 - Forks: 39
