An open API service providing repository metadata for many open source software ecosystems.

Topic: "hdfs"

seaweedfs/seaweedfs

SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.

Language: Go - Size: 68.5 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 24,268 - Forks: 2,389

heibaiying/BigData-Notes

A getting-started guide to big data :star:

Language: Java - Size: 22.9 MB - Last synced at: 16 days ago - Pushed at: over 1 year ago - Stars: 16,368 - Forks: 4,277

ceph/ceph

Ceph is a distributed object, block, and file storage platform

Language: C++ - Size: 794 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 14,965 - Forks: 6,053

juicedata/juicefs

JuiceFS is a distributed POSIX file system built on top of Redis and S3.

Language: Go - Size: 62 MB - Last synced at: 6 days ago - Pushed at: 13 days ago - Stars: 11,547 - Forks: 1,022

wangzhiwubigdata/God-Of-BigData

Focused on big data learning and interviews; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...

Size: 66.3 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 10,043 - Forks: 3,201

piskvorky/smart_open

Utils for streaming large files (S3, HDFS, gzip, bz2...)

Language: Python - Size: 1.55 MB - Last synced at: 6 days ago - Pushed at: about 2 months ago - Stars: 3,306 - Forks: 383

TileDB-Inc/TileDB

The Universal Storage Engine

Language: C++ - Size: 101 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1,930 - Forks: 192

water8394/BigData-Interview

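:dart: :star2: [Big data interview questions] A collection of big data interview questions gathered from the web, with the author's own answer summaries. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.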

Size: 6.59 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 1,610 - Forks: 446

collabH/bigdata-growth

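A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.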

Language: Shell - Size: 221 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 1,579 - Forks: 377

colinmarc/hdfs

A native Go client for HDFS

Language: Go - Size: 2.11 MB - Last synced at: 18 days ago - Pushed at: 4 months ago - Stars: 1,388 - Forks: 350

wgzhao/Addax

A fast and versatile ETL tool that can transfer data between RDBMS and NoSQL seamlessly

Language: Java - Size: 44.4 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 1,262 - Forks: 309

spotify/snakebite 📦

A pure Python HDFS client

Language: Python - Size: 4.55 MB - Last synced at: 8 days ago - Pushed at: about 3 years ago - Stars: 857 - Forks: 217

HariSekhon/DevOps-Python-tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Language: Python - Size: 3.11 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 794 - Forks: 347

sunnyandgood/BigData

💎🔥 Big data study notes

Language: Java - Size: 316 MB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 647 - Forks: 222

Stratio/sparta

Real Time Analytics and Data Pipelines based on Spark Streaming

Language: Scala - Size: 123 MB - Last synced at: 3 days ago - Pushed at: over 5 years ago - Stars: 526 - Forks: 196

lensesio/kafka-connect-ui

Web tool for Kafka Connect

Language: JavaScript - Size: 1.17 MB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 510 - Forks: 132

dromara/CloudEon

CloudEon uses Kubernetes to install and deploy open-source big data components, enabling the containerized operation of an open-source big data platform. This allows you to reduce your focus on underlying resource management and maintenance.

Language: FreeMarker - Size: 55.4 MB - Last synced at: 27 days ago - Pushed at: about 2 months ago - Stars: 455 - Forks: 111

fabiogjardim/bigdata_docker

Big Data Ecosystem Docker

Language: VBA - Size: 126 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 407 - Forks: 319

uber/storagetapper

StorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service

Language: Go - Size: 794 KB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 351 - Forks: 66

tirthajyoti/Spark-with-Python

Fundamentals of Spark with Python (using PySpark), code examples

Language: Jupyter Notebook - Size: 8.97 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 343 - Forks: 272

datawhalechina/juicy-bigdata

🎉🎉🐳 Datawhale introductory tutorial on big data processing | The opening course of the big data technology track 🎉🎉

Language: Python - Size: 27.4 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 312 - Forks: 43

Eugene-Mark/bigdata-file-viewer

A cross-platform (Windows, Mac, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.

Language: Java - Size: 38.8 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 291 - Forks: 54

wradlib/wradlib

Weather radar data processing - Python package

Language: Python - Size: 45.1 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 282 - Forks: 84

divolte/divolte-collector 📦

Divolte Collector

Language: Java - Size: 8.98 MB - Last synced at: 29 days ago - Pushed at: over 3 years ago - Stars: 281 - Forks: 76

mtth/hdfs

API and command line interface for HDFS

Language: Python - Size: 611 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 272 - Forks: 103

RumbleDB/rumble

⛈️ RumbleDB 1.23.0 "Mountain Ash" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

Language: Java - Size: 376 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 223 - Forks: 83

breuner/elbencho

A distributed storage benchmark for file systems, object stores & block devices with support for GPUs

Language: C++ - Size: 1.35 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 201 - Forks: 26

TileDB-Inc/TileDB-Py

Python interface to the TileDB storage engine

Language: Python - Size: 5.29 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 193 - Forks: 35

helyim/helyim

SeaweedFS implemented in pure Rust

Language: Rust - Size: 377 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 183 - Forks: 20

PaddlePaddle/ElasticCTR

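ElasticCTR, PaddlePaddle's elastic-computing recommendation system, is an open-source, enterprise-grade recommendation system solution based on Kubernetes. It combines high-accuracy CTR models continuously refined in Baidu's business scenarios, the large-scale distributed training capability of the open-source PaddlePaddle framework, and an industrial-grade elastic scheduling service for sparse parameters, helping users deploy a recommendation system on Kubernetes in one step. It offers high performance, industrial-grade deployment, and an end-to-end experience, and as an open-source suite it supports in-depth secondary development.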

Language: Python - Size: 1.36 MB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 182 - Forks: 44

d2iq-archive/dcos-commons 📦

DC/OS SDK is a collection of tools, libraries, and documentation for easy integration of technologies such as Kafka, Cassandra, HDFS, Spark, and TensorFlow with DC/OS.

Language: Java - Size: 103 MB - Last synced at: 4 days ago - Pushed at: 7 months ago - Stars: 156 - Forks: 170

marcelmay/hadoop-hdfs-fsimage-exporter

Exports Hadoop HDFS content statistics to Prometheus

Language: Java - Size: 537 KB - Last synced at: about 15 hours ago - Pushed at: about 16 hours ago - Stars: 154 - Forks: 47

avast/hdfs-shell

HDFS Shell is an HDFS manipulation tool for working with the functions integrated in Hadoop DFS

Language: Java - Size: 384 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 151 - Forks: 33

jcrist/skein

A tool and library for easily deploying applications on Apache YARN

Language: Python - Size: 5.94 MB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 143 - Forks: 39

megvii-research/megfile

Megvii FILE Library - Working with files in Python the same way as with the standard library

Language: Python - Size: 15.6 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 139 - Forks: 18

mullerhai/HsunTzu

HDFS compress tar zip snappy gzip uncompress untar codec hadoop spark

Language: Scala - Size: 74.2 KB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 136 - Forks: 38

linkedin/dynamometer

A tool for scale and performance testing of HDFS with a specific focus on the NameNode.

Language: Java - Size: 297 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 129 - Forks: 36

paypal/NNAnalytics

NameNodeAnalytics is a self-help utility for scouting and maintaining the namespace of an HDFS instance.

Language: Java - Size: 2.64 MB - Last synced at: 3 days ago - Pushed at: 10 months ago - Stars: 116 - Forks: 72

mmolimar/kafka-connect-fs

Kafka Connect FileSystem Connector

Language: Java - Size: 524 KB - Last synced at: about 7 hours ago - Pushed at: over 2 years ago - Stars: 111 - Forks: 77

snowlift/trino-storage

Storage connector for Trino

Language: Java - Size: 2.68 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 110 - Forks: 36

TileDB-Inc/TileDB-R

R interface to TileDB: The Modern Database

Language: R - Size: 14.1 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 109 - Forks: 21

starlake-ai/starlake

Declarative text based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines.

Language: Scala - Size: 170 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 107 - Forks: 23

gglinux/wifi

A big data query and analysis system based on information captured over Wi-Fi

Language: Java - Size: 113 MB - Last synced at: over 1 year ago - Pushed at: almost 8 years ago - Stars: 105 - Forks: 64

AhmetFurkanDEMIR/Data-Engineering-Project-with-HDFS-and-Kafka

Data Engineering Project with Hadoop HDFS and Kafka

Language: Python - Size: 3.46 MB - Last synced at: 26 days ago - Pushed at: over 1 year ago - Stars: 102 - Forks: 25

LuckyZXL2016/Cloud-Note

A distributed cloud note-taking application (modeled on an existing cloud notes service), with data stored in Redis and HBase

Language: Java - Size: 3.23 MB - Last synced at: about 1 month ago - Pushed at: over 7 years ago - Stars: 98 - Forks: 44

autovia/ros_hadoop

Hadoop splittable InputFormat for ROS. Process rosbag with Hadoop, Spark, and other HDFS-compatible systems.

Language: Scala - Size: 34.1 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 97 - Forks: 42

HariSekhon/DevOps-Perl-tools

25+ DevOps CLI Tools - Anonymizer, SQL ReCaser (MySQL, PostgreSQL, AWS Redshift, Snowflake, Apache Drill, Hive, Impala, Cassandra CQL, Microsoft SQL Server, Oracle, Couchbase N1QL, Dockerfiles), Hadoop HDFS & Hive tools, Solr/SolrCloud CLI, Nginx stats & HTTP(S) URL watchers for load-balanced web farms, Linux tools etc.

Language: Perl - Size: 2.13 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 94 - Forks: 43

jingw/pyhdfs

Python HDFS client

Language: Python - Size: 118 KB - Last synced at: about 11 hours ago - Pushed at: 2 months ago - Stars: 94 - Forks: 23

maxis42/Big-Data-Engineering-Coursera-Yandex

Big Data for Data Engineers Coursera Specialization from Yandex

Language: Jupyter Notebook - Size: 66.2 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 92 - Forks: 74

TencentBlueKing/blueking-dbm

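DBM (database management) provides full-lifecycle management for a range of database components, including MySQL, Redis, ES, Kafka, HDFS, InfluxDB, and Pulsar. It offers batch management of massive clusters and a cluster management toolbox for each DB component, along with DB-specific services such as custom configuration, high-availability switchover, and domain name management. Comprehensive monitoring, alerting, and observability let database administrators, operations staff, developers, and other users manage databases easily, more efficiently, more securely, and more thoroughly.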

Language: Python - Size: 68.9 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 89 - Forks: 61

seznam/euphoria

Euphoria is an open source Java API for creating unified big-data processing flows. It provides an engine independent programming model which can express both batch and stream transformations.

Language: Java - Size: 3.9 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 81 - Forks: 11

greenplum-db/pxf

Platform Extension Framework: Federated Query Engine

Language: Java - Size: 27.6 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 79 - Forks: 60

dbiir/rainbow

A data layout optimization framework for wide tables stored on HDFS. See rainbow's webpage

Language: Java - Size: 65.2 MB - Last synced at: about 1 month ago - Pushed at: almost 7 years ago - Stars: 72 - Forks: 36

longshilin/HDFS-Netdisc

A Hadoop-based distributed cloud storage system :palm_tree:

Language: Java - Size: 3.93 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 71 - Forks: 20

geodocker/geodocker

Central repository for the GeoDocker project

Size: 9.77 KB - Last synced at: over 1 year ago - Pushed at: almost 8 years ago - Stars: 66 - Forks: 16

ait-aecid/anomaly-detection-log-datasets

Analysis scripts for log data sets used in anomaly detection.

Language: Python - Size: 108 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 63 - Forks: 7

fluent/fluent-plugin-webhdfs

Hadoop WebHDFS output plugin for Fluentd

Language: Ruby - Size: 189 KB - Last synced at: about 15 hours ago - Pushed at: 3 months ago - Stars: 60 - Forks: 37

monix/monix-connect

A set of connectors for Monix. 🔛

Language: Scala - Size: 8.56 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 60 - Forks: 17

marcelmay/hfsa

Hadoop FSImage Analyzer (HFSA)

Language: Java - Size: 3.41 MB - Last synced at: about 15 hours ago - Pushed at: about 16 hours ago - Stars: 59 - Forks: 24

damiencarol/jsr203-hadoop

A Java NIO file system provider for HDFS

Language: Java - Size: 28.1 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 57 - Forks: 38

ascrus/getl

A tool for developing and testing ETL and ELT processes for automating the capture, delivery and processing of information in data warehouses on the MicroFocus Vertica platform.

Language: Groovy - Size: 232 MB - Last synced at: 27 days ago - Pushed at: over 1 year ago - Stars: 57 - Forks: 10

flokkr/docker-hadoop

Docker image for main Apache Hadoop components (Yarn/Hdfs)

Language: Shell - Size: 12.9 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 57 - Forks: 24

lackhurt/flume-canal-source

Flume NG Canal source

Language: Java - Size: 39.1 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 57 - Forks: 29

zdkzdk/aaocp

A big data project that analyzes user behavior logs

Language: PLpgSQL - Size: 74.8 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 56 - Forks: 20

TileDB-Inc/TileDB-Go

Go Interface to the TileDB storage manager

Language: Go - Size: 1.41 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 53 - Forks: 9

microsoft/flink-on-azure

Examples of Flink on Azure

Language: Java - Size: 921 KB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 53 - Forks: 8

liumingmusic/HadoopLearning

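A complete set of foundational big data tutorials, starting with the basics of CentOS and Maven. The big data material mainly covers HDFS, MR, YARN, HBase, Kafka, Scala, Spark Core, Spark Streaming, and Spark SQL. The tutorials include all source code demos and online documentation.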

Language: Scala - Size: 5.95 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 52 - Forks: 24

terascope/teraslice

Scalable data processing pipelines in JavaScript

Language: TypeScript - Size: 111 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 51 - Forks: 13

aikuyun/bigdata-doc

Big data study notes, a learning roadmap, and a collection of technical case studies.

Language: Shell - Size: 2.38 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 47 - Forks: 19

dunwu/bigdata-tutorial

Language: Java - Size: 8.81 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 46 - Forks: 16

criteo/cluster-pack

A library on top of either pex or conda-pack to make your Python code easily available on a cluster

Language: Python - Size: 436 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 45 - Forks: 21

kuda-io/kuda

A Kubernetes-native data delivery platform

Language: Go - Size: 7.34 MB - Last synced at: 11 months ago - Pushed at: over 2 years ago - Stars: 45 - Forks: 13

dogukannulu/streaming_data_processing

Create streaming data, transfer it to Kafka, modify it with PySpark, and send it to Elasticsearch and MinIO

Language: Python - Size: 1.81 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 44 - Forks: 17

Wittline/apache-spark-docker

Dockerizing an Apache Spark Standalone Cluster

Language: VBA - Size: 63.7 MB - Last synced at: 29 days ago - Pushed at: almost 3 years ago - Stars: 43 - Forks: 27

mrugankray/Big-Data-Cluster

The goal of this project is to build a docker cluster that gives access to Hadoop, HDFS, Hive, PySpark, Sqoop, Airflow, Kafka, Flume, Postgres, Cassandra, Hue, Zeppelin, Kadmin, Kafka Control Center and pgAdmin. This cluster is solely intended for usage in a development environment. Do not use it to run any production workloads.

Language: Shell - Size: 118 KB - Last synced at: 12 months ago - Pushed at: about 2 years ago - Stars: 41 - Forks: 15

IBMStreams/samples

This repository contains open-source sample applications for IBM Streams.

Language: Java - Size: 251 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 41 - Forks: 73

DarkPhoenixs/hbase-meta-repair

Repair the HBase metadata table from HDFS.

Language: Java - Size: 17.6 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 39 - Forks: 31

canelmas/kafka-connect-field-and-time-partitioner

Kafka Connect Store Partitioner by custom fields and time

Language: Java - Size: 24.4 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 39 - Forks: 29

kailanyue/SZ-Metro

Shenzhen Metro big data passenger flow analysis system

Language: Java - Size: 20.6 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 39 - Forks: 11

adform/stream-loader

Components for building stream loaders from Kafka to arbitrary storages

Language: Scala - Size: 3.03 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 37 - Forks: 8

jacobstanley/hadoop-tools

Tools for working with Hadoop, written with performance in mind.

Language: Haskell - Size: 265 KB - Last synced at: 5 days ago - Pushed at: over 7 years ago - Stars: 37 - Forks: 15

winstonelei/BigDataTools

Tools for big data

Language: Java - Size: 235 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 36 - Forks: 24

ZongXR/BigData-Competition

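Third-prize solution in the national big data competition and second-prize solution in the provincial round. One-click scripts install the big data environment and automatically deploy the cluster, including ZooKeeper, Hadoop, MySQL, Hive, Spark, and some base environments. Tested on real servers with excellent results; only minimal manual intervention, such as entering passwords, is required, which frees up the effort otherwise spent on installation, deployment, and configuration. Several Scala examples are also included for data preparation with Spark.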

Language: Scala - Size: 9.03 MB - Last synced at: 28 days ago - Pushed at: 8 months ago - Stars: 35 - Forks: 11

mincloud1501/BigData

Coding practice and research on the component technologies of big data pipelines

Language: Java - Size: 26.4 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 33 - Forks: 13

maniram-yadav/Big_DataHadoop_Projects

Big data projects implemented by Maniram Yadav

Language: PigLatin - Size: 2.79 MB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 33 - Forks: 33

agile-lab-dev/wasp

WASP is a framework to build complex real time big data applications. It relies on a kind of Kappa/Lambda architecture mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.

Language: Scala - Size: 7.66 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 31 - Forks: 12

oracle/oci-hdfs-connector

HDFS Connector for Oracle Cloud Infrastructure

Language: Java - Size: 689 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 31 - Forks: 26

kmgowda/SBK

Storage Benchmark Kit

Language: Java - Size: 25.6 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 31 - Forks: 65

gchq/gaffer-docker

Gaffer Docker images and associated Helm charts for deploying on Kubernetes

Language: Shell - Size: 2.82 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 31 - Forks: 37

sdadas/fsbrowser

Fast desktop client for Hadoop Distributed File System

Language: Java - Size: 288 KB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 31 - Forks: 16

sergio11/document_search_engine_architecture

📄🚀 Unleash a powerful Document Search Engine with Apache NiFi for lightning-fast, comprehensive text indexing and search.

Language: Java - Size: 13.5 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 30 - Forks: 11

intenthq/pucket 📦

Bucketing and partitioning system for Parquet

Language: Scala - Size: 868 KB - Last synced at: 2 months ago - Pushed at: almost 7 years ago - Stars: 30 - Forks: 2

OrangeDrk/JavaNotes

Java backend study notes, covering Linux, Maven, Git, internet architecture, the big data ecosystem, and more.

Size: 149 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 29 - Forks: 9

SANSA-Stack/SANSA-Notebooks

Interactive Spark Notebooks for running SANSA examples.

Language: Makefile - Size: 8.97 MB - Last synced at: 28 days ago - Pushed at: over 4 years ago - Stars: 29 - Forks: 11

wujun728/jun_bigdata

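jun_bigdata: commonly used services for a big data platform. Includes common features and demos for big data components such as Hadoop, HBase, Hive, and Kafka, and implements Spark SQL reads and writes against non-relational databases such as Redis and MongoDB, among other things.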

Language: JavaScript - Size: 10.7 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 27 - Forks: 18

mbajer42/ucz-dfs

A distributed file system written in Rust.

Language: Rust - Size: 76.2 KB - Last synced at: 4 months ago - Pushed at: over 4 years ago - Stars: 27 - Forks: 2

DICL/VeloxDFS

DHT-based Distributed File System for MapReduce Jobs

Language: C++ - Size: 16.3 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 26 - Forks: 4

stormsinbrewing/Real_Time_Social_Media_Mining

DevOps pipeline for Real Time Social/Web Mining

Language: HTML - Size: 30.5 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 25 - Forks: 10

BeardedManZhao/dataTear

Splits data into blocks. This format enables efficient reading and avoids unnecessary data read operations.

Language: Java - Size: 2.07 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 25 - Forks: 1

onanypoint/yandex-big-data-engineering 📦

Language: Jupyter Notebook - Size: 458 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 25 - Forks: 39