An open API service providing repository metadata for many open source software ecosystems.

Topic: "hadoop"

donnemartin/data-science-ipython-notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Language: Python - Size: 46.8 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 28,169 - Forks: 7,978

spotify/luigi

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, etc. It also comes with Hadoop support built in.

Language: Python - Size: 10.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 18,279 - Forks: 2,418

Tencent/APIJSON

🏆 Real-time, coding-free, full-featured, and secure ORM library 🚀 The backend provides APIs and docs without coding, and frontend (client) users can customize the data and structure of the returned JSON 🏆

Language: Java - Size: 69.8 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 17,973 - Forks: 2,230

heibaiying/BigData-Notes

Big data getting-started guide :star:

Language: Java - Size: 22.9 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 16,422 - Forks: 4,279

prestodb/presto

The official home of the Presto distributed SQL query engine for big data

Language: Java - Size: 232 MB - Last synced at: about 12 hours ago - Pushed at: about 13 hours ago - Stars: 16,388 - Forks: 5,469

apache/hadoop

Apache Hadoop

Language: Java - Size: 567 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 15,127 - Forks: 9,059

deeplearning4j/deeplearning4j

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learn...

Language: Java - Size: 728 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 14,023 - Forks: 3,851

trinodb/trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Language: Java - Size: 262 MB - Last synced at: about 21 hours ago - Pushed at: about 21 hours ago - Stars: 11,485 - Forks: 3,238

wangzhiwubigdata/God-Of-BigData

Focused on big data learning and interview preparation: start your path to big data mastery. Flink/Spark/Hadoop/Hbase/Hive...

Size: 66.3 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 10,106 - Forks: 3,212

linkedin/school-of-sre

At LinkedIn, we use this curriculum to onboard entry-level talent into the SRE role.

Language: HTML - Size: 49.5 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 7,966 - Forks: 725

h2oai/h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Language: Jupyter Notebook - Size: 597 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 7,204 - Forks: 2,024

Alluxio/alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud

Language: Java - Size: 196 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 6,990 - Forks: 2,943

HariSekhon/DevOps-Bash-tools

1000+ DevOps Bash Scripts - AWS, GCP, Kubernetes, Docker, CI/CD, APIs, SQL, PostgreSQL, MySQL, Hive, Impala, Kafka, Hadoop, Jenkins, GitHub, GitLab, BitBucket, Azure DevOps, TeamCity, Spotify, MP3, LDAP, Code/Build Linting, pkg mgmt for Linux, Mac, Python, Perl, Ruby, NodeJS, Golang, Advanced dotfiles: .bashrc, .vimrc, .gitconfig, .screenrc, tmux...

Language: Shell - Size: 11.2 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 6,769 - Forks: 1,265

apache/hive

Apache Hive

Language: Java - Size: 704 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 5,727 - Forks: 4,750

apache/ignite

Apache Ignite

Language: Java - Size: 444 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 4,944 - Forks: 1,923

apache/calcite

Apache Calcite

Language: Java - Size: 102 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4,867 - Forks: 2,427

tomwhite/hadoop-book

Example source code accompanying O'Reilly's "Hadoop: The Definitive Guide" by Tom White

Language: Makefile - Size: 2.54 MB - Last synced at: about 1 month ago - Pushed at: over 5 years ago - Stars: 3,506 - Forks: 2,566

WeBankFinTech/DataSphereStudio

DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.

Language: Java - Size: 243 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 3,166 - Forks: 1,020

apache/nutch

Apache Nutch is an extensible and scalable web crawler

Language: Java - Size: 132 MB - Last synced at: 4 days ago - Pushed at: 3 months ago - Stars: 3,036 - Forks: 1,257

MoRan1607/BigDataGuide

Big data learning: learn big data from scratch, with study videos for every stage of learning and interview materials

Size: 154 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 2,958 - Forks: 903

LuckyZXL2016/Movie_Recommend

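Spark-based movie recommendation system, including a web crawler project, a website, a back-office management system, and a Spark recommendation engine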

Language: Java - Size: 55.1 MB - Last synced at: about 1 month ago - Pushed at: about 6 years ago - Stars: 2,936 - Forks: 1,049

geekyouth/SZT-bigdata

Big data passenger-flow analysis system for the Shenzhen Metro 🚇🚄🌟

Language: Scala - Size: 42.1 MB - Last synced at: 28 days ago - Pushed at: about 1 year ago - Stars: 2,364 - Forks: 610

big-data-europe/docker-hadoop

Apache Hadoop docker image

Language: Shell - Size: 109 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 2,261 - Forks: 1,360

apache/kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

Language: Scala - Size: 60.1 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2,207 - Forks: 947

dahuoyzs/javapdf

🍣 100 Java e-books: technical books in PDF (be proud to download and read, ashamed to merely like and bookmark)

Size: 18.6 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 2,098 - Forks: 475

cdarlint/winutils

winutils.exe, hadoop.dll, and hdfs.dll binaries for Hadoop on Windows

Language: Shell - Size: 7.45 MB - Last synced at: 29 days ago - Pushed at: about 1 year ago - Stars: 2,069 - Forks: 2,233

apache/drill

Apache Drill is a distributed MPP query layer for self-describing data

Language: Java - Size: 68 MB - Last synced at: 3 days ago - Pushed at: 10 days ago - Stars: 1,975 - Forks: 983

gchq/Gaffer

A large-scale entity and relation database supporting aggregation of properties

Language: Java - Size: 218 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 1,786 - Forks: 361

Qihoo360/hbox

AI on Hadoop

Language: Java - Size: 126 MB - Last synced at: 1 day ago - Pushed at: 16 days ago - Stars: 1,732 - Forks: 386

water8394/BigData-Interview

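:dart: :star2: [Big data interview questions] A collection of big data interview questions I gathered online, together with my own summarized answers. It currently covers interview questions on the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.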

Size: 6.59 MB - Last synced at: about 19 hours ago - Pushed at: almost 4 years ago - Stars: 1,622 - Forks: 447

collabH/bigdata-growth

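A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.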

Language: Shell - Size: 221 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1,612 - Forks: 375

apache/carbondata

High performance data store solution

Language: Scala - Size: 82.6 MB - Last synced at: about 2 hours ago - Pushed at: 3 months ago - Stars: 1,435 - Forks: 705

HariSekhon/Dockerfiles

50+ DockerHub public images for Docker & Kubernetes - DevOps, CI/CD, GitHub Actions, CircleCI, Jenkins, TeamCity, Alpine, CentOS, Debian, Fedora, Ubuntu, Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak

Language: Shell - Size: 7.73 MB - Last synced at: 29 days ago - Pushed at: 3 months ago - Stars: 1,347 - Forks: 473

OBenner/data-engineering-interview-questions

More than 2,000 data engineer interview questions.

Size: 938 KB - Last synced at: 29 days ago - Pushed at: 5 months ago - Stars: 1,341 - Forks: 477

wgzhao/Addax

A fast and versatile ETL tool that can transfer data between RDBMS and NoSQL seamlessly

Language: Java - Size: 45.2 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1,289 - Forks: 313

DTStack/Taier

Taier is a big data development platform for job submission, scheduling, operations and maintenance, and metrics display

Language: Java - Size: 148 MB - Last synced at: 10 days ago - Pushed at: 11 months ago - Stars: 1,250 - Forks: 340

HariSekhon/Nagios-Plugins

450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...

Language: Python - Size: 8.83 MB - Last synced at: 29 days ago - Pushed at: 3 months ago - Stars: 1,145 - Forks: 509

Teradata/kylo

Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.

Language: Java - Size: 84.3 MB - Last synced at: 28 days ago - Pushed at: over 2 years ago - Stars: 1,113 - Forks: 574

oeljeklaus-you/UserActionAnalyzePlatform

Big data platform for e-commerce user behavior analysis

Language: Java - Size: 1.26 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 1,029 - Forks: 386

apache/ozone

Scalable, reliable, distributed storage system optimized for data analytics and object store workloads.

Language: Java - Size: 104 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 937 - Forks: 545

realguoshuai/hadoop_study

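Regularly updated documentation for commonly used big data components in the Hadoop ecosystem, focusing, in order, on: Flink, Solr, SparkSQL, ES, Scala, Kafka, Hbase/phoenix, Redis, Kerberos (the project includes Hadoop mind maps, Evernote notes, simple Scala demos, common utility classes, and anonymized train code; continuously updated!!!)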

Language: Java - Size: 209 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 896 - Forks: 257

HariSekhon/DevOps-Python-tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Language: Python - Size: 3.11 MB - Last synced at: 13 days ago - Pushed at: 2 months ago - Stars: 798 - Forks: 348

tony-framework/TonY

TonY is a framework to natively run deep learning frameworks on Apache Hadoop.

Language: Java - Size: 3.32 MB - Last synced at: about 23 hours ago - Pushed at: over 1 year ago - Stars: 706 - Forks: 163

WeBankFinTech/WeDataSphere

WeDataSphere is a financial-grade, one-stop big data platform suite.

Size: 8.79 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 666 - Forks: 163

sunnyandgood/BigData

💎🔥 Big data study notes

Language: Java - Size: 316 MB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 647 - Forks: 222

AbsaOSS/spline

Data Lineage Tracking And Visualization Solution

Language: Scala - Size: 8.44 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 632 - Forks: 159

cerndb/dist-keras 📦

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

Language: Python - Size: 54.6 MB - Last synced at: 5 days ago - Pushed at: almost 7 years ago - Stars: 623 - Forks: 167

linkedin/venice

Venice, Derived Data Platform for Planet-Scale Workloads.

Language: Java - Size: 55.1 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 546 - Forks: 98

Esri/gis-tools-for-hadoop

The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data.

Size: 18 MB - Last synced at: 17 days ago - Pushed at: about 3 years ago - Stars: 522 - Forks: 255

PriyankaJhaTheDeveloper/DataAnalystPortfolioProjects

This repository contains my Data Analytics portfolio projects ranging from SQL, Python, Tableau, Excel, and Hadoop (HiveQL).

Language: Jupyter Notebook - Size: 26.1 MB - Last synced at: 13 days ago - Pushed at: over 3 years ago - Stars: 498 - Forks: 88

apache/tez

Apache Tez

Language: Java - Size: 29.4 MB - Last synced at: 3 days ago - Pushed at: 6 days ago - Stars: 497 - Forks: 433

Raray-chuan/xichuan_note

xichuan's study notes, covering Java, Spring, other common Java frameworks, big data components, and more 📚

Language: Java - Size: 10.9 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 490 - Forks: 95

Netflix/iceberg

Iceberg is a table format for large, slow-moving tabular data

Language: Java - Size: 2.41 MB - Last synced at: 19 days ago - Pushed at: about 2 years ago - Stars: 479 - Forks: 60

uber/marmaray 📦

Generic Data Ingestion & Dispersal Library for Hadoop

Language: Java - Size: 1.61 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 478 - Forks: 111

houshanren/big_data_architect_skills

Skills a big data architect should master

Size: 38 MB - Last synced at: about 1 month ago - Pushed at: almost 6 years ago - Stars: 474 - Forks: 174

dromara/CloudEon

CloudEon uses Kubernetes to install and deploy open-source big data components, running an open-source big data platform in containers. This lets you spend less effort on underlying resource management and maintenance.

Language: FreeMarker - Size: 55.4 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 464 - Forks: 113

ittqqzz/ECommerceRecommendSystem

Real-time product recommendation system for e-commerce big data, built with Vue + TypeScript + ElementUI + Spring + Spark

Language: Java - Size: 4.39 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 463 - Forks: 115

fabiogjardim/bigdata_docker

Big Data Ecosystem Docker

Language: VBA - Size: 126 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 416 - Forks: 322

whoiszxl/shopzz

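An e-commerce project built with SpringCloud Alibaba. The mobile app is built with Flutter 2.x, the mini program with uni-app, and the admin console with Vue 3.0 + Element Plus. Payments accept digital currencies (Bitcoin, Ethereum USDT), and the backend uses big data frameworks such as Hadoop and Flink to build real-time and offline computing systems.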

Language: Java - Size: 18.2 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 400 - Forks: 85

cubefs/compass

Compass is a task diagnosis platform for big data

Language: Java - Size: 5.92 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 385 - Forks: 139

fancyChuan/bigdata-hub

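A knowledge base on data infrastructure and big data technology, covering mainstream frameworks such as Hadoop, Hive, Spark, and Flink and their related frameworks, as well as data middle platforms, data lakes, data governance, data warehouse construction, digital transformation, and more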

Language: Java - Size: 179 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 361 - Forks: 102

hortonworks/cloudbreak

CDP Public Cloud is an integrated analytics and data management platform deployed on cloud services. It offers broad data analytics and artificial intelligence functionality along with secure user access and data governance features.

Language: Java - Size: 221 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 358 - Forks: 238

cwensel/cascading

Cascading is a feature-rich API for defining and executing complex and fault-tolerant data processing flows locally or on a cluster.

Language: Java - Size: 32.1 MB - Last synced at: 28 days ago - Pushed at: 3 months ago - Stars: 350 - Forks: 221

Tencent/caelus

Set of Kubernetes solutions for reusing idle resources of nodes by running extra batch jobs

Language: Go - Size: 1.03 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 350 - Forks: 85

kanyun-inc/ytk-learn

Ytk-learn is a distributed machine learning library that implements most popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).

Language: Java - Size: 804 KB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 348 - Forks: 76

tirthajyoti/Spark-with-Python

Fundamentals of Spark with Python (using PySpark), code examples

Language: Jupyter Notebook - Size: 8.97 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 347 - Forks: 271

elasticluster/elasticluster

Create clusters of VMs on the cloud and configure them with Ansible.

Language: Python - Size: 5.7 MB - Last synced at: 11 months ago - Pushed at: almost 2 years ago - Stars: 335 - Forks: 150

Cascading/cascading Fork of cwensel/cascading

All development now happens here: https://github.com/cwensel/cascading. Cascading is a feature-rich API for defining and executing complex and fault-tolerant data processing workflows on various cluster computing platforms.

Language: Java - Size: 17.5 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 332 - Forks: 113

HuQi2018/BiSheServer

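This system is my graduation project, titled "Design and Implementation of a Movie Recommendation System Based on User Profiles". It uses Django as its base framework with the MTV pattern, and MongoDB, MySQL, and Redis as databases. Movie data crawled from the Douban platform serves as the base data source. The recommender builds user tags mainly from users' basic information and behavioral data such as usage records, and uses the Hadoop and Spark big data components for analysis and processing. The admin interface is Django's built-in admin, restyled with simpleui.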

Language: Python - Size: 20.7 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 324 - Forks: 33

datawhalechina/juicy-bigdata

🎉🎉🐳 Datawhale's introductory tutorial on big data processing | The opening course for the big data technology track 🎉🎉

Language: Python - Size: 27.4 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 313 - Forks: 43

sakserv/hadoop-mini-clusters

hadoop-mini-clusters provides an easy way to test Hadoop projects directly in your IDE

Language: Java - Size: 212 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 292 - Forks: 105

mjakubowski84/parquet4s

Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.

Language: Scala - Size: 2.33 MB - Last synced at: 18 days ago - Pushed at: about 1 month ago - Stars: 291 - Forks: 67

MeetYouDevs/big-whale

Scheduling of offline jobs and monitoring of real-time jobs for Spark, Flink, and more

Language: Java - Size: 4.8 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 289 - Forks: 117

florent37/Android-NoSql 📦

Lightweight, simple structured NoSQL database for Android

Language: Java - Size: 169 KB - Last synced at: 5 months ago - Pushed at: almost 5 years ago - Stars: 287 - Forks: 40

GoogleCloudDataproc/hadoop-connectors

Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.

Language: Java - Size: 11.3 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 285 - Forks: 255

DigitalPebble/behemoth 📦

Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.

Language: Java - Size: 7.45 MB - Last synced at: 7 months ago - Pushed at: about 7 years ago - Stars: 281 - Forks: 60

fraugster/parquet-go

Go package to read and write Parquet files. Parquet is a file format that stores nested data structures in a flat columnar layout. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.

Language: Go - Size: 1.2 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 279 - Forks: 53

brndnmtthws/facebook-hive-udfs

Facebook's Hive UDFs

Language: Java - Size: 131 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 271 - Forks: 151

timveil/hive-jdbc-uber-jar

Hive JDBC "uber" or "standalone" jar based on the latest Apache Hive version

Language: Java - Size: 3.79 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 265 - Forks: 96

wavestone-cdt/hadoop-attack-library

A collection of pentest tools and resources targeting Hadoop environments

Language: Python - Size: 65.8 MB - Last synced at: 3 months ago - Pushed at: almost 4 years ago - Stars: 265 - Forks: 66

apache/calcite-avatica

Apache Calcite Avatica

Language: Java - Size: 32 MB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 260 - Forks: 233

HariSekhon/HAProxy-configs

80+ HAProxy Configs for Hadoop, Big Data, NoSQL, Docker, Kubernetes, Elasticsearch, SolrCloud, HBase, MySQL, PostgreSQL, Apache Drill, Hive, Presto, Impala, Hue, ZooKeeper, SSH, RabbitMQ, Redis, Riak, Cloudera, OpenTSDB, InfluxDB, Prometheus, Kibana, Graphite, Rancher etc.

Language: Shell - Size: 623 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 252 - Forks: 81

jasonTangxd/recommendSys

Recommendation project (real-time and offline recommendation)

Language: Java - Size: 2.05 MB - Last synced at: about 2 months ago - Pushed at: over 7 years ago - Stars: 252 - Forks: 115

ShifuML/shifu

An end-to-end machine learning and data mining framework on Hadoop

Language: Java - Size: 16.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 251 - Forks: 108

oeljeklaus-you/JavaOrBigData-Interview

Interview topics compiled for Java developers and big data developers

Size: 66 MB - Last synced at: about 2 months ago - Pushed at: over 6 years ago - Stars: 251 - Forks: 64

Mellanox/SparkRDMA 📦

This is an archive of the SparkRDMA project. The new repository with RDMA shuffle acceleration for Apache Spark is here: https://github.com/Nvidia/sparkucx

Language: Java - Size: 259 KB - Last synced at: 5 months ago - Pushed at: about 6 years ago - Stars: 242 - Forks: 71

huangfox/dpkb

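A roundup of big data topics, including distributed storage engines, distributed compute engines, data warehouse construction, and more. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse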

Size: 81.1 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 229 - Forks: 60

apache/incubator-wayang

Apache Wayang (incubating) is the first cross-platform data processing system.

Language: Java - Size: 19.1 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 222 - Forks: 96

Chabane/bigdata-playground

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache Hbase, Apache Parquet, Apache Avro, Apache Storm, Twitter Api, MongoDB, NodeJS, Angular, GraphQL

Language: TypeScript - Size: 3.08 MB - Last synced at: about 2 months ago - Pushed at: over 6 years ago - Stars: 209 - Forks: 74

isxcode/spark-yun

Big data computing platform based on Spark <至轻云 (Zhiqingyun): ultra-lightweight big data computing platform/data center/master data>

Language: Java - Size: 5.97 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 207 - Forks: 55

apache/hadoop-hdfs 📦

Mirror of Apache Hadoop HDFS

Language: Java - Size: 34.5 MB - Last synced at: 3 days ago - Pushed at: over 6 years ago - Stars: 199 - Forks: 115

lynnlangit/learning-hadoop-and-spark

Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning

Language: HTML - Size: 13.6 MB - Last synced at: 7 days ago - Pushed at: 7 months ago - Stars: 195 - Forks: 165

HariSekhon/Knowledge-Base

Large Tech Knowledge Base from 20 years in DevOps, Linux, Cloud, Big Data, AWS, GCP etc - gradually porting my large private knowledge base to public

Language: Shell - Size: 183 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 180 - Forks: 34

dsaidgovsg/airflow-pipeline

An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR

Language: Python - Size: 146 KB - Last synced at: 3 days ago - Pushed at: 29 days ago - Stars: 174 - Forks: 58

aliyun/aliyun-emapreduce-datasources

Extended datasource support for Spark/Hadoop on Aliyun E-MapReduce.

Language: Scala - Size: 191 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 167 - Forks: 88

commoncrawl/cc-mrjob Fork of Smerity/cc-mrjob

Demonstration of using Python to process the Common Crawl dataset with the mrjob framework

Language: Python - Size: 1020 KB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 166 - Forks: 65

HY-ZhengWei/HBaseClient

HBase client data management software

Language: Java - Size: 344 MB - Last synced at: 3 days ago - Pushed at: over 6 years ago - Stars: 160 - Forks: 69

nielsbasjes/logparser

Easy parsing of Apache HTTPD and NGINX access logs with Java, Hadoop, Hive, Flink, Beam, Storm, Drill, ...

Language: Java - Size: 2.8 MB - Last synced at: 7 days ago - Pushed at: 11 days ago - Stars: 159 - Forks: 42

apache/hadoop-common 📦

Mirror of Apache Hadoop common

Language: Java - Size: 258 MB - Last synced at: 3 days ago - Pushed at: over 5 years ago - Stars: 159 - Forks: 152

marcelmay/hadoop-hdfs-fsimage-exporter

Exports Hadoop HDFS content statistics to Prometheus

Language: Java - Size: 552 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 155 - Forks: 47