An open API service providing repository metadata for many open source software ecosystems.

Topic: "hadoop-hdfs"

seaweedfs/seaweedfs

SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.

Language: Go - Size: 69 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 24,313 - Forks: 2,392

OBenner/data-engineering-interview-questions

More than 2,000 data engineering interview questions.

Size: 938 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 1,303 - Forks: 465

Morphl-AI/MorphL-Community-Edition

MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization

Language: Python - Size: 143 KB - Last synced at: 6 days ago - Pushed at: over 5 years ago - Stars: 261 - Forks: 29

linkedin/dynamometer

A tool for scale and performance testing of HDFS with a specific focus on the NameNode.

Language: Java - Size: 297 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 129 - Forks: 36

AhmetFurkanDEMIR/Data-Engineering-Project-with-HDFS-and-Kafka

Data Engineering Project with Hadoop HDFS and Kafka

Language: Python - Size: 3.46 MB - Last synced at: 28 days ago - Pushed at: over 1 year ago - Stars: 102 - Forks: 25

groda/big_data

Tutorials on Big Data essentials: Hadoop, MapReduce, Spark. Explore a variety of tutorials and demonstrations on Big Data technologies, primarily in the form of Jupyter notebooks. Most notebooks are self-contained and live—ready to run with a click.

Language: Jupyter Notebook - Size: 51.9 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 75 - Forks: 26

IBM/sparksql-for-hbase

Learn how to use Spark SQL and HSpark connector package to create / query data tables that reside in HBase region servers

Size: 614 KB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 69 - Forks: 27

vim89/datapipelines-essentials-python

Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

Language: Python - Size: 1.76 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 53 - Forks: 34

maniram-yadav/Big_DataHadoop_Projects

Big data projects implemented by Maniram Yadav

Language: PigLatin - Size: 2.79 MB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 33 - Forks: 33

hundredlabs/console 📦

Open source data infrastructure platform. Designed for developers, built for speed.

Language: TypeScript - Size: 22.6 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 23 - Forks: 4

hokstack/hok-helm

HokStack - Run Hadoop Stack on Kubernetes

Language: Shell - Size: 3.88 MB - Last synced at: 10 months ago - Pushed at: about 5 years ago - Stars: 22 - Forks: 6

hadoop-sandbox/hadoop-sandbox

A fully-functional Hadoop Yarn cluster as docker-compose deployment.

Language: Shell - Size: 103 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 16 - Forks: 5

jarlor/TravelWebsite_BigDataAnalysis

Big data analysis of a travel website (partial Ctrip data): Hadoop course project (undergraduate coursework level)

Language: Java - Size: 639 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 16 - Forks: 1

PChou/marayarn

Marathon on yarn

Language: Java - Size: 1.64 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 13 - Forks: 7

lucas91batista/twitter-hashtag-graph

Twitter + Flume + Hadoop (HDFS, MapReduce) + Neo4j + Python

Language: JavaScript - Size: 2.61 MB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 13 - Forks: 0

alagrede/HdfsClient

A Java HDFS client example and a full Kerberos example for calling Hadoop commands directly from Java code or on your local machine.

Language: Java - Size: 11.7 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 13 - Forks: 8

waltherg/distributable_docker_sql_on_hadoop

Toy Hadoop cluster combining various SQL-on-Hadoop variants

Language: Shell - Size: 88.9 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 11 - Forks: 4

Mahmoud-nfz/football-big-data

This is a comprehensive solution for real-time football analytics, leveraging Apache Spark on YARN for both streaming and batch processing, Hadoop HDFS for distributed storage, Kafka for real-time data ingestion, RethinkDB for live data updates, a custom-built search engine, and Next.js for data visualization.

Language: TypeScript - Size: 5.92 MB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 8 - Forks: 2

Areesha-Tahir/Hadoop-MapReduce-Sentiment-Analysis-Through-Keywords

A MapReduce program to conduct sentiment analysis of a keyword from a list of comments.

Language: Java - Size: 38.1 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 8 - Forks: 0

jodth07/hadoop-installation

Instructions on setting up Hadoop, HDFS, Java, sbt, Kafka, Scala, Spark, and Flume on Ubuntu 18.04

Language: Shell - Size: 61.5 KB - Last synced at: 9 months ago - Pushed at: almost 4 years ago - Stars: 8 - Forks: 15

mgarralda/hadoop-spark-cluster

Repository containing Docker images for creating a Spark cluster on Hadoop YARN.

Language: Dockerfile - Size: 286 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 7 - Forks: 3

leibniz21c/mammoth

Mammoth is a container-based Hadoop distributed system log analyzer. Sponsored by Mantech and Naver Cloud Platform.

Language: Dart - Size: 31.8 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 7 - Forks: 5

Ren294/Covid-Data-Process

This project integrates real-time data processing and analytics using Apache NiFi, Kafka, Spark, Hive, and AWS services for comprehensive COVID-19 data insights.

Language: Shell - Size: 6.22 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 6 - Forks: 0

SepehrImanian/ansible-hadoop-hdfs

Ansible playbook for setting up Hadoop HDFS

Language: Jinja - Size: 27.3 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 0

pfisterer/apache-hadoop-helm Fork of mgit-at/helm-hadoop-3

Helm chart for Apache Hadoop using multi-arch docker images

Language: Dockerfile - Size: 104 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 6 - Forks: 6

HxnDev/Hadoop-MapReduce-to-Analyze-Sentiment-of-Keyword

In this task, we had to write a MapReduce program to analyze the sentiment of a keyword from a list of comments. This was done using Hadoop HDFS.

Language: Java - Size: 1000 KB - Last synced at: 7 days ago - Pushed at: almost 4 years ago - Stars: 6 - Forks: 0

aadishgoel/Hadoop-Codes

Neat and Handy Place for all Hadoop codes

Language: Java - Size: 25.4 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 6 - Forks: 3

hadoop-sandbox/hadoop-sandbox-images

Docker image builds for Hadoop sandbox.

Language: Dockerfile - Size: 64.5 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 5 - Forks: 4

berksudan/Distributed-Environment-Installation-Guide

Install Hadoop, HDFS, Yarn and Spark on 3 Ubuntu 18.04 Machines

Size: 3.66 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 5 - Forks: 0

Ren294/Log-Analysis-Project

This project builds a scalable log analytics pipeline using the Lambda architecture for real-time and batch processing of NASA server logs.

Language: Python - Size: 2.88 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 5 - Forks: 1

karamolegkos/Twitter_Data_Analyzer

Language: Java - Size: 388 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

LMAPcoder/Hadoop-on-Colab

Installation and configuration of Hadoop on Google Colaboratory

Language: Jupyter Notebook - Size: 620 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 5 - Forks: 5

HxnDev/Finding-Average-Temperature-of-Each-Year-using-Hadoop-HDFS

In this task, we had to calculate the average temperature for each year from the given dataset using Hadoop HDFS. We had to create a MapReduce function to perform this task.

Language: Java - Size: 451 KB - Last synced at: 7 days ago - Pushed at: almost 4 years ago - Stars: 5 - Forks: 0

prabal03/python-automation-in-linux

Python automation in linux

Language: Python - Size: 16.1 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 1

sihamhafsi/projet-big-data_analyse-des-donnees-youtube

Language: Java - Size: 5.21 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 0

prithvianilk/rdfs

An attempt to make a reliable, distributed file system inspired by HDFS

Language: Java - Size: 437 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 2

pjqdyd/Hadoop-demo

Usage examples for Hadoop HDFS, MapReduce, Hive, and Spark

Language: Java - Size: 52.7 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 4 - Forks: 1

waikeungt/hdfs-spring-boot-starter

A starter for quickly using HDFS with Spring Boot

Language: Java - Size: 76.2 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 4 - Forks: 0

HxnDev/Hadoop-MapReduce-to-Find-Average-Length-of-Comments

In this task, we had to find the average length of comments given in the dataset. It was done using Hadoop MapReduce and Hadoop HDFS.

Language: Java - Size: 675 KB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 4 - Forks: 1

MarceloJSSantos/acelereracao-global-dev-4-everis-dio

Repository created to store notes and activities completed during training on the Digital Innovation One (DIO) platform for the Data Engineer selection process at Everis.

Language: Jupyter Notebook - Size: 57.3 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 4 - Forks: 2

federicopfund/data-engineer

ETL process

Language: Jupyter Notebook - Size: 84.5 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

JohnLyonX/hadoop

Hadoop Configuration

Language: Shell - Size: 38.1 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

thedatasociety/lab-hadoop

Language: PLpgSQL - Size: 4.6 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 7

MarwanMashra/Hadoop-MapReduce

Map/Reduce project with Hadoop

Language: Python - Size: 1.11 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 3 - Forks: 0

SarahAyaz/YouTube_Data_Analysis

Analysis of YouTube data using the Hadoop MapReduce framework in Java.

Language: Java - Size: 24.5 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 2

sloopstash/kickstart-hadoop

The ultimate aim of this Hadoop starter-kit Git repository is to help you deploy and manage Hadoop ecosystem components on AWS cloud using Docker, Kubernetes, and Chef.

Language: Ruby - Size: 150 KB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 7

Areesha-Tahir/Hadoop-MapReduce-To-Find-Average-Length-Of-Comments

A MapReduce program to calculate the average length of comments.

Language: Java - Size: 9.77 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 0

benjdiasaad/MapReduce_K-means

Implementation of the k-means clustering algorithm using the Hadoop framework version 3.1.3 (MapReduce).

Language: Java - Size: 32.2 KB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 2

jomarsilio/Bootcamp-IGTI-Analista-de-Dados

Bootcamp taught by IGTI that intensively covers data analysis concepts and practices, preparing students to work professionally in the field.

Language: Jupyter Notebook - Size: 127 MB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 3 - Forks: 0

rishabmenon/YouTube-Data-Analysis-Hadoop

This Hadoop project involves analysing the YouTube dataset to solve a few problem statements.

Size: 1.75 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 3 - Forks: 5

viveknigam3003/hadoop-linux-setup

Python scripts to assist setting up Hadoop v1 in Linux and starting a NameNode, DataNodes and Client.

Language: Python - Size: 80.1 KB - Last synced at: 12 months ago - Pushed at: over 6 years ago - Stars: 3 - Forks: 3

elaaatif/JPEG-and-JPEG2000-compression-on-Multi-node-cluster-using-hadoop-and-spark

Big Data technologies can be leveraged for efficient, distributed image compression using JPEG2000 (Spark) and JPEG (MapReduce).

Size: 14.3 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

jprakashkce/Olympic_Participants-Analysis

Analysis of Olympic Participants dataset using Hadoop Map Reduce.

Size: 27.3 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 1

29DCH/Hadoop-HDFS-MapReduce-Examples

Operating on HDFS files with the Java API; a MapReduce-based word-count program and its refactoring; applying the Combiner and Partitioner components in MapReduce programming

Language: Java - Size: 35.2 KB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 1

rihemebh/bigdata-pipeline

A simple big data pipeline using Hadoop, Spark, Kafka, and HBase

Language: Java - Size: 174 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 0

Akilankm/Hadoop-Installation

The repo contains the steps for setting up a single-node Hadoop 3.2.1 cluster on Ubuntu 20.04 LTS

Size: 8.79 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0

mikeroyal/Apache-Hadoop-Guide

Apache Hadoop Guide

Size: 141 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 2

sunnywalden/es-hadoop-data-share

Bidirectional data reads and writes between Elasticsearch and Hadoop, based on ES-Hadoop

Language: Java - Size: 118 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 2 - Forks: 2

joyeetadey/image-classification-using-distributed-SVM

This is a large-scale data processing project that classifies damaged and non-damaged cars using a distributed SVM in PySpark and Hadoop

Language: Jupyter Notebook - Size: 34.4 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 2 - Forks: 1

AnkitaSinha98/SocialMedia-Analysis

Big data from social media platforms such as Twitter, Facebook, and Instagram is stored and analyzed using Hadoop and Pig, and the results are reported. These results are useful to companies for strategy planning and decision making.

Size: 7.05 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

kapilthakre/Bicycle-Sharing-Demand-Forecasting-Using-Spark-Scala

In this project, we are going to build a bicycle-sharing demand prediction service using Apache Spark and Scala. I have created two Spark applications: one for model generation and another for demand prediction.

Language: Scala - Size: 295 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

benjdiasaad/MapRedcuce_Analyse_vente

Creating a Hadoop Java program: sales analysis.

Language: Java - Size: 28.3 KB - Last synced at: 2 months ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

benjdiasaad/MapReduce_WordCount

Creating a Hadoop Java program: a word-occurrence counter. If you want to compile the code manually on the Hadoop virtual machine, you will need to copy this code into the VM.

Language: Java - Size: 11.7 KB - Last synced at: 2 months ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

pranay1603/Linux-Automation

Linux-Automation

Language: Python - Size: 51.8 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

TrentBrunson/Big_Data

Apache Hadoop: HDFS, MapReduce, YARN, NLP, AWS, Spark, Google Colab, PySpark

Language: Jupyter Notebook - Size: 109 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 1

YiranJing/AWS-BigData-application

Projects using S3, Amazon SageMaker, AWS Lambda Function, Amazon Forecast; Projects related to SQL, Hadoop, Flink (Java), and Google Map API (Jun 2019 - Jul 2019)

Language: Jupyter Notebook - Size: 41 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 2

alex-ber/docker-hive Fork of ops-guru/docker-hive

Single-node EMR 5.25.0 cluster Hadoop Docker image, with Amazon Linux, Hadoop 2.8.5, and Hive 2.3.5

Language: Shell - Size: 45.9 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 1

gurpreetsahni/Data-Fetching-using-Flume

In this project we will fetch tweets using Apache Flume. We will also use the memory channel to buffer these tweets and an HDFS sink to push them into HDFS.

Size: 4.87 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 0

rishabmenon/Airlines-Analysis-Hadoop

This Hadoop project involves analysing the airline datasets to solve a few problem statements.

Size: 2.22 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 2 - Forks: 5

arkady-emelyanov/hadoop-playground 📦

🐘Yet another Hadoop playground

Language: Shell - Size: 49.8 KB - Last synced at: about 1 month ago - Pushed at: almost 7 years ago - Stars: 2 - Forks: 1

nbfujx/hadoop-learn-demo

Language: Python - Size: 32.2 KB - Last synced at: 3 months ago - Pushed at: over 7 years ago - Stars: 2 - Forks: 2

vineet-k09/E-Book-Recommendation

E-book recommendation project based on Hadoop and React with Spark

Language: JavaScript - Size: 57.6 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

KeerthanaJ-rec/210701118-CS19P16-DA-Lab

Data Analytics Laboratory

Language: R - Size: 23.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

chouaib-629/MovieRecommendation

A Hadoop-based Movie Recommendation System using the MovieLens dataset, demonstrating MapReduce for sorting and processing movie ratings.

Language: Java - Size: 320 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

HabibAroua/Newspaper-analysis

Language: Java - Size: 12.5 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 1

divithraju/divith-raju-pipeline-hadoop-pyspark

This project presents a comprehensive data pipeline designed to predict customer churn using historical customer data. By leveraging Hadoop and PySpark, this pipeline efficiently processes large datasets, performs feature engineering, and trains a machine learning model to identify customers at risk of leaving.

Language: Python - Size: 4.88 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

IliesChibane/Projet-IoT-Cloud-BigData

Implementation of a pipeline for predicting Parkinson's disease using IoT, cloud, and big data tools

Language: Python - Size: 891 KB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

Lakshmiec/Big-Data-Sentiment-Analysis-of-Amazon-Reviews-for-Seller-and-Brand-Empowerment

Language: Jupyter Notebook - Size: 1.53 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

abroniewski/IdleCompute-Data-Management-Architecture

Implementation of a big data management and analysis backbone architecture using PySpark for distributed and scalable data ingestion and MLlib for machine learning analysis. Part of Big Data Management and Analytics (BDMA) program.

Language: Jupyter Notebook - Size: 34.8 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 1

aymane-maghouti/Mobile-Data-Hive-Insights

This project demonstrates the process of extracting data from a MySQL database, transferring it using Apache Sqoop, storing it in a Hive data warehouse (the data is actually stored in the Hadoop Distributed File System (HDFS)), and performing analysis using Hive Query Language (HiveQL), a language close to SQL. The data is then visualized in Power BI.

Language: HiveQL - Size: 691 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

Luissalazarsalinas/Avocado-Yield-Prediction

Freelancer Project - Batch processing data pipeline and machine learning application.

Language: Python - Size: 3.53 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

ddrEricNo1/ds_project

This is my distributed system final project.

Language: Java - Size: 345 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

aquib-sh/setup-hadoop

A Bash script to set up Apache Hadoop and Apache Hive with a Derby database on Debian GNU/Linux

Language: Shell - Size: 37.1 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Karthik-SK12345/6THSEM-BDA-1BM19CS070-KARTHIK.S

This repository will be used to upload all my files and output documents produced during the "Big Data Analytics" course.

Size: 20.4 MB - Last synced at: 6 months ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

fbraza/scala-dfs-lib

DFS-Lib is a Scala-flavoured API for the Hadoop Java FileSystem API

Language: Scala - Size: 75.2 KB - Last synced at: about 2 months ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

ManikHossain08/Bixi-Cloud-ETL-Data-Pipeline-using-Scala-Hive-AWS_Athena_JDBC-Driver

An automated ETL data pipeline that extracts complex JSON data from a web API service (GBFS Bixi data) and converts it to CSV for loading into the HDFS data warehouse. Hive then processes the data further using external and managed tables. The same procedure is also applied with AWS S3 and Athena.

Language: Scala - Size: 117 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

mikeroyal/Apache-Pig-Guide

Apache Pig Guide

Size: 444 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

sxd-big-data/bigdata

This project contains Spring Boot, MySQL, MyBatis, Kettle, Hadoop, Hive, and Spark.

Language: Java - Size: 106 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

everthonnreis/hadoop-spark-install-shell-script

Script for installing a standalone hadoop and spark environment

Language: Python - Size: 87.9 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

BurraAbhishek/Python_Hadoop_MapReduce_MarketBasketAnalysis

Market Basket Analysis using Hadoop MapReduce in Python

Language: Python - Size: 103 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 2

KingJin-web/Hadoop

Learning Hadoop HDFS and MapReduce

Language: Java - Size: 7.56 MB - Last synced at: 11 months ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 1

Zaaim-Halim/Hadoop-Mapreduce

Hadoop MapReduce problems using Hadoop version 3.3.0

Language: JavaScript - Size: 22.9 MB - Last synced at: about 2 months ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

raktim00/Hadoop-HDFS-MR-Multi-Node-Cluster-AWS-Ansible

Provisioning EC2 Instances & then setting up Hadoop Multi Node Storage (HDFS) & Compute (MR) Cluster on them using Ansible Automation

Language: Jinja - Size: 22.5 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 2

humanbeeng/hadoop-auto-install

A small helper script that can save your valuable time during installation of Apache Hadoop.

Language: Shell - Size: 13.7 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

GarlGuo/LibraryDatacenter

This is my group project for creating a distributed library datacenter on MongoDB, Redis, and the Hadoop Distributed File System, built for the Distributed Database Systems course during my study-away program at Tsinghua University

Language: Python - Size: 4.87 MB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

nermiin/ApacheHadoop-BigData-AmazonCustomerReviewsSystem

A big data application developed with Hadoop, MapReduce, and other big data technologies.

Language: Java - Size: 120 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

Trisha11r/covid_data_analysis_mapreduce

COVID-19 data analysis with MapReduce

Language: Java - Size: 8.79 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

Sahith-8055/Big_Data_Technologies

Language: Java - Size: 35.7 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

anshul1004/MutualFriends

Implementation of Hadoop and Spark

Language: Java - Size: 23 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

lucasmior/hadoop-vm

Virtual Machine with Hadoop environment setup and ready to run map-reduce applications

Language: Shell - Size: 4.88 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 1