Topic: "hadoop-hdfs"
seaweedfs/seaweedfs
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.
Language: Go - Size: 69 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 24,313 - Forks: 2,392

OBenner/data-engineering-interview-questions
More than 2,000 data engineering interview questions.
Size: 938 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 1,303 - Forks: 465

Morphl-AI/MorphL-Community-Edition
MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization
Language: Python - Size: 143 KB - Last synced at: 6 days ago - Pushed at: over 5 years ago - Stars: 261 - Forks: 29

linkedin/dynamometer
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Language: Java - Size: 297 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 129 - Forks: 36

AhmetFurkanDEMIR/Data-Engineering-Project-with-HDFS-and-Kafka
Data Engineering Project with Hadoop HDFS and Kafka
Language: Python - Size: 3.46 MB - Last synced at: 28 days ago - Pushed at: over 1 year ago - Stars: 102 - Forks: 25

groda/big_data
Tutorials on Big Data essentials: Hadoop, MapReduce, Spark. Explore a variety of tutorials and demonstrations on Big Data technologies, primarily in the form of Jupyter notebooks. Most notebooks are self-contained and live—ready to run with a click.
Language: Jupyter Notebook - Size: 51.9 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 75 - Forks: 26

IBM/sparksql-for-hbase
Learn how to use Spark SQL and the HSpark connector package to create and query data tables that reside in HBase region servers
Size: 614 KB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 69 - Forks: 27

vim89/datapipelines-essentials-python
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Language: Python - Size: 1.76 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 53 - Forks: 34

maniram-yadav/Big_DataHadoop_Projects
Big data projects implemented by Maniram yadav
Language: PigLatin - Size: 2.79 MB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 33 - Forks: 33

hundredlabs/console 📦
Open source data infrastructure platform. Designed for developers, built for speed.
Language: TypeScript - Size: 22.6 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 23 - Forks: 4

hokstack/hok-helm
HokStack - Run Hadoop Stack on Kubernetes
Language: Shell - Size: 3.88 MB - Last synced at: 10 months ago - Pushed at: about 5 years ago - Stars: 22 - Forks: 6

hadoop-sandbox/hadoop-sandbox
A fully-functional Hadoop Yarn cluster as docker-compose deployment.
Language: Shell - Size: 103 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 16 - Forks: 5

jarlor/TravelWebsite_BigDataAnalysis
Big data analysis of a travel website (partial Ctrip data): Hadoop course project (undergraduate coursework level)
Language: Java - Size: 639 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 16 - Forks: 1

PChou/marayarn
Marathon on YARN
Language: Java - Size: 1.64 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 13 - Forks: 7

lucas91batista/twitter-hashtag-graph
Twitter + Flume + Hadoop (HDFS, MapReduce) + Neo4j + Python
Language: JavaScript - Size: 2.61 MB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 13 - Forks: 0

alagrede/HdfsClient
A Java HDFS client example and a full Kerberos example for calling Hadoop commands directly from Java code or on your local machine.
Language: Java - Size: 11.7 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 13 - Forks: 8

waltherg/distributable_docker_sql_on_hadoop
Toy Hadoop cluster combining various SQL-on-Hadoop variants
Language: Shell - Size: 88.9 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 11 - Forks: 4

Mahmoud-nfz/football-big-data
This is a comprehensive solution for real-time football analytics, leveraging Apache Spark on YARN for both streaming and batch processing, Hadoop HDFS for distributed storage, Kafka for real-time data ingestion, RethinkDB for live data updates, a custom-built search engine, and Next.js for data visualization.
Language: TypeScript - Size: 5.92 MB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 8 - Forks: 2

Areesha-Tahir/Hadoop-MapReduce-Sentiment-Analysis-Through-Keywords
A MapReduce program to conduct sentiment analysis of a keyword from a list of comments.
Language: Java - Size: 38.1 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 8 - Forks: 0

jodth07/hadoop-installation
Instructions on setting up Hadoop, HDFS, java, sbt, kafka, scala, spark and flume on Ubuntu 18.04
Language: Shell - Size: 61.5 KB - Last synced at: 9 months ago - Pushed at: almost 4 years ago - Stars: 8 - Forks: 15

mgarralda/hadoop-spark-cluster
Repository containing Docker images for creating a Spark cluster on Hadoop YARN.
Language: Dockerfile - Size: 286 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 7 - Forks: 3

leibniz21c/mammoth
Mammoth is a container-based log analyzer for Hadoop distributed systems. Sponsored by Mantech and Naver Cloud Platform.
Language: Dart - Size: 31.8 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 7 - Forks: 5

Ren294/Covid-Data-Process
This project integrates real-time data processing and analytics using Apache NiFi, Kafka, Spark, Hive, and AWS services for comprehensive COVID-19 data insights.
Language: Shell - Size: 6.22 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 6 - Forks: 0

SepehrImanian/ansible-hadoop-hdfs
Ansible playbook for setting up Hadoop HDFS
Language: Jinja - Size: 27.3 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 0

pfisterer/apache-hadoop-helm Fork of mgit-at/helm-hadoop-3
Helm chart for Apache Hadoop using multi-arch docker images
Language: Dockerfile - Size: 104 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 6 - Forks: 6

HxnDev/Hadoop-MapReduce-to-Analyze-Sentiment-of-Keyword
In this task, we had to write a MapReduce program to analyze the sentiment of a keyword from a list of comments. This was done using Hadoop HDFS.
Language: Java - Size: 1000 KB - Last synced at: 7 days ago - Pushed at: almost 4 years ago - Stars: 6 - Forks: 0

aadishgoel/Hadoop-Codes
Neat and Handy Place for all Hadoop codes
Language: Java - Size: 25.4 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 6 - Forks: 3

hadoop-sandbox/hadoop-sandbox-images
Docker image builds for Hadoop sandbox.
Language: Dockerfile - Size: 64.5 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 5 - Forks: 4

berksudan/Distributed-Environment-Installation-Guide
Install Hadoop, HDFS, Yarn and Spark on 3 Ubuntu 18.04 Machines
Size: 3.66 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 5 - Forks: 0

Ren294/Log-Analysis-Project
This project builds a scalable log analytics pipeline using the Lambda architecture for real-time and batch processing of NASA server logs.
Language: Python - Size: 2.88 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 5 - Forks: 1

karamolegkos/Twitter_Data_Analyzer
Language: Java - Size: 388 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

LMAPcoder/Hadoop-on-Colab
Installation and configuration of Hadoop on Google Colaboratory
Language: Jupyter Notebook - Size: 620 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 5 - Forks: 5

HxnDev/Finding-Average-Temperature-of-Each-Year-using-Hadoop-HDFS
In this task, we had to calculate the average temperature for each year from the given dataset using Hadoop HDFS. We had to create a MapReduce function to perform this task.
Language: Java - Size: 451 KB - Last synced at: 7 days ago - Pushed at: almost 4 years ago - Stars: 5 - Forks: 0

prabal03/python-automation-in-linux
Python automation in linux
Language: Python - Size: 16.1 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 1

sihamhafsi/projet-big-data_analyse-des-donnees-youtube
Language: Java - Size: 5.21 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 0

prithvianilk/rdfs
An attempt to make a reliable, distributed file system inspired by HDFS
Language: Java - Size: 437 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 2

pjqdyd/Hadoop-demo
Usage examples for Hadoop HDFS, MapReduce, Hive, and Spark
Language: Java - Size: 52.7 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 4 - Forks: 1

waikeungt/hdfs-spring-boot-starter
A starter for quickly using HDFS in Spring Boot
Language: Java - Size: 76.2 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 4 - Forks: 0

HxnDev/Hadoop-MapReduce-to-Find-Average-Length-of-Comments
In this task, we had to find the average length of comments given in the dataset. It was done using Hadoop MapReduce and Hadoop HDFS.
Language: Java - Size: 675 KB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 4 - Forks: 1

MarceloJSSantos/acelereracao-global-dev-4-everis-dio
Repository created to store notes and activities from the training on the Digital Innovation One (DIO) platform for the Data Engineer selection process at Everis.
Language: Jupyter Notebook - Size: 57.3 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 4 - Forks: 2

federicopfund/data-engineer
ETL process
Language: Jupyter Notebook - Size: 84.5 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

JohnLyonX/hadoop
Hadoop Configuration
Language: Shell - Size: 38.1 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

thedatasociety/lab-hadoop
Language: PLpgSQL - Size: 4.6 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 7

MarwanMashra/Hadoop-MapReduce
Map/Reduce project with Hadoop
Language: Python - Size: 1.11 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 3 - Forks: 0

SarahAyaz/YouTube_Data_Analysis
Analysis of YouTube Data using Hadoop Mapreduce framework in Java.
Language: Java - Size: 24.5 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 2

sloopstash/kickstart-hadoop
The ultimate aim of this Hadoop starter-kit Git repository is to help you deploy and manage Hadoop ecosystem components on AWS cloud using Docker, Kubernetes, and Chef.
Language: Ruby - Size: 150 KB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 7

Areesha-Tahir/Hadoop-MapReduce-To-Find-Average-Length-Of-Comments
A MapReduce program to calculate the average length of comments.
Language: Java - Size: 9.77 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 0

benjdiasaad/MapReduce_K-means
Implementation of the k-means clustering algorithm using the Hadoop framework, version 3.1.3 (MapReduce).
Language: Java - Size: 32.2 KB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 2

jomarsilio/Bootcamp-IGTI-Analista-de-Dados
Bootcamp taught by IGTI that intensively covers data analysis concepts and practices, preparing students to work professionally in the field.
Language: Jupyter Notebook - Size: 127 MB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 3 - Forks: 0

rishabmenon/YouTube-Data-Analysis-Hadoop
This Hadoop project involves analysing the YouTube dataset to solve a few problem statements.
Size: 1.75 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 3 - Forks: 5

viveknigam3003/hadoop-linux-setup
Python scripts to assist setting up Hadoop v1 in Linux and starting a NameNode, DataNodes and Client.
Language: Python - Size: 80.1 KB - Last synced at: 12 months ago - Pushed at: over 6 years ago - Stars: 3 - Forks: 3

elaaatif/JPEG-and-JPEG2000-compression-on-Multi-node-cluster-using-hadoop-and-spark
Big Data technologies can be leveraged for efficient, distributed image compression using JPEG2000 (Spark) and JPEG (MapReduce).
Size: 14.3 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

jprakashkce/Olympic_Participants-Analysis
Analysis of Olympic Participants dataset using Hadoop Map Reduce.
Size: 27.3 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 1

29DCH/Hadoop-HDFS-MapReduce-Examples
HDFS file operations with the Java API; a MapReduce-based word-count program and its refactoring; MapReduce programming with the Combiner and Partitioner components
Language: Java - Size: 35.2 KB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 1

rihemebh/bigdata-pipeline
A simple big data pipeline using Hadoop, Spark, Kafka, and HBase
Language: Java - Size: 174 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 0

Akilankm/Hadoop-Installation
This repo contains the steps for setting up a single-node Hadoop 3.2.1 cluster on Ubuntu 20.04 LTS
Size: 8.79 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0

mikeroyal/Apache-Hadoop-Guide
Apache Hadoop Guide
Size: 141 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 2

sunnywalden/es-hadoop-data-share
Bidirectional data reads and writes between Elasticsearch and Hadoop, based on ES-Hadoop
Language: Java - Size: 118 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 2 - Forks: 2

joyeetadey/image-classification-using-distributed-SVM
This is a large-scale data processing project that classifies damaged and non-damaged cars using a distributed SVM in PySpark and Hadoop
Language: Jupyter Notebook - Size: 34.4 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 2 - Forks: 1

AnkitaSinha98/SocialMedia-Analysis
Big data from social media platforms such as Twitter, Facebook, and Instagram is stored and analyzed using Hadoop and Pig. The results are useful to companies for strategy planning and decision making.
Size: 7.05 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

kapilthakre/Bicycle-Sharing-Demand-Forecasting-Using-Spark-Scala
In this project, we build a bicycle-sharing demand prediction service using Apache Spark and Scala. I have created two Spark applications: one for model generation and another for demand prediction.
Language: Scala - Size: 295 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

benjdiasaad/MapRedcuce_Analyse_vente
A Hadoop Java program: sales analysis.
Language: Java - Size: 28.3 KB - Last synced at: 2 months ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

benjdiasaad/MapReduce_WordCount
A Hadoop Java program: word occurrence counter. To compile the code manually on the Hadoop virtual machine, you will need to copy it into the VM.
Language: Java - Size: 11.7 KB - Last synced at: 2 months ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

pranay1603/Linux-Automation
Linux-Automation
Language: Python - Size: 51.8 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

TrentBrunson/Big_Data
Apache Hadoop: HDFS, MapReduce, YARN, NLP, AWS, Spark, Google Colab, PySpark
Language: Jupyter Notebook - Size: 109 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 1

YiranJing/AWS-BigData-application
Projects using S3, Amazon SageMaker, AWS Lambda Function, Amazon Forecast; Projects related to SQL, Hadoop, Flink (Java), and Google Map API (Jun 2019 - Jul 2019)
Language: Jupyter Notebook - Size: 41 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 2

alex-ber/docker-hive Fork of ops-guru/docker-hive
EMR 5.25.0 cluster single node Hadoop docker image. With Amazon Linux, Hadoop 2.8.5 and Hive 2.3.5
Language: Shell - Size: 45.9 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 1

gurpreetsahni/Data-Fetching-using-Flume
In this project we will fetch tweets using Apache Flume. We will also use the memory channel to buffer these tweets and HDFS sink to push these tweets into the HDFS.
Size: 4.87 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 0

rishabmenon/Airlines-Analysis-Hadoop
This Hadoop project involves analysing the airline datasets to solve a few problem statements.
Size: 2.22 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 2 - Forks: 5

arkady-emelyanov/hadoop-playground 📦
🐘Yet another Hadoop playground
Language: Shell - Size: 49.8 KB - Last synced at: about 1 month ago - Pushed at: almost 7 years ago - Stars: 2 - Forks: 1

nbfujx/hadoop-learn-demo
Language: Python - Size: 32.2 KB - Last synced at: 3 months ago - Pushed at: over 7 years ago - Stars: 2 - Forks: 2

vineet-k09/E-Book-Recommendation
E-Book Recommendation project based on hadoop and react with spark
Language: JavaScript - Size: 57.6 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

KeerthanaJ-rec/210701118-CS19P16-DA-Lab
Data Analytics Laboratory
Language: R - Size: 23.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

chouaib-629/MovieRecommendation
A Hadoop-based Movie Recommendation System using the MovieLens dataset, demonstrating MapReduce for sorting and processing movie ratings.
Language: Java - Size: 320 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

HabibAroua/Newspaper-analysis
Language: Java - Size: 12.5 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 1

divithraju/divith-raju-pipeline-hadoop-pyspark
This project presents a comprehensive data pipeline designed to predict customer churn using historical customer data. By leveraging Hadoop and PySpark, this pipeline efficiently processes large datasets, performs feature engineering, and trains a machine learning model to identify customers at risk of leaving.
Language: Python - Size: 4.88 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

IliesChibane/Projet-IoT-Cloud-BigData
Implementation of a pipeline for predicting Parkinson's disease using IoT, cloud, and big data tools
Language: Python - Size: 891 KB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

Lakshmiec/Big-Data-Sentiment-Analysis-of-Amazon-Reviews-for-Seller-and-Brand-Empowerment
Language: Jupyter Notebook - Size: 1.53 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

abroniewski/IdleCompute-Data-Management-Architecture
Implementation of a big data management and analysis backbone architecture using PySpark for distributed and scalable data ingestion and MLlib for machine learning analysis. Part of Big Data Management and Analytics (BDMA) program.
Language: Jupyter Notebook - Size: 34.8 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 1

aymane-maghouti/Mobile-Data-Hive-Insights
This project demonstrates extracting data from a MySQL database, transferring it with Apache Sqoop, storing it in a Hive data warehouse (the data is actually stored in the Hadoop Distributed File System (HDFS)), and analyzing it with Hive Query Language (HiveQL), a language close to SQL. The data is then visualized in Power BI.
Language: HiveQL - Size: 691 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

Luissalazarsalinas/Avocado-Yield-Prediction
Freelancer Project - Batch processing data pipeline and machine learning application.
Language: Python - Size: 3.53 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

ddrEricNo1/ds_project
This is my distributed system final project.
Language: Java - Size: 345 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

aquib-sh/setup-hadoop
A Bash script to set up Apache Hadoop and Apache Hive with a Derby database on Debian GNU/Linux
Language: Shell - Size: 37.1 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Karthik-SK12345/6THSEM-BDA-1BM19CS070-KARTHIK.S
This repository will be used to upload all my files and output documents produced during the "Big Data Analytics" course.
Size: 20.4 MB - Last synced at: 6 months ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

fbraza/scala-dfs-lib
DFS-Lib is a Scala-flavoured API to the Hadoop Java FileSystem API
Language: Scala - Size: 75.2 KB - Last synced at: about 2 months ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

ManikHossain08/Bixi-Cloud-ETL-Data-Pipeline-using-Scala-Hive-AWS_Athena_JDBC-Driver
An automated ETL data pipeline that extracts complex JSON data from a web API service (GBFS Bixi data) and converts it to CSV for loading into the HDFS data warehouse. Hive then processes it further using external and managed tables. The same procedure is also applied with AWS S3 and Athena.
Language: Scala - Size: 117 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

mikeroyal/Apache-Pig-Guide
Apache Pig Guide
Size: 444 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

sxd-big-data/bigdata
This project contains Spring Boot, MySQL, MyBatis, Kettle, Hadoop, Hive, and Spark.
Language: Java - Size: 106 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

everthonnreis/hadoop-spark-install-shell-script
Script for installing a standalone hadoop and spark environment
Language: Python - Size: 87.9 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

BurraAbhishek/Python_Hadoop_MapReduce_MarketBasketAnalysis
Market Basket Analysis using Hadoop MapReduce in Python
Language: Python - Size: 103 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 2

KingJin-web/Hadoop
Learning hadoop-hdfs and MapReduce
Language: Java - Size: 7.56 MB - Last synced at: 11 months ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 1

Zaaim-Halim/Hadoop-Mapreduce
Hadoop MapReduce problems using Hadoop version 3.3.0
Language: JavaScript - Size: 22.9 MB - Last synced at: about 2 months ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

raktim00/Hadoop-HDFS-MR-Multi-Node-Cluster-AWS-Ansible
Provisioning EC2 Instances & then setting up Hadoop Multi Node Storage (HDFS) & Compute (MR) Cluster on them using Ansible Automation
Language: Jinja - Size: 22.5 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 2

humanbeeng/hadoop-auto-install
A small helper script that can save your valuable time during installation of Apache Hadoop.
Language: Shell - Size: 13.7 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

GarlGuo/LibraryDatacenter
This is my group project creating a distributed library datacenter on MongoDB, Redis, and the Hadoop Distributed File System for the Distributed Database Systems course during my study-away program at Tsinghua University
Language: Python - Size: 4.87 MB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

nermiin/ApacheHadoop-BigData-AmazonCustomerReviewsSystem
A big data application developed with Hadoop, MapReduce, and related big data technologies.
Language: Java - Size: 120 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

Trisha11r/covid_data_analysis_mapreduce
COVID-19 data analysis with MapReduce
Language: Java - Size: 8.79 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

Sahith-8055/Big_Data_Technologies
Language: Java - Size: 35.7 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

anshul1004/MutualFriends
Implementation of Hadoop and Spark
Language: Java - Size: 23 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

lucasmior/hadoop-vm
Virtual Machine with Hadoop environment setup and ready to run map-reduce applications
Language: Shell - Size: 4.88 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 1
