An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: hadoop-hdfs

seaweedfs/seaweedfs

SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.

Language: Go - Size: 69.7 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 24,619 - Forks: 2,406

Diakkoo/docker-hadoop-container

Flask and HDFS containers written with Dockerfiles, for faster deployment of an HDFS cluster

Language: Python - Size: 8.79 KB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

OBenner/data-engineering-interview-questions

More than 2,000 data engineer interview questions.

Size: 938 KB - Last synced at: 9 days ago - Pushed at: 4 months ago - Stars: 1,341 - Forks: 477

Asterinos1/Movie-Preference-Analyzer

INF424 Project 2025: Movie Preference Analyzer. Big Data analytics tool using Apache Spark, developed in Scala and adhering to functional programming rules.

Language: Scala - Size: 4.04 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

vineet-k09/E-Book-Recommendation

E-Book Recommendation project based on hadoop and react with spark

Language: JavaScript - Size: 63.5 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 1 - Forks: 0

mgarralda/hadoop-spark-cluster

Repository containing Docker images for creating a Spark cluster on Hadoop YARN.

Language: Jupyter Notebook - Size: 161 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 7 - Forks: 3

MarouaHattab/Big-Data-Price-Prediction

A Big Data project that implements a distributed machine learning pipeline for predicting property prices in Tunisia. The solution adapts a high-performing XGBoost model to a Spark environment using Gradient Boosted Trees, achieving 91.5% of the original model's accuracy (R² of 0.6918) while enabling large-scale processing.

Language: Python - Size: 1.3 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

groda/big_data

Tutorials on Big Data essentials: Hadoop, MapReduce, Spark. Explore a variety of tutorials and demonstrations on Big Data technologies, primarily in the form of Jupyter notebooks. Most notebooks are self-contained and live—ready to run with a click.

Language: Jupyter Notebook - Size: 51.9 MB - Last synced at: 11 days ago - Pushed at: about 1 month ago - Stars: 75 - Forks: 26

AhmetFurkanDEMIR/Data-Engineering-Project-with-HDFS-and-Kafka

Data Engineering Project with Hadoop HDFS and Kafka

Language: Python - Size: 3.46 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 102 - Forks: 25

Houssam-11/BigData-Architecture

Big Data system predicts pandemic risk (COVID-19) via data analysis, ML modeling, and real-time dashboard.

Language: Python - Size: 29 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

TechAlhan826/Hadoop-Tasks

Hadoop MapReduce Tasks Java - Big Data Project 🚀

Language: Java - Size: 275 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

KeerthanaJ-rec/210701118-CS19P16-DA-Lab

Data Analytics Laboratory

Language: R - Size: 23.1 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

Mahmoud-nfz/football-big-data

This is a comprehensive solution for real-time football analytics, leveraging Apache Spark on YARN for both streaming and batch processing, Hadoop HDFS for distributed storage, Kafka for real-time data ingestion, RethinkDB for live data updates, a custom-built search engine, and Next.js for data visualization.

Language: TypeScript - Size: 5.92 MB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 2

vaxdata22/NoSQL-and-Big-Data-demonstration

This is a fun assignment task I undertook to explore the world of NoSQL and Big Data technologies.

Size: 8.6 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

hadoop-sandbox/hadoop-sandbox-images

Docker image builds for Hadoop sandbox.

Language: Dockerfile - Size: 64.5 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 5 - Forks: 4

kriss024/Hadoop

Hadoop and Hive fundamental commands

Language: Shell - Size: 451 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

IliesChibane/Projet-IoT-Cloud-BigData

Implementation of a pipeline for predicting Parkinson's disease using IoT, Cloud, and Big Data tools

Language: Python - Size: 891 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

elaaatif/JPEG-and-JPEG2000-compression-on-Multi-node-cluster-using-hadoop-and-spark

Big Data technologies can be leveraged for efficient, distributed image compression using JPEG2000 (Spark) and JPEG (MapReduce).

Size: 14.3 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

groda/hats

Hadoop Ansible Test Suite

Language: Shell - Size: 33.2 KB - Last synced at: 6 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

berksudan/Distributed-Environment-Installation-Guide

Install Hadoop, HDFS, Yarn and Spark on 3 Ubuntu 18.04 Machines

Size: 3.66 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 5 - Forks: 0

madhurimarawat/Big-Data-Analytics

This repository demonstrates big data processing, visualization, and machine learning using tools such as Hadoop, Spark, Kafka, and Python.

Language: Jupyter Notebook - Size: 10.7 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 1

arkady-emelyanov/hadoop-playground 📦

🐘Yet another Hadoop playground

Language: Shell - Size: 49.8 KB - Last synced at: about 8 hours ago - Pushed at: about 7 years ago - Stars: 2 - Forks: 1

chouaib-629/MovieRecommendation

A Hadoop-based Movie Recommendation System using the MovieLens dataset, demonstrating MapReduce for sorting and processing movie ratings.

Language: Java - Size: 320 KB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

benjdiasaad/MapReduce_WordCount

Creation of a Hadoop Java program: a word occurrence counter. If you want to compile the code manually on the Hadoop virtual machine, you will need to copy this code into the VM.

Language: Java - Size: 11.7 KB - Last synced at: 3 months ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

Morphl-AI/MorphL-Community-Edition

MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization

Language: Python - Size: 143 KB - Last synced at: 2 days ago - Pushed at: over 5 years ago - Stars: 261 - Forks: 29

mehwishferoz/BDA-project

A Hadoop MapReduce project analyzing the Consumer Complaints dataset with five queries to extract insights like complaints by product, state, company, tags, and timely responses.

Language: Java - Size: 7.42 MB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

hadoop-sandbox/hadoop-sandbox

A fully functional Hadoop YARN cluster as a docker-compose deployment.

Language: Shell - Size: 103 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 16 - Forks: 5

Ren294/Covid-Data-Process

This project integrates real-time data processing and analytics using Apache NiFi, Kafka, Spark, Hive, and AWS services for comprehensive COVID-19 data insights.

Language: Shell - Size: 6.22 MB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 6 - Forks: 0

Ibrahim-Maiga/Big-Data-Analysis-with-PySpark

A comprehensive big data analysis examining correlations between temperature changes and societal metrics (crime rates, birth rates, and energy consumption) across the US and Canada. The project leverages multiple database systems and cloud computing to process and analyze large-scale climate and social data.

Language: Jupyter Notebook - Size: 454 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

DevLucho/Spark-Procesamiento-en-batch

This project uses PySpark to analyze student data from a CSV file stored in HDFS.

Language: Python - Size: 93.8 KB - Last synced at: 2 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

HabibAroua/Newspaper-analysis

Language: Java - Size: 12.5 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 1

ANKIT21111/Patient-Alert-ETL

The Patient Alert ETL 🚑 project creates a real-time data pipeline to monitor vital health parameters from IoT devices in hospitals. Using Apache Kafka, Spark, and HBase, it processes streaming data and sends immediate alerts via Amazon SNS when vitals exceed normal thresholds, enhancing patient care through timely interventions.

Language: Python - Size: 5.47 MB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

Ren294/Log-Analysis-Project

This project builds a scalable log analytics pipeline using Lambda architecture for real-time and batch processing of NASA server logs.

Language: Python - Size: 2.88 MB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 5 - Forks: 1

IBM/sparksql-for-hbase

Learn how to use Spark SQL and HSpark connector package to create / query data tables that reside in HBase region servers

Size: 614 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 69 - Forks: 27

divithraju/divith-raju-pipeline-hadoop-pyspark

This project presents a comprehensive data pipeline designed to predict customer churn using historical customer data. By leveraging Hadoop and PySpark, this pipeline efficiently processes large datasets, performs feature engineering, and trains a machine learning model to identify customers at risk of leaving.

Language: Python - Size: 4.88 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

aymane-maghouti/Mobile-Data-Hive-Insights

This project demonstrates the process of extracting data from a MySQL database, transferring it using Apache Sqoop, storing it in a Hive data warehouse (the data is actually stored in the Hadoop Distributed File System (HDFS)), and performing analysis using Hive Query Language (HiveQL), a language close to SQL. The data is then visualized in Power BI.

Language: HiveQL - Size: 691 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

prateekkr1/Project-Work

This repository contains some of my personal projects.

Language: Jupyter Notebook - Size: 3.79 MB - Last synced at: 10 months ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

jodth07/hadoop-installation

Instructions on setting up Hadoop, HDFS, java, sbt, kafka, scala, spark and flume on Ubuntu 18.04

Language: Shell - Size: 61.5 KB - Last synced at: 10 months ago - Pushed at: almost 4 years ago - Stars: 8 - Forks: 15

INeerav/sparkini

Base Docker Compose setup for a local data engineering environment

Language: Jupyter Notebook - Size: 34.2 KB - Last synced at: 3 days ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

KingJin-web/Hadoop

Learning hadoop-hdfs and MapReduce

Language: Java - Size: 7.56 MB - Last synced at: 12 months ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 1

xpcosmos/data-lake-prime

This project aims to simulate and configure a Distributed File System using Hadoop HDFS. For this project, 3 machines were created: 1 Master Node and 2 Worker Nodes.

Language: Shell - Size: 815 KB - Last synced at: 3 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

29DCH/Hadoop-HDFS-MapReduce-Examples

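Operating on HDFS files with the Java API, a MapReduce-based word frequency counting program and its refactoring, and applying the Combiner and Partitioner components in MapReduce programming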

Language: Java - Size: 35.2 KB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 1

viveknigam3003/hadoop-linux-setup

Python scripts to assist setting up Hadoop v1 in Linux and starting a NameNode, DataNodes and Client.

Language: Python - Size: 80.1 KB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 3 - Forks: 3

StefanoFioravanzo/evolving-wikipedia-graph

Distributed processing of Wikipedia history files using Hadoop and Spark

Language: Scala - Size: 3.57 MB - Last synced at: 3 months ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

dotM87/triaina

big data project, information storage in hdfs

Size: 2.93 KB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

ANKIT21111/SparNordETL

ETL pipeline for Spar Nord Bank for analyzing the refilling frequency of ATMs all over Europe

Language: Jupyter Notebook - Size: 4.59 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

subhash26jan96/cluster

This repository contains Hadoop cluster code (automated, on-demand, and manual) using Python, Linux, HTML, etc.

Language: Python - Size: 16.6 KB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 1

varunajmera0/PyHDFS

PyHDFS: Scalable & resilient distributed file system. Components: Zookeeper, NameNode, DataNode, Metadata service, Client. Setup guide for AWS & local. Explore distributed storage!

Language: Python - Size: 630 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

ra312/spark_with_scala

A collection of useful scala scripts to work with Hadoop

Language: Scala - Size: 1000 Bytes - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

ais04134/hyperv-hadoop-spark-cluster

Hadoop Ecosystem - Building a Hadoop cluster environment for large-scale frequent pattern mining

Language: Shell - Size: 2.35 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

emrectn/HadoopTutorial

hadoop

Language: Java - Size: 15.6 KB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

tableMinPark/trendflow

❗ Trend analysis platform - SSAFY 8th cohort specialization project

Language: Java - Size: 55.6 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

prabal03/python-automation-in-linux

Python automation in linux

Language: Python - Size: 16.1 MB - Last synced at: 2 days ago - Pushed at: over 4 years ago - Stars: 6 - Forks: 2

federicopfund/data-engineer

ETL process

Language: Jupyter Notebook - Size: 84.5 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

vim89/datapipelines-essentials-python

Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations

Language: Python - Size: 1.76 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 53 - Forks: 34

aadishgoel/Hadoop-Codes

Neat and Handy Place for all Hadoop codes

Language: Java - Size: 25.4 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 6 - Forks: 3

linkedin/dynamometer

A tool for scale and performance testing of HDFS with a specific focus on the NameNode.

Language: Java - Size: 297 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 129 - Forks: 36

nzsaurabh/hadoop_training

Exercises on MapReduce, Pig, Spark, Relational and Non Relational data stores in Hadoop

Size: 939 KB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

Driramohamedfarouk/bigdata-stock-market-pipeline

Language: Scala - Size: 310 KB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 1

venkat-a/Exploratory-Data-Analysis-EDA-using-PySpark

Leverage the power of Apache Spark for large-scale data processing and analysis

Language: Jupyter Notebook - Size: 147 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

41xu/Hadoop-ClassNotes

Some code during learning Hadoop.

Language: Java - Size: 6.1 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

oahjimenez/x3ia020_pagerank

PageRank - Pig vs PySpark comparison https://madoc.univ-nantes.fr/mod/assign/view.php?id=1511791

Language: Python - Size: 555 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

jarlor/TravelWebsite_BigDataAnalysis

Big data analysis of a travel website (partial Ctrip data) - Hadoop course project (undergraduate coursework level)

Language: Java - Size: 639 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 16 - Forks: 1

Luissalazarsalinas/Avocado-Yield-Prediction

Freelancer Project - Batch processing data pipeline and machine learning application.

Language: Python - Size: 3.53 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

seyfal/MapReduceGraphComparison

Distributed computational problem-solving project, which aims to perform large-scale graph matching using cloud computing technologies. The project allows users to import two directed graphs and analyze the differences between them.

Language: Scala - Size: 1.76 MB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

JohnLyonX/hadoop

Hadoop Configuration

Language: Shell - Size: 38.1 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

JulienSimons/Hadoop-map-reduce

This is a word count application implemented with map and reduce functions. I tested it on Amazon Web Services (AWS) on a two-instance cluster with a functional master (NameNode) node and a slave (DataNode) node.

Language: Java - Size: 11.7 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Lakshmiec/Big-Data-Sentiment-Analysis-of-Amazon-Reviews-for-Seller-and-Brand-Empowerment

Language: Jupyter Notebook - Size: 1.53 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

drewm8080/big_data_management

Contains all homework from the course Foundations of Database Management at USC

Language: Python - Size: 4.75 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

claireboyd/311requests_chicago

Created a simple web app which gives users a summary of the types of 311 requests in their Chicago neighborhood, built with Lambda Architecture principles using Apache's tech stack

Language: HiveQL - Size: 27.5 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

HxnDev/Hadoop-MapReduce-to-Analyze-Sentiment-of-Keyword

In this task, we had to write a MapReduce program to analyze the sentiment of a keyword from a list of comments. This was done using Hadoop HDFS.

Language: Java - Size: 1000 KB - Last synced at: 30 days ago - Pushed at: almost 4 years ago - Stars: 6 - Forks: 0

mikeroyal/Apache-Hadoop-Guide

Apache Hadoop Guide

Size: 141 KB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 2

marcocolangelo/Big-Data-processing-and-Analytics

The current repository contains all the code developed during the Big Data processing and Analytics laboratories. Data are processed and analyzed using Hadoop and Spark

Language: Java - Size: 6.1 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

anshul1004/MutualFriends

Implementation of Hadoop and Spark

Language: Java - Size: 23 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

MarceloJSSantos/acelereracao-global-dev-4-everis-dio

Repository created to store notes and activities carried out during training on the Digital Innovation One (DIO) platform for the Data Engineer selection process at Everis.

Language: Jupyter Notebook - Size: 57.3 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 4 - Forks: 2

seyfal/SparkMitMAttackSim

Scalable simulation of MitM attacks using parallel random walks and graph analytics on Spark.

Language: Scala - Size: 76.2 KB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

abroniewski/IdleCompute-Data-Management-Architecture

Implementation of a big data management and analysis backbone architecture using PySpark for distributed and scalable data ingestion and MLlib for machine learning analysis. Part of Big Data Management and Analytics (BDMA) program.

Language: Jupyter Notebook - Size: 34.8 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 1

benjdiasaad/MapReduce_K-means

Implementation of the k-means clustering algorithm using the Hadoop framework version 3.1.3 (MapReduce).

Language: Java - Size: 32.2 KB - Last synced at: 23 days ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 2

lauravoicu/Coursera-Hadoop-Platform-Application

Language: Python - Size: 6.84 KB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

MatteoM95/Big-data-processing-and-analytics

Exercises on Spark and Hadoop - Done in Distributed architectures for big data processing and analytics course at Politecnico di Torino

Language: Java - Size: 4.94 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 2

HaneefAhamed/Hadoop_Map_Reduce

Hadoop setup and Getting Started with developing Hadoop programs

Size: 11.7 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

ayush-usf/stack-overflow-logs-hadoop-analysis

Ask Ubuntu Logs analysis with Hadoop, MapReduce 2 (YARN)

Language: Java - Size: 108 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

SubalakshmiShanthosi/PCP1211DALab

Language: TeX - Size: 34.4 MB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

Slimani-CE/big-data-tp1

A lab exercise that aims to familiarize learners with the Hadoop Distributed File System (HDFS). Specific objectives include starting the Hadoop processes, creating a directory tree in HDFS, and manipulating files using Hadoop commands.

Size: 495 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

asaldelkhosh-learning/hadoop

Learning Hadoop and Map-Reduce!

Size: 33.2 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Slimani-CE/hadoop-crud-api

A Java API for interacting with the Hadoop Distributed File System (HDFS). This API provides functionality for reading and writing data in HDFS.

Language: Java - Size: 28.3 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

LBF38/big_data

Big Data @ ENSTA Bretagne

Language: Java - Size: 358 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

chseeling/rpi_cluster

Hadoop, Spark, MPI

Language: Jupyter Notebook - Size: 3.55 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

prithvianilk/rdfs

An attempt to make a reliable, distributed file system inspired by HDFS

Language: Java - Size: 437 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 4 - Forks: 2

kulwinderkk/Big_data_Wrangling_GoogleNgram_data_analysis

Loaded, filtered and visualized Google Ngrams dataset, which was created by Google's research team by analyzing all of the content in Google Books from the 1800s into the 2000s, in a cloud-based distributed computing environment using Hadoop, Spark, and the AWS S3 file system.

Language: Jupyter Notebook - Size: 480 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

AH-Yussef/Health-Monitor-Big-Data-System

A Health Monitor to simulate receiving and processing large amounts of health metrics from many clients with the goal of efficiently finding aggregate statistics

Language: Java - Size: 319 KB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

DanMolenhouse/Distributed-Systems-Project5-Hadoop-and-Spark

In this project, we used both Hadoop/MapReduce and Spark to do distributed computing. The first task was to perform a series of operations using Mapper and Reducer Java files run on a Hadoop server. The second task was to perform similar operations, but on Spark instead.

Language: Java - Size: 70.3 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Grg0rry/MapReduce-Recommendation-System

A recommendation system built on top of Hadoop Distributed File System and MapReduce

Language: Java - Size: 204 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

kinszee/MySQL-Hive-PowerBI-Pipeline

Built a data pipeline by creating tables in MySQL DB, ingested tables to Hadoop for data warehousing and built HiveQL views. Hive views in Linux VM were connected to Power BI application in Windows to create visualizations.

Size: 2.17 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Vzzarr/BigData---FineFoodReviews

Language: JavaScript - Size: 2.57 MB - Last synced at: almost 2 years ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 0

somisettyv/HadoopWordCount

Hadoop MapReduce Word Count

Language: Java - Size: 5.86 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

RonnJacob/PageRank-MapReduce-Spark

Implemented the PageRank algorithm in Hadoop MapReduce framework and Spark.

Language: Java - Size: 442 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 1

mehroosali/ABCStoresPipeline

Batch ETL data pipeline built on HDP 3.0 to process daily sales and business data to produce Power BI reports. Automated the pipelines using Airflow.

Language: Scala - Size: 464 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

islamyounis/Taxi-Network-Analysis-with-ETL-Real-Time-Processing-Data-Analysis

The project aims to analyze real-time and historical data from the Taxi Network in New York City, utilizing data warehousing, ETL, batch processing, real-time processing, and data analysis, with insights presented through a Real-Time Dashboard on a NoSQL database and an Aggregated Dashboard showing how the business has changed over time.

Size: 19.5 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

akshaytambe/Big-Data-Scripts

Python Scripts for working with Big Data Files

Language: Python - Size: 193 KB - Last synced at: almost 2 years ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 1