GitHub topics: apache-hadoop-framework
divithraju/divith-aju-Hadoop-Pyspark-pipeline
This project demonstrates the creation of a scalable data processing pipeline for handling and analyzing log data from a hypothetical e-commerce platform. Leveraging Hadoop and PySpark, the pipeline is designed to process large volumes of log files, providing meaningful insights into user behavior, system performance, and sales metrics.
Language: Python - Size: 4.88 KB - Last synced at: 2 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

DuarteDomingues/Large-Scale-Data-Computation-Word-Count-project
Large-scale data computing word count project
Language: Java - Size: 474 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

bayudwiyansatria/library-java-apache-hadoop
Apache Hadoop. Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Originally designed for computer clusters built from commodity hardware—still the common use—it has also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.
Language: Java - Size: 352 KB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 1

13caroline/imdb-datasets
Managing large data sets projects (Data Science)
Language: Java - Size: 1.02 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0
