GitHub topics: apache-hadoop

Repositories

PasanAbeysekara/Taxi-Pickup-Hotspot-Analysis-using-Hadoop-MapReduce

This project analyzes one month of NYC Yellow Taxi trip data (January 2016) to identify the busiest taxi pickup locations. It utilizes the Hadoop MapReduce framework to process the data and a lookup table to map location IDs to human-readable zone names.

Language: Java - Size: 5.65 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

Malisha4065/HadoopConfiguration

Apache Hadoop cluster configuration using the original apache/hadoop:3.4.1 Docker image (with YARN)

Language: Shell - Size: 6.84 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 3 - Forks: 0

Malisha4065/HadoopProject

A MapReduce task with Apache Hadoop.

Language: Java - Size: 17.6 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

tencentyun/hadoop-cos

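hadoop-cos (the CosN file system) provides integration support for big data computing frameworks such as Apache Hadoop, Spark, and Tez, so that data stored on Tencent Cloud COS can be read and written the same way as data on HDFS. It can also serve as Deep Storage for query and analytics engines such as Druid.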

Language: Java - Size: 105 MB - Last synced at: 7 days ago - Pushed at: 14 days ago - Stars: 85 - Forks: 52

mahmoudparsian/data-algorithms-book

MapReduce, Spark, Java, and Scala for Data Algorithms Book

Language: Java - Size: 397 MB - Last synced at: 18 days ago - Pushed at: 8 months ago - Stars: 1,075 - Forks: 661

mahmoudparsian/big-data-mapreduce-course

Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University

Language: HTML - Size: 601 MB - Last synced at: 9 days ago - Pushed at: 6 months ago - Stars: 158 - Forks: 143

taabishhh/LLM_Preprocessing

This project implements a Byte Pair Encoding (BPE) tokenization approach along with a Word2Vec model to generate word embeddings from a text corpus. The implementation leverages Apache Hadoop for distributed processing and includes evaluation metrics for optimal dimensionality of embeddings.

Language: Scala - Size: 7.37 MB - Last synced at: 6 days ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

L00kAhead/hadoop_cluster

HDFS Cluster Setup with Docker. A simple Docker setup for a Hadoop HDFS cluster with one NameNode and two DataNodes for testing and learning purposes.

Size: 3.91 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

PBWebMedia/yarn-prometheus-exporter

Export Hadoop YARN (ResourceManager) metrics in Prometheus format

Language: Go - Size: 81.1 KB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 51 - Forks: 20

sawallesalfo/Big-Data-Technologies

Big Data Technologies can be defined as software tools for analyzing, processing, and extracting data from extremely large and complex data sets that traditional data management tools cannot handle

Language: Python - Size: 906 KB - Last synced at: 27 days ago - Pushed at: about 3 years ago - Stars: 3 - Forks: 0

miguelTavora/Word-Count 📦

The word count problem solved using Apache Hadoop

Language: Java - Size: 715 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

yingzhuo/logback-flume-appender

Logback appender for Apache Flume

Language: Java - Size: 37.1 KB - Last synced at: 3 days ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

tashi-2004/Apache-Hadoop-Spark-Hive-CyberAnalytics

This project utilizes Apache Hadoop, Hive, and PySpark to process and analyze the UNSW-NB15 dataset, enabling advanced query analysis, machine learning modeling, and visualization. The project demonstrates efficient data ingestion, processing, and predictive analytics for network security insights.

Language: Jupyter Notebook - Size: 2.62 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Narius2030/Hive-DataWarehouse-Analysis

Implements a Hive data warehouse to store meaningful data and applies machine learning methods such as clustering or regression to business problems

Language: Jupyter Notebook - Size: 24.9 MB - Last synced at: 2 months ago - Pushed at: 12 months ago - Stars: 2 - Forks: 3

berksudan/Analysis-on-Big-Data-with-Hadoop

Implementation of Statistical Methods via Hadoop Map-Reduce Library.

Language: Java - Size: 75.2 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

yuhexiong/deploy-hadoop-guide

Size: 61.5 KB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

yycorcino/distributed-system-for-movie-recommendations Fork of KathiraveluLab/Dragonfly

Apache Spark with Apache Hadoop for Machine Learning Application

Language: Python - Size: 969 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 1

mark-deligiannis/DB-lab-SPARK-project

Term project for NTUA course "Advanced Topics in Database Systems". Big Data analysis is performed on the "Los Angeles Crime Data" dataset.

Language: Python - Size: 331 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

myndaaa/BigDataArchitecture-COS20028-Swinburne

Apache Hadoop – A course for undergraduates | along with Apache Pig and Hive

Language: Java - Size: 2.4 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

aaqib-ahmed-nazir/Naive_Search_Engine

This repository aims to develop a basic search engine utilizing Hadoop's MapReduce framework to index and process extensive text corpora efficiently. The dataset used for this project is a subset of the English Wikipedia dump, totaling 5.2 GB in size. The project focuses on implementing a naive search algorithm to address challenges in information.

Language: Jupyter Notebook - Size: 120 KB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

s911415/apache-hadoop-3.1.0-winutils

HADOOP 3.1.0 winutils

Language: Batchfile - Size: 985 KB - Last synced at: 7 months ago - Pushed at: about 7 years ago - Stars: 73 - Forks: 104

mituskillologies/bigdata-ait-sep24

Programs from a Big Data Analytics training conducted at the Army Institute of Technology, Pune, in September 2024.

Language: Java - Size: 10.7 KB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

chriskery/hadoop-operator

Kubernetes operator for managing the lifecycle of Apache Hadoop YARN tasks on Kubernetes.

Language: Go - Size: 3.06 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 1

BhushanSagar/Car-Insurance-Cold-Calls-Data-Analysis

Car Insurance Cold Calls Data Analysis using Apache Hive

Language: HiveQL - Size: 1.17 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Abdelhakim-gh/BigData_Project

This project aims to establish a data streaming pipeline with storage, processing, and visualization

Language: Python - Size: 28.7 MB - Last synced at: 2 months ago - Pushed at: 10 months ago - Stars: 2 - Forks: 0

nghoanglongde/spark-cluster-with-docker

An implementation of Apache Spark (combined with PySpark and Jupyter Notebook) on top of a Hadoop cluster using Docker

Language: Shell - Size: 43 KB - Last synced at: 9 months ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 2

Guru107/hadoop-small-files-merger

A Spark application to merge small files on Hadoop

Language: Scala - Size: 101 KB - Last synced at: about 2 months ago - Pushed at: almost 5 years ago - Stars: 8 - Forks: 3

shawnzhu/docker-hive-1 Fork of IBM/docker-hive

Docker image for Hive Metastore

Language: Dockerfile - Size: 19.5 KB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

Coursal/Text-Sentiment-Analysis-In-Hadoop-And-Spark

The source code developed and used for the purposes of my thesis with the same title under the guidance of my supervisor professor Vasilis Mamalis for the Department of Informatics and Computer Engineering of the University of West Attica.

Language: Java - Size: 66.5 MB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 1

Coursal/Hadoop-Letter-File-Index-Counter

A Hadoop-based Java project that counts the maximum number of word occurrences for each letter in a text file within a folder.

Language: Java - Size: 213 KB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

Coursal/Hadoop-Examples

Some simple, kinda introductory projects based on Apache Hadoop to be used as guides in order to make the MapReduce model look less weird or boring.

Language: Java - Size: 340 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 2

jagdish4501/Network-intrusion-Detection

This repository provides a guide to preprocess and analyze the network intrusion data set using NumPy, Pandas, and matplotlib, and implement a random forest classifier machine learning model using Scikit-learn.

Language: Jupyter Notebook - Size: 28.7 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

lepetitprinz/apache-hadoop

Hands-on learning Hadoop

Language: Java - Size: 1.16 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

keramiozsoy/apache-spark-yarn-mode-aws-101

An example of installing Apache Spark on AWS

Size: 146 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

carlosemsantana/docker-hadoop

Setup of a development and testing environment for Apache Hadoop.

Size: 4.86 MB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

whoami-anoint/EasyHadoop

Simplified Hadoop Setup and Configuration Automation

Language: Shell - Size: 12 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

felidsche/movie-recommender

A movie recommendation system built using Apache Spark’s ML library

Language: Python - Size: 829 KB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

felidsche/mail-spam-filter

An email spam filter using Apache Spark’s ML library

Language: Python - Size: 212 KB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 1

dmarks84/Coursework_Capstone_Full_Data_Engineering

Final Project for IBM Data Engineering & Python Professional Certificate -- Applied all skills and methods utilized in the series of courses for this certification

Language: Jupyter Notebook - Size: 4.25 MB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

SomeshChevella/Apache-Hadoop-Map-Reduce--Basic-Sentiment-Analysis-on-Yelp-Dataset

In this project we will use Hadoop MapReduce to implement a very basic “Sentiment Analysis” using the review text in the Yelp Academic Dataset as training data.

Language: Java - Size: 7.39 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

VikentiosVitalis/advanced_topics_in_database_systems

Data Science Project - for 'Advanced Topics in Database Systems' M.Sc. Course ECE @ntua

Language: Python - Size: 10.6 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

esakik/data-engineering-essentials

Samples related to data engineering, e.g. spark, embulk, airflow, etc.

Language: Python - Size: 413 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 1

mohammadtavakoli78/Cloud-Computing

Projects for a Cloud Computing course

Language: Python - Size: 9.1 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 5 - Forks: 1

gangodu/cloud

AWS Cloudera Hadoop setup with H2O, Spark, MR

Language: Java - Size: 49.1 MB - Last synced at: over 1 year ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 0

on2e/ntua-atdb

Advanced Topics in Databases course project - NTUA ECE - 2022-23

Language: Python - Size: 24.4 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

probaldhar/AprioriMapReduce

Java code for Apriori algorithm using MapReduce

Language: Java - Size: 2.68 MB - Last synced at: almost 2 years ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0

heracliteanflux/exercises-scala

Exercises in the Scala programming language with an emphasis on big data programming and applications in Apache Hadoop and Apache Spark.

Language: Java - Size: 3.29 MB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

hridayns/Big-Data-Apache-server-logs-analysis-using-Pig-and-Python

Big Data – Apache server logs analysis using Pig and Python

Language: Python - Size: 4.88 KB - Last synced at: almost 2 years ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

rahulinux/ansible-hadoop

Apache Hadoop multi-node setup using Ansible

Size: 10.7 KB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

Lucass97/FlightAnalysis

This project implemented a lambda architecture for analyzing domestic flight data in the US from 2009 to 2020. It used Apache Spark for batch processing, Spark Streaming for real-time analysis, and SVM models to predict flight cancellations and delays, with Docker for cluster management and Grafana for real-time visualization.

Language: Jupyter Notebook - Size: 5.66 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 1

surbhitawasthi/MiniProject-AadharCensusDataValidation

A small program to validate Census data against Aadhar data

Language: Java - Size: 6.14 MB - Last synced at: over 2 years ago - Pushed at: about 6 years ago - Stars: 2 - Forks: 0

kowaalczyk/spark-minimal-algorithms

A Python implementation of Minimal MapReduce Algorithms for Apache Spark

Language: Python - Size: 52.7 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 5 - Forks: 0

Jordan396/Giraph-1.2.0-Installation 📦

Instructions for Installing Giraph-1.2.0

Size: 118 KB - Last synced at: over 2 years ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 0

aquib-sh/setup-hadoop

A Bash script to set up Apache Hadoop and Apache Hive with a Derby database on Debian GNU/Linux

Language: Shell - Size: 37.1 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Bayunova28/Spotify_Lyrics

This repository contains my personal project implementing MapReduce with Apache Hadoop

Language: Shell - Size: 19.7 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 1

luckyp71/hadoop-hbase-phoenix-zookeeper-integration

Hadoop, HBase, Phoenix, and Zookeeper Integration

Language: Shell - Size: 30.3 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0

Umer86/Dice-Big-Data-Certification

This repository contains all the material related to this big data certification.

Size: 13.6 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

unobatbayar/big-data-processing

Learning Apache Hadoop for big data, and exploring MapReduce, Apache Spark RDDs, distributed processing, and stream processing

Language: Python - Size: 3.9 MB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

bdoepf/aws-emr-prometheus

Language: HCL - Size: 38.1 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 1

LorenzoGianassi/Twitter_Sentiment_Analysis_Lambda_Architecture

Full-term project for the Parallel Computing exam at the University of Florence. Implements Twitter sentiment analysis using Hadoop, Apache Storm, and HBase for parallelization.

Language: Java - Size: 661 KB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

mohammadzainabbas/BDM

Big Data Management ✨

Language: Jupyter Notebook - Size: 782 KB - Last synced at: 21 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

realtimedatalake/hive-metastore-docker

Containerized Apache Hive Metastore for horizontally scalable Hive Metastore deployments

Language: Dockerfile - Size: 19.5 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 8 - Forks: 6

krishd46/AverageSalary-Hadoop-MapReduce

🗄️ Finding the average salary in Hadoop HDFS using MapReduce.

Language: Java - Size: 197 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

FayStatha/atds-project-NTUA-2021

A project for the Advanced Topics in Database Systems course at ECE, NTUA, for the fall semester of the 2020-2021 academic year.

Language: Python - Size: 635 KB - Last synced at: 7 months ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

haodemon/HadoopStreaming

Set of Input Formats for Hadoop Streaming

Language: Java - Size: 14.6 KB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 4 - Forks: 0

smohammadhejazi/twitter-mapreduce-practice

Applying MapReduce in Java on a Twitter dataset using Apache Hadoop

Language: Java - Size: 39.2 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

Trisha11r/covid_data_analysis_mapreduce

COVID-19 data analysis with MapReduce

Language: Java - Size: 8.79 KB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

shuuji3/spark-ceph-connector

🌟Spark Ceph Connector: Implementation of Hadoop Filesystem API for Ceph

Language: Scala - Size: 99.6 KB - Last synced at: 7 days ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

bayudwiyansatria/library-java-apache-hadoop

Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Originally designed for computer clusters built from commodity hardware (still the common use), it has also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with the fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.

Language: Java - Size: 352 KB - Last synced at: 9 days ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 1