GitHub topics: apache-hadoop
PasanAbeysekara/Taxi-Pickup-Hotspot-Analysis-using-Hadoop-MapReduce
This project analyzes one month of NYC Yellow Taxi trip data (January 2016) to identify the busiest taxi pickup locations. It utilizes the Hadoop MapReduce framework to process the data and a lookup table to map location IDs to human-readable zone names.
Language: Java - Size: 5.65 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

Malisha4065/HadoopConfiguration
Apache Hadoop cluster configuration using the original apache/hadoop:3.4.1 Docker image (with YARN)
Language: Shell - Size: 6.84 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 3 - Forks: 0

Malisha4065/HadoopProject
MapReduce task with Apache Hadoop.
Language: Java - Size: 17.6 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

tencentyun/hadoop-cos
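hadoop-cos (the CosN file system) provides integration support for big data computing frameworks such as Apache Hadoop, Spark, and Tez, so that data stored on Tencent Cloud COS can be read and written just like data in HDFS. It also supports serving as Deep Storage for query and analytics engines such as Druid.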
Language: Java - Size: 105 MB - Last synced at: 7 days ago - Pushed at: 14 days ago - Stars: 85 - Forks: 52

mahmoudparsian/data-algorithms-book
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Language: Java - Size: 397 MB - Last synced at: 18 days ago - Pushed at: 8 months ago - Stars: 1,075 - Forks: 661

mahmoudparsian/big-data-mapreduce-course
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Language: HTML - Size: 601 MB - Last synced at: 9 days ago - Pushed at: 6 months ago - Stars: 158 - Forks: 143

taabishhh/LLM_Preprocessing
This project implements a Byte Pair Encoding (BPE) tokenization approach along with a Word2Vec model to generate word embeddings from a text corpus. The implementation leverages Apache Hadoop for distributed processing and includes evaluation metrics for optimal dimensionality of embeddings.
Language: Scala - Size: 7.37 MB - Last synced at: 6 days ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

L00kAhead/hadoop_cluster
HDFS Cluster Setup with Docker. A simple Docker setup for a Hadoop HDFS cluster with one NameNode and two DataNodes for testing and learning purposes.
Size: 3.91 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

PBWebMedia/yarn-prometheus-exporter
Export Hadoop YARN (resource-manager) metrics in prometheus format
Language: Go - Size: 81.1 KB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 51 - Forks: 20

sawallesalfo/Big-Data-Technologies
Big Data Technologies can be defined as software tools for analyzing, processing, and extracting data from extremely large and complex datasets that traditional data management tools cannot handle.
Language: Python - Size: 906 KB - Last synced at: 27 days ago - Pushed at: about 3 years ago - Stars: 3 - Forks: 0

miguelTavora/Word-Count 📦
The word count problem solved using Apache Hadoop
Language: Java - Size: 715 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

yingzhuo/logback-flume-appender
Logback appender for Apache Flume
Language: Java - Size: 37.1 KB - Last synced at: 3 days ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

tashi-2004/Apache-Hadoop-Spark-Hive-CyberAnalytics
This project utilizes Apache Hadoop, Hive, and PySpark to process and analyze the UNSW-NB15 dataset, enabling advanced query analysis, machine learning modeling, and visualization. The project demonstrates efficient data ingestion, processing, and predictive analytics for network security insights.
Language: Jupyter Notebook - Size: 2.62 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Narius2030/Hive-DataWarehouse-Analysis
Implements a Hive data warehouse to store meaningful data and applies machine learning techniques such as clustering and regression to business problems
Language: Jupyter Notebook - Size: 24.9 MB - Last synced at: 2 months ago - Pushed at: 12 months ago - Stars: 2 - Forks: 3

berksudan/Analysis-on-Big-Data-with-Hadoop
Implementation of Statistical Methods via Hadoop Map-Reduce Library.
Language: Java - Size: 75.2 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

yuhexiong/deploy-hadoop-guide
Size: 61.5 KB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

yycorcino/distributed-system-for-movie-recommendations Fork of KathiraveluLab/Dragonfly
Apache Spark with Apache Hadoop for Machine Learning Application
Language: Python - Size: 969 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 1

mark-deligiannis/DB-lab-SPARK-project
Term project for NTUA course "Advanced Topics in Database Systems". Big Data analysis is performed on the "Los Angeles Crime Data" dataset.
Language: Python - Size: 331 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

myndaaa/BigDataArchitecture-COS20028-Swinburne
Apache Hadoop, along with Apache Pig and Hive: a course for undergraduates
Language: Java - Size: 2.4 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

aaqib-ahmed-nazir/Naive_Search_Engine
This repository aims to develop a basic search engine that uses Hadoop's MapReduce framework to efficiently index and process large text corpora. The dataset used for this project is a 5.2 GB subset of the English Wikipedia dump. The project focuses on implementing a naive search algorithm to address challenges in information retrieval.
Language: Jupyter Notebook - Size: 120 KB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

s911415/apache-hadoop-3.1.0-winutils
HADOOP 3.1.0 winutils
Language: Batchfile - Size: 985 KB - Last synced at: 7 months ago - Pushed at: about 7 years ago - Stars: 73 - Forks: 104

mituskillologies/bigdata-ait-sep24
Programs conducted at Army Institute of Technology, Pune in training on Big Data Analytics during September 2024.
Language: Java - Size: 10.7 KB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

chriskery/hadoop-operator
Kubernetes operator for managing the lifecycle of Apache Hadoop YARN tasks on Kubernetes.
Language: Go - Size: 3.06 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 1

BhushanSagar/Car-Insurance-Cold-Calls-Data-Analysis
Car Insurance Cold Calls Data Analysis using Apache Hive
Language: HiveQL - Size: 1.17 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Abdelhakim-gh/BigData_Project
This project aims to establish a data streaming pipeline with storage, processing, and visualization
Language: Python - Size: 28.7 MB - Last synced at: 2 months ago - Pushed at: 10 months ago - Stars: 2 - Forks: 0

nghoanglongde/spark-cluster-with-docker
An implementation of Apache Spark (combined with PySpark and Jupyter Notebook) on top of a Hadoop cluster using Docker
Language: Shell - Size: 43 KB - Last synced at: 9 months ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 2

Guru107/hadoop-small-files-merger
A Spark application to merge small files on Hadoop
Language: Scala - Size: 101 KB - Last synced at: about 2 months ago - Pushed at: almost 5 years ago - Stars: 8 - Forks: 3

shawnzhu/docker-hive-1 Fork of IBM/docker-hive
Docker image for Hive Metastore
Language: Dockerfile - Size: 19.5 KB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

Coursal/Text-Sentiment-Analysis-In-Hadoop-And-Spark
The source code developed and used for the purposes of my thesis with the same title under the guidance of my supervisor professor Vasilis Mamalis for the Department of Informatics and Computer Engineering of the University of West Attica.
Language: Java - Size: 66.5 MB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 1

Coursal/Hadoop-Letter-File-Index-Counter
A Hadoop-based Java project that counts the maximum number of word occurrences for each letter in the text files of a folder.
Language: Java - Size: 213 KB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

Coursal/Hadoop-Examples
Some simple, kinda introductory projects based on Apache Hadoop to be used as guides in order to make the MapReduce model look less weird or boring.
Language: Java - Size: 340 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 2

jagdish4501/Network-intrusion-Detection
This repository provides a guide to preprocess and analyze the network intrusion data set using NumPy, Pandas, and matplotlib, and implement a random forest classifier machine learning model using Scikit-learn.
Language: Jupyter Notebook - Size: 28.7 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

lepetitprinz/apache-hadoop
Hands-on Hadoop learning
Language: Java - Size: 1.16 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

keramiozsoy/apache-spark-yarn-mode-aws-101
An example of installing Apache Spark on AWS
Size: 146 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

carlosemsantana/docker-hadoop
Setting up a development and testing environment for Apache Hadoop.
Size: 4.86 MB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

whoami-anoint/EasyHadoop
Simplified Hadoop Setup and Configuration Automation
Language: Shell - Size: 12 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

felidsche/movie-recommender
A movie recommendation system built using Apache Spark’s ML library
Language: Python - Size: 829 KB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

felidsche/mail-spam-filter
An email spam filter using Apache Spark’s ML library
Language: Python - Size: 212 KB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 1

dmarks84/Coursework_Capstone_Full_Data_Engineering
Final project for the IBM Data Engineering & Python Professional Certificate, applying the skills and methods used throughout the series of courses for this certification
Language: Jupyter Notebook - Size: 4.25 MB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

SomeshChevella/Apache-Hadoop-Map-Reduce--Basic-Sentiment-Analysis-on-Yelp-Dataset
In this project we will use Hadoop MapReduce to implement a very basic “Sentiment Analysis” using the review text in the Yelp Academic Dataset as training data.
Language: Java - Size: 7.39 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

VikentiosVitalis/advanced_topics_in_database_systems
Data Science Project - for 'Advanced Topics in Database Systems' M.Sc. Course ECE @ntua
Language: Python - Size: 10.6 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

esakik/data-engineering-essentials
Samples related to data engineering, e.g. spark, embulk, airflow, etc.
Language: Python - Size: 413 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 1

mohammadtavakoli78/Cloud-Computing
Projects for a Cloud Computing course
Language: Python - Size: 9.1 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 5 - Forks: 1

gangodu/cloud
AWS Cloudera Hadoop setup with H2O, Spark, MR
Language: Java - Size: 49.1 MB - Last synced at: over 1 year ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 0

on2e/ntua-atdb
Advanced Topics in Databases course project - NTUA ECE - 2022-23
Language: Python - Size: 24.4 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

probaldhar/AprioriMapReduce
Java code for Apriori algorithm using MapReduce
Language: Java - Size: 2.68 MB - Last synced at: almost 2 years ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0

heracliteanflux/exercises-scala
Exercises in the Scala programming language with an emphasis on big data programming and applications in Apache Hadoop and Apache Spark.
Language: Java - Size: 3.29 MB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

hridayns/Big-Data-Apache-server-logs-analysis-using-Pig-and-Python
Big Data – Apache server logs analysis using Pig and Python
Language: Python - Size: 4.88 KB - Last synced at: almost 2 years ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

rahulinux/ansible-hadoop
Apache Hadoop multi-node setup using ansible
Size: 10.7 KB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

Lucass97/FlightAnalysis
This project implemented a lambda architecture for analyzing domestic flight data in the US from 2009 to 2020. It used Apache Spark for batch processing, Spark Streaming for real-time analysis, and SVM models to predict flight cancellations and delays, with Docker for cluster management and Grafana for real-time visualization.
Language: Jupyter Notebook - Size: 5.66 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 1

surbhitawasthi/MiniProject-AadharCensusDataValidation
A small program to validate census data against Aadhar data
Language: Java - Size: 6.14 MB - Last synced at: over 2 years ago - Pushed at: about 6 years ago - Stars: 2 - Forks: 0

kowaalczyk/spark-minimal-algorithms
A Python implementation of Minimal MapReduce Algorithms for Apache Spark
Language: Python - Size: 52.7 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 5 - Forks: 0

Jordan396/Giraph-1.2.0-Installation 📦
Instructions for Installing Giraph-1.2.0
Size: 118 KB - Last synced at: over 2 years ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 0

aquib-sh/setup-hadoop
A Bash script to set up Apache Hadoop and Apache Hive with a Derby database on Debian GNU/Linux
Language: Shell - Size: 37.1 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Bayunova28/Spotify_Lyrics
This repository contains my personal project for running MapReduce jobs with Apache Hadoop
Language: Shell - Size: 19.7 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 1

luckyp71/hadoop-hbase-phoenix-zookeeper-integration
Hadoop, HBase, Phoenix, and Zookeeper Integration
Language: Shell - Size: 30.3 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0

Umer86/Dice-Big-Data-Certification
This repository contains all the material related to this big data certification.
Size: 13.6 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

unobatbayar/big-data-processing
Learning Apache Hadoop for big data, and exploring MapReduce, Apache Spark RDDs, distributed processing, and stream processing
Language: Python - Size: 3.9 MB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

bdoepf/aws-emr-prometheus
Language: HCL - Size: 38.1 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 1

LorenzoGianassi/Twitter_Sentiment_Analysis_Lambda_Architecture
Full-term project for the Parallel Computing exam at the University of Florence: an implementation of Twitter sentiment analysis that uses Hadoop, Apache Storm, and HBase for parallelization.
Language: Java - Size: 661 KB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

mohammadzainabbas/BDM
Big Data Management ✨
Language: Jupyter Notebook - Size: 782 KB - Last synced at: 21 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

realtimedatalake/hive-metastore-docker
Containerized Apache Hive Metastore for horizontally scalable Hive Metastore deployments
Language: Dockerfile - Size: 19.5 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 8 - Forks: 6

krishd46/AverageSalary-Hadoop-MapReduce
🗄️ Finding the average salary in Hadoop HDFS using MapReduce.
Language: Java - Size: 197 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

FayStatha/atds-project-NTUA-2021
A project for Advanced Topics in Database Systems course of ECE, NTUA for fall semester of academic year 2020-2021.
Language: Python - Size: 635 KB - Last synced at: 7 months ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

haodemon/HadoopStreaming
Set of Input Formats for Hadoop Streaming
Language: Java - Size: 14.6 KB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 4 - Forks: 0

smohammadhejazi/twitter-mapreduce-practice
Applying MapReduce in Java on a Twitter dataset using Apache Hadoop
Language: Java - Size: 39.2 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

Trisha11r/covid_data_analysis_mapreduce
COVID-19 data analysis with MapReduce
Language: Java - Size: 8.79 KB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

shuuji3/spark-ceph-connector
🌟Spark Ceph Connector: Implementation of Hadoop Filesystem API for Ceph
Language: Scala - Size: 99.6 KB - Last synced at: 7 days ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

bayudwiyansatria/library-java-apache-hadoop
Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Originally designed for computer clusters built from commodity hardware (still the common use), it has also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with the fundamental assumption that hardware failures are common and should be handled automatically by the framework.
Language: Java - Size: 352 KB - Last synced at: 9 days ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 1

RBC-DSAI-IITM/DCEIL
A fast, scalable and distributed community detection algorithm based on CEIL scoring function.
Language: Scala - Size: 70.7 MB - Last synced at: 12 months ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 3

felidsche/cloud-computing-2020 Fork of flontis/CloudComputing2020
Repository for the master's course Cloud Computing of the TU Berlin in the winter term 2020/21.
Language: Shell - Size: 5.23 MB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

Ajaypathak372/ansible-hadoop
Setup Hadoop HDFS Cluster using Ansible
Size: 4.88 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

jaskier07/Hadoop-lab
Solving simple tasks with Apache Hadoop.
Language: Java - Size: 32.2 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

Bahaabrougui/Big-Data-Smart-Cars-Pipeline-ServerSide
Big Data pipeline for real-time sensor fusion and predictive analysis.
Language: Java - Size: 117 KB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

victorpereira01/recomendation-system
💡 Recommendation system developed during the Carrefour Backend Developer Bootcamp, using Apache Mahout
Language: Java - Size: 322 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

Chabane/spark-custom-datasource
Language: Java - Size: 1.9 MB - Last synced at: 3 months ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

tspannhw/links
Links
Language: Scala - Size: 8.79 KB - Last synced at: about 1 year ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 1
