GitHub topics: hadoop-mapreduce

Repositories

NitchayaninT/EGCI466_BigData

For a big data processing course. Lectures cover the use of Hadoop, MongoDB, etc.

Language: Jupyter Notebook - Size: 2.62 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

jathavaan/bds-seoul-hadoop

Language: Python - Size: 81.1 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

benedekh/bigdata-projects

Student projects in the Big Data field.

Language: Java - Size: 198 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 19 - Forks: 12

janheinrichmerker/hadoop-ktx

💾 Kotlin Extensions for Apache Hadoop (MapReduce).

Language: Kotlin - Size: 178 KB - Last synced at: 5 days ago - Pushed at: 19 days ago - Stars: 1 - Forks: 0

SaltFishGC/SteamGameDataAnalysis

Big data course project: Steam game data analysis, with results presented using Hadoop + Hive + Sqoop + MySQL + Spring Boot + ECharts.

Language: JavaScript - Size: 9.3 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

Lokeshkanna7/An-End-to-End-Big-Data-Pipeline-for-Amazon-Book-Reviews-using-Hadoop-and-Spark

A scalable big data pipeline built with Hadoop and Spark to analyze Amazon book reviews. This project performs sentiment analysis, rating prediction, and fake review detection using PySpark, demonstrating real-world applications of distributed systems and machine learning.

Language: Jupyter Notebook - Size: 459 KB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 0 - Forks: 0

mahmoudparsian/data-algorithms-book

MapReduce, Spark, Java, and Scala for Data Algorithms Book

Language: Java - Size: 397 MB - Last synced at: 24 days ago - Pushed at: 8 months ago - Stars: 1,075 - Forks: 661

JKA098/Pokemon-Feistiness-Apache-Spark-Job

This README assumes that, before running the Spark analytics job, you have already installed the correct versions of **Java**, **Hadoop**, and **Spark**, and that you are working inside **Ubuntu**.

Language: Python - Size: 184 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

JKA098/Pokemon-Feistiness-MapReduce-Job

This project aims to implement a **Hadoop MapReduce job in Pseudo-Distributed Mode** to determine the **feistiest Pokémon** based on their **type**. The job processes the Pokémon dataset (`pokemon.csv`) and outputs a CSV file containing each Pokémon's **type1, type2, name, and feistiness score**.

Language: Python - Size: 220 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

taabishhh/LLM_Preprocessing

This project implements a Byte Pair Encoding (BPE) tokenization approach along with a Word2Vec model to generate word embeddings from a text corpus. The implementation leverages Apache Hadoop for distributed processing and includes evaluation metrics for optimal dimensionality of embeddings.

Language: Scala - Size: 7.37 MB - Last synced at: 4 days ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

Yousuf1733/Titanic-Dataset-Analysis

Exploratory data analysis of the Titanic dataset, uncovering insights on passenger survival rates based on gender, age, and class. Includes data cleaning, visualization, and findings.

Language: Jupyter Notebook - Size: 71.3 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

groda/big_data

Tutorials on Big Data essentials: Hadoop, MapReduce, Spark. Explore a variety of tutorials and demonstrations on Big Data technologies, primarily in the form of Jupyter notebooks. Most notebooks are self-contained and live—ready to run with a click.

Language: Jupyter Notebook - Size: 51.9 MB - Last synced at: 20 days ago - Pushed at: about 2 months ago - Stars: 75 - Forks: 26

ArianaPerez-24/Hadoop-MapReduce-de-WordCount

Exercises for counting words, sorting numbers (in ascending order), and solving sudoku.

Language: Shell - Size: 837 KB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

KeerthanaJ-rec/210701118-CS19P16-DA-Lab

Data Analytics Laboratory

Language: R - Size: 23.1 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

senthuran16/word-count-streaming-python-hadoop-mapreduce

A word count streaming MapReduce implementation with Python

Language: Python - Size: 586 KB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

sueszli/sparkly-svm

Distributed training of an SVM with Spark ML

Language: Jupyter Notebook - Size: 21.6 MB - Last synced at: 30 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

lokk798/BigData-Quiz-Bank

A comprehensive collection of multiple-choice questions (MCQs) and assessments covering Hadoop, MapReduce, and the broader Big Data ecosystem.

Size: 5.86 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

touero/ctenopharyngodon-idella

Uses MapReduce's Java interface to crawl data on Chinese universities in a distributed fashion and to learn the basics of HDFS.

Language: Java - Size: 3.75 MB - Last synced at: 27 days ago - Pushed at: 8 months ago - Stars: 140 - Forks: 0

imsanjoykb/PySpark-Bootcamp

My practice and projects on PySpark

Language: Jupyter Notebook - Size: 4.52 MB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 8 - Forks: 3

josericodata/josericodata

Adding a cool README file

Size: 87.9 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

elaaatif/JPEG-and-JPEG2000-compression-on-Multi-node-cluster-using-hadoop-and-spark

Big Data technologies can be leveraged for efficient, distributed image compression using JPEG2000 (Spark) and JPEG (MapReduce).

Size: 14.3 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

groda/hats

Hadoop Ansible Test Suite

Language: Shell - Size: 33.2 KB - Last synced at: 15 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

nikisetti01/Hadoop-MapReduce-LetterFrequency-Analysis

Simple example of a Hadoop application that counts letters, with an interesting Romance language analysis

Language: Jupyter Notebook - Size: 2.71 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 2 - Forks: 2

berksudan/Analysis-on-Big-Data-with-Hadoop

Implementation of Statistical Methods via Hadoop Map-Reduce Library.

Language: Java - Size: 75.2 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

pngo1997/Big-Data-Mining-Project-PageRank-Hadoop-Streaming

Explores Big Data Processing using Hadoop & MapReduce.

Language: Jupyter Notebook - Size: 2.54 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

madhurimarawat/Big-Data-Analytics

This repository demonstrates big data processing, visualization, and machine learning using tools such as Hadoop, Spark, Kafka, and Python.

Language: Jupyter Notebook - Size: 10.7 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 1

chouaib-629/CustomerSegmentation

Hadoop-based Customer Segmentation project using the Online Retail Dataset. Implements MapReduce for processing and Python for preprocessing to uncover customer purchasing patterns for targeted marketing.

Language: Jupyter Notebook - Size: 260 KB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

bytedance/CloudShuffleService

Cloud Shuffle Service (CSS) is a general-purpose remote shuffle solution for compute engines, including Spark/Flink/MapReduce.

Language: Java - Size: 1.23 MB - Last synced at: 26 days ago - Pushed at: about 1 year ago - Stars: 255 - Forks: 58

arkady-emelyanov/hadoop-playground 📦

🐘Yet another Hadoop playground

Language: Shell - Size: 49.8 KB - Last synced at: 9 days ago - Pushed at: about 7 years ago - Stars: 2 - Forks: 1

developer-sdk/beginner-bigdata-example

Examples of Hadoop, Hive, and Spark jobs

Language: Java - Size: 3.34 MB - Last synced at: 9 days ago - Pushed at: 5 months ago - Stars: 1 - Forks: 1

Hadeel-Abdeljalil/-Advanced-Topics-in-Database-DBMS--University-Assignment

Language: Java - Size: 411 KB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

chouaib-629/MovieRecommendation

A Hadoop-based Movie Recommendation System using the MovieLens dataset, demonstrating MapReduce for sorting and processing movie ratings.

Language: Java - Size: 320 KB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

QiushiSun/Distributed-Computing-Systems

2021 Spring (Distributed Computing Systems): Distributed Systems and Programming

Language: Java - Size: 101 MB - Last synced at: 2 months ago - Pushed at: almost 4 years ago - Stars: 15 - Forks: 1

krishnadey30/Intro-to-Hadoop-and-MapReduce

Language: Python - Size: 6.54 MB - Last synced at: 3 months ago - Pushed at: almost 7 years ago - Stars: 2 - Forks: 0

krishnadey30/NewsHeadlines

This repository contains code that extracts meaningful information from a news headline dataset.

Language: Python - Size: 85.9 KB - Last synced at: 2 months ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 2

benjdiasaad/MapReduce_WordCount

Creation of a Hadoop Java program: a word occurrence counter. If you want to compile the code manually on the Hadoop virtual machine, you will need to copy this code into the VM.

Language: Java - Size: 11.7 KB - Last synced at: 3 months ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

singhdivyank/MongoHadoop

A MongoDB and Hadoop cheat sheet with some commands and a few questions

Language: Python - Size: 499 KB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Mariam-iftikhar/BigDataProjects

The repository showcases a series of exercises and projects focused on big data processing using Hadoop, HBase, Hive, and Spark with Python. Hosted on AWS EMR, these projects demonstrate efficient data handling and processing techniques, leveraging the power of cloud computing to tackle complex data challenges.

Size: 10.4 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

sephiroth7712/K-Nearest-Neigbours

Implementation of K-Nearest Neighbors algorithm using multiple parallel computing approaches: CUDA (GPU), Hadoop, Spark, MPI, OpenMP, and PThreads. Demonstrates scalable machine learning across different parallel computing paradigms from GPU to distributed frameworks.

Language: C++ - Size: 19.5 KB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

mehwishferoz/BDA-project

A Hadoop MapReduce project analyzing the Consumer Complaints dataset with five queries to extract insights like complaints by product, state, company, tags, and timely responses.

Language: Java - Size: 7.42 MB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

HabibAroua/Newspaper-analysis

Language: Java - Size: 12.5 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 1

SAKET-SK/Semester6-SPPU-Data-Analysis-Lab

I installed Hadoop on a virtual machine, and all assignments are performed on Ubuntu. Refer to this repo to complete the Hadoop assignments. A stable internet connection is recommended while working through them.

Language: Rebol - Size: 3.24 MB - Last synced at: 15 days ago - Pushed at: about 2 years ago - Stars: 13 - Forks: 6

m-anshu/big-data-coursework

Big Data coursework material

Language: Shell - Size: 3.27 MB - Last synced at: 8 days ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

RiccardoRevalor/MapReduce

Collection of exercises on Hadoop and the MapReduce approach

Language: Java - Size: 71.3 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

chriniko13/apache-hadoop-word-count-example

Language: Java - Size: 5.86 KB - Last synced at: 4 months ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 0

amaankhan02/maplejuice

A parallel distributed batch processing framework similar to Hadoop MapReduce with a SQL Engine and a distributed file system

Language: Go - Size: 864 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

MariaDukmak/Hadopy

Easy parallel map-reduce command line tool

Language: Python - Size: 28.3 KB - Last synced at: 1 day ago - Pushed at: about 4 years ago - Stars: 7 - Forks: 0

Rifat392000/BigDataAnalytics

Language: Jupyter Notebook - Size: 18.4 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

burhanahmed1/Big-Data-Analytics

Practice tasks in the Python programming language using Hadoop, MRJob, and PySpark for Big Data Analytics.

Language: Jupyter Notebook - Size: 40 KB - Last synced at: 4 months ago - Pushed at: 12 months ago - Stars: 2 - Forks: 0

drexly/movie140reviewcorpus

Per-movie rating raw data for Spark, covering the movies that have 140-character reviews out of 164397 Naver Movie entries

Size: 336 MB - Last synced at: 9 months ago - Pushed at: over 7 years ago - Stars: 7 - Forks: 5

prateekkr1/Project-Work

This repository contains some of my personal projects.

Language: Jupyter Notebook - Size: 3.79 MB - Last synced at: 10 months ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

WilliamCallao/HadoopNewsTrends

News trend analysis using Hadoop in a virtualized CentOS environment

Language: Python - Size: 17.4 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

ZiadSalah2003/BigData-Project

"BigData-Project", is a comprehensive Big Data solution that involves various operations such as web crawling, PageRank algorithm, TF-IDF calculations, and inverted index creation using Hadoop.

Size: 83 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

viseshrp/PageRank-MapReduce-Implementation

The MapReduce-Hadoop implementation of Google's PageRank algorithm

Language: Java - Size: 206 KB - Last synced at: 12 days ago - Pushed at: about 8 years ago - Stars: 2 - Forks: 0

highoncarbs/hadoopwithpy

🐘 ➕ 🐍 Learning Hadoop with Python

Language: Python - Size: 86.6 MB - Last synced at: 3 months ago - Pushed at: over 7 years ago - Stars: 3 - Forks: 0

sharma-n/global_event_analytics

Big data analytics using Hadoop on GDELT global news dataset.

Language: Java - Size: 2.66 MB - Last synced at: 11 months ago - Pushed at: over 5 years ago - Stars: 4 - Forks: 1

KingJin-web/Hadoop

Learning Hadoop HDFS and MapReduce

Language: Java - Size: 7.56 MB - Last synced at: 12 months ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 1

29DCH/Hadoop-HDFS-MapReduce-Examples

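Operating on HDFS files with the Java API; a MapReduce-based word-frequency counting program and its refactoring; applying the Combiner and Partitioner components in MapReduce programming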

Language: Java - Size: 35.2 KB - Last synced at: 4 months ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 1

DecioXXIV/BD-StockAnalysis

Repository for the second project of the "Big Data" course (2023/24)

Language: Python - Size: 36.1 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

jbw/hadoop-docker-cluster

Hadoop cluster on Docker (single host)

Language: Shell - Size: 159 KB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 0

Coursal/Text-Sentiment-Analysis-In-Hadoop-And-Spark

The source code developed and used for the purposes of my thesis with the same title under the guidance of my supervisor professor Vasilis Mamalis for the Department of Informatics and Computer Engineering of the University of West Attica.

Language: Java - Size: 66.5 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

Coursal/Hadoop-Letter-File-Index-Counter

A Hadoop-based Java project that counts the max number of word occurrences for each letter in a text file of a folder.

Language: Java - Size: 213 KB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

Coursal/Hadoop-Examples

Some simple, kinda introductory projects based on Apache Hadoop to be used as guides in order to make the MapReduce model look less weird or boring.

Language: Java - Size: 340 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 2

StevenMonty/MapReduceSearchEngine

A containerized search engine GUI that communicates with a Hadoop cluster running MapReduce on GCP to create Inverted Indices for search engine queries.

Language: Java - Size: 15.8 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

chicuongdev2002/BigData_Hadoop_MapReduce

Uses Scrapy, Hadoop, and Pig Latin

Language: Python - Size: 6.99 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

subhash26jan96/cluster

This repository contains Hadoop cluster code (automated, on-demand, and manual) written using Python, Linux, HTML, etc.

Language: Python - Size: 16.6 KB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 1

Jacob12138xieyuan/hadoop-mapreduce-with-python

hadoop mapreduce algorithm with hadoop streaming (Python)

Language: Jupyter Notebook - Size: 16.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

VaishnavJois/CLOUDERA

Cloudera commands used for Big Data Analytics

Size: 13.7 KB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

raineydavid/big-data-processing

Big Data Processing Notes from Masters in Big Data Science

Size: 13.7 MB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

gmarciani/mapreduce-app

Scaffolding for Map/Reduce applications, leveraging Apache Hadoop.

Language: Shell - Size: 1000 Bytes - Last synced at: about 1 year ago - Pushed at: almost 8 years ago - Stars: 1 - Forks: 0

Walrussin/MapReduce-Examples

Analyzing air quality index of eight states

Language: Java - Size: 35.6 MB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

manoharpalanisamy/Advanced-Map-Reduce

Running MapReduce jobs on a Hadoop cluster with customized parameters

Language: Java - Size: 21.5 KB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 0

andrejanesic/Hadoop-Beginner-Exercise-Football-Data

Hadoop beginner exercise in analyzing European football teams' statistics over the last 20 years. The goal is to determine which team had the highest win percentage.

Language: Makefile - Size: 453 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

zhermin/topkcommonwords

Extracts the Top K Common Words between 2 Text Files using Hadoop's MapReduce

Language: Java - Size: 84 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

MoustafaAMahmoud/BigDataInDepth

Data Engineering Course

Language: TeX - Size: 78.9 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 15 - Forks: 9

YMaher99/Parallelizing-the-Feedforward-Operation-of-Neural-Networks-in-Hadoop-MapReduce

Leveraging the MapReduce paradigm, we propose a solution to parallelize the feedforward operation of neural networks in order to speed it up for sufficiently large NN architectures and sufficiently large datasets. Tested using the MNIST dataset; results can be found in the results.html and results.ipynb files.

Language: HTML - Size: 2.1 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

rodrigoorf/HadoopStudies

Repo with a few Hadoop exercises

Language: Java - Size: 72.3 KB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

emrectn/HadoopTutorial

hadoop

Language: Java - Size: 15.6 KB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

prabhuvashwin/PageRank-Algorithm-Implementation 📦

Implementation of Google's PageRank algorithm using Java, Hadoop, and MapReduce

Language: Java - Size: 10.7 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 3 - Forks: 1

prabhuvashwin/TFIDF-SearchQuery 📦

Implementation of TF-IDF and keyword-based query search, using Java and Apache Hadoop

Language: HTML - Size: 887 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 1

prabhuvashwin/Credit-Card-Fraud-Detection 📦

Naive Bayes classifier and Logistic Regression classifier to predict whether a transaction is fraudulent or not

Language: Java - Size: 42.3 MB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 2

tableMinPark/trendflow

❗ Trend analysis platform - SSAFY 8th cohort specialization project

Language: Java - Size: 55.6 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

Dave-Vedant/BigDataTech

This repository contains small projects related to Hive, Hadoop, and Spark. It is my contribution to learning new technology and sharing my concise knowledge of different big data infrastructures.

Language: Scala - Size: 533 KB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 1