GitHub topics: mapreduce-python
mahmoudparsian/big-data-mapreduce-course
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Language: HTML - Size: 601 MB - Last synced at: 8 days ago - Pushed at: 6 months ago - Stars: 158 - Forks: 143

yevheniidatsenko/goit-algo2-hw-06
🗒️ Home Task - Design and Analysis of Algorithms (Fundamentals of Parallel Computing and the MapReduce Model)
Language: Python - Size: 0 Bytes - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

TheVinh-Ha-1710/Big-Data-Pipeline-Design
This project builds a data pipeline implementing the ETL process.
Language: Python - Size: 738 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

LesiaUKR/goit-algo2-hw-06
Master's | Design & Analysis of Algorithms | Fundamentals of Parallel Computing and the MapReduce Model
Language: Python - Size: 98.6 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Samuele95/mapyreduce
Lightweight and extensible library to execute MapReduce-like jobs in Python
Language: Python - Size: 35.2 KB - Last synced at: 25 days ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

krishnadey30/NewsHeadlines
This repository contains code that extracts meaningful information from a news headline dataset.
Language: Python - Size: 85.9 KB - Last synced at: 2 months ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 2

aryanGupta-09/Kmeans-using-MapReduce
K-means clustering algorithm using MapReduce.
Language: Python - Size: 23.4 KB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

aaqib-ahmed-nazir/Naive_Search_Engine
This repository aims to develop a basic search engine utilizing Hadoop's MapReduce framework to index and process extensive text corpora efficiently. The dataset used for this project is a subset of the English Wikipedia dump, totaling 5.2 GB in size. The project focuses on implementing a naive search algorithm to address challenges in information.
Language: Jupyter Notebook - Size: 120 KB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

Karansheth/Analyzing-Social-Health-Factors
• Preprocessed and analyzed 7 GB of social data collected from various sources in a distributed manner using Spark. • Classified each USA zip code into 8 groups based on social health using a Euclidean distance-based clustering approach, considering socioeconomic factors like education, unemployment, health, mortality, old-age dependency, etc.
Language: Jupyter Notebook - Size: 4.9 MB - Last synced at: 10 months ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

Gthejesraj/Data_Science
Mastering Data Science
Language: Jupyter Notebook - Size: 9.61 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

sanfx/mapreduce_paradigm_distributed_computing
Language: Python - Size: 2.31 MB - Last synced at: 6 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

PrudhviVajja/DistributedMapReduce
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real-world tasks are expressible in this model.
Language: Python - Size: 1.08 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0
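
The map/reduce model described in the entry above can be sketched in a few lines of single-process Python. This is only an illustration of the programming model, not code from that repository: the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and the in-memory dictionary stands in for the distributed shuffle step that a real framework would perform.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: take one (line_number, line) pair and emit (word, 1) pairs."""
    for word in value.lower().split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce: merge all intermediate values that share the same key."""
    return key, sum(values)


def run_mapreduce(records, mapper, reducer):
    """Hypothetical single-process driver: map, group by key, then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for inter_key, inter_value in mapper(key, value):
            # Grouping by intermediate key plays the role of the shuffle.
            groups[inter_key].append(inter_value)
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))


lines = ["the quick brown fox", "the lazy dog", "the fox"]
counts = run_mapreduce(enumerate(lines), map_fn, reduce_fn)
print(counts)
```

Word count is used here because it is the customary first example for this model; any job that fits the "emit key/value pairs, then merge by key" shape could be passed to the same driver.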

fbaldi6/PageRank-Spark Fork of edofazza/PageRank-Spark
Implementation of the MapReduce PageRank algorithm using the Spark framework both in Python and in Java (developed for Cloud Computing course)
Size: 4.99 MB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

AdamJeddy/BigData-Bits-Workshop
BigData Workshop - Python MapReduce for word frequency analysis on varied datasets.
Language: Jupyter Notebook - Size: 9.68 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

HarshitDawar55/MapReduce
MapReduce programs written in Java with minimal complexity!
Language: Java - Size: 76.2 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

yuliya-akchurina/Big-Data-Programming
Big Data Programming Projects
Language: Python - Size: 57.5 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

AmitabhCh822/BigData-MapReduce-MovieRatings-Analysis
Big Data analysis project using MapReduce in Python to process movie ratings. Includes scripts for aggregating ratings and identifying the most rated movies, demonstrating data analysis on a large scale.
Language: Python - Size: 9.77 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

anshul1004/MutualFriends
Implementation of Hadoop and Spark
Language: Java - Size: 23 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

Roon311/WDC-PageRank-Hadoop-MapReduce
Running MapReduce to compute PageRank on the WDC data.
Language: Python - Size: 1000 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

manishghop/CS651-UW-Project
CS651 Final Project
Language: Jupyter Notebook - Size: 1.33 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

shivamgupta7/Hadoop
Hadoop applications. The repo covers Big Data tools such as Spark (pyspark), Hive (pyhive), Elasticsearch, and Oozie. All of these tools can be used through Python libraries once the configuration is set up.
Language: Jupyter Notebook - Size: 4.82 MB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 1

a22057916w/BDM
Big Data Mining and Applications
Language: Python - Size: 32.2 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

OskarMierkiewicz/Hadoop-and-MapReduce-with-Python
Hadoop MapReduce with Python
Language: Python - Size: 2.93 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Luyayeh/MatrixMultiplicationMR_LY
MapReduce to perform matrix multiplication.
Language: Python - Size: 257 KB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

BenitaDiop/FullStackBigData-with-SPARK
Pulled 10 GB of Yelp Business data through the terminal via the Kaggle API. The data was then pushed to an AWS S3 bucket for storage and analyzed on an Elastic MapReduce cluster from a Jupyter Notebook using PySpark.
Language: Jupyter Notebook - Size: 848 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

ZhiyuZhang803/DSCI553_Data_Mining_With_Spark
This repo contains the implementation of popular data mining algorithms with Python and Spark. It contains the homework assignments of DSCI553 2022 Fall. Final Grade: 103.5% (including bonus)
Language: Python - Size: 9.02 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

HarigovindV10/NYC-Subway-Data-Analysis
An analysis of NYC Subway Data using Hadoop Map Reduce
Language: Jupyter Notebook - Size: 529 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 1

MagdaleneHo/MapReduce
A simple project on the use of map and reduce in Hadoop.
Language: Python - Size: 6.84 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 1

manursanchez/desarrollosMRJob
Python implementations of MapReduce patterns that were not included in the final TFG (undergraduate thesis).
Language: Jupyter Notebook - Size: 28.2 MB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 1

mdarm/map-reduce-project
Project on MapReduce for the Μ111 - Big Data Management course, NKUA, Spring 2023.
Language: TeX - Size: 3.7 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Raphael-Jin/EDFS
Emulation-based System for Distributed File storage and Parallel Computation
Language: Python - Size: 5.79 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Raveesh1505/BigData-Training
Big data training material
Language: Python - Size: 45.9 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

SinghHarshita/Clustering-Algorithms-Spark
K-Means, CURE, and Canopy algorithms are demonstrated using PySpark.
Language: Jupyter Notebook - Size: 150 KB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 5 - Forks: 0

ahmadsalimi/dist_mr
A distributed MapReduce implemented in Python 3 with gRPC
Language: Python - Size: 1.2 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 1

arminZolfaghari/docker-hadoop Fork of big-data-europe/docker-hadoop
Apache Hadoop docker image | Running Python MapReduce
Language: Shell - Size: 94.7 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

phaniteja5789/MapReduce
Language: Python - Size: 2.93 KB - Last synced at: 25 days ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

Longannn/MapReduce
YouTube data analysis comparing big data tools (Apache Hadoop) with conventional Python.
Language: Jupyter Notebook - Size: 2.41 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

PradeepSingh1988/mapreduce
A framework for running MapReduce programs, implemented based on the MapReduce paper.
Language: Python - Size: 5.09 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

Sahith-8055/20186008_CC
Distributed Computing using Hadoop, Docker and Python (Map Reduce)
Language: Python - Size: 51.8 KB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 0

kkoless/MapReduce
Hadoop MapReduce Python
Language: Python - Size: 1.05 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

Edyarich/parallel-computations
"Parallel computation" course homework
Language: Cuda - Size: 358 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 1

RiccardoSagramoni/map-reduce-bloom-filter 📦
University Project for "Cloud Computing" course (MSc Computer Engineering @ University of Pisa). MapReduce applications implemented in Hadoop and Spark.
Language: Java - Size: 8.86 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

scullen99/Map_Shuffle_Reduce
Language: Python - Size: 13.7 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

Bayunova28/Spotify_Lyrics
This repository contains my personal project for building MapReduce jobs with Apache Hadoop.
Language: Shell - Size: 19.7 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 1

ahmedopolis/Flight_Distance_Calculation_with_MapReduce Fork of Nicole-Hong/Flight_Distance_Calculation_with_MapReduce
This project was completed as a small-scale team project for the YCBS 257 Data at Scale class in the Professional Development Certificate Program in Data Science and Machine Learning at McGill University. It uses MapReduce functions to solve Big Data problems.
Language: Jupyter Notebook - Size: 9.17 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

CLDXiang/Mining-Frequent-Pattern-from-Search-History
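Course project for "Big Data Mining Technology" @ Fudan. It attempts to find frequent 2-itemsets of high-support keywords in search records from the Sogou Labs user query log data (2008). On the implementation side, I set up a mini Hadoop cluster of five servers and implemented the three MapReduce passes of the Parallel FP-Growth algorithm in Python.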
Language: Python - Size: 1.52 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 26 - Forks: 2

sreetamparida/Hiraishin
A REST-based SQL-to-MapReduce and SQL-to-Spark translator. The service translates a SQL query into MapReduce and Spark jobs, runs them, and returns the result as a JSON object.
Language: Python - Size: 194 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 0

NbnbZero/Recommendation-System
A user-to-item recommendation system built with item-based CF and XGBRegressor
Language: Python - Size: 24.2 MB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 2 - Forks: 0

huynhtloi/Mining-Of-Massive-Datasets
Introduction to Mining Of Massive Datasets
Language: Jupyter Notebook - Size: 22 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

24jmwangi/python_mapred
Using MapReduce in Hadoop and Python to score sentiments
Language: Python - Size: 43 KB - Last synced at: 7 months ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

MaimoonaKhilji/MapReduce-Programs
MapReduce Program Codes in Python Spyder
Language: Jupyter Notebook - Size: 15.5 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

MaimoonaKhilji/MapReduce-Presentation
Mapreduce Presentation
Size: 945 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

yoongoing/bigdata_pyspark
⚡️Let's do data mining with Spark, an open-source MapReduce platform⚡️
Language: Jupyter Notebook - Size: 438 KB - Last synced at: 2 months ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

python-supply/map-reduce-and-multiprocessing
Multiprocessing can be an effective way to speed up a time-consuming workflow via parallelization. This article illustrates how multiprocessing can be utilized in a concise way when implementing MapReduce-like workflows.
Language: Jupyter Notebook - Size: 166 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

reeryid/BigData_Tubes
Big Data Major Assignment (Hadoop)
Language: Python - Size: 48.8 KB - Last synced at: 7 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

yvgupta03/Big_Data_Assignments_MapReduce_Graphframe
Short projects on UTDallas Big Data course C6350 using PySpark MapReduce and Graphframe library
Language: Jupyter Notebook - Size: 364 KB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 1

ochoajuanm/stack-overflow-mapreduce
Analysis of metadata extracted from Stack Overflow using the MapReduce paradigm
Language: Python - Size: 13.9 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

abhibalani/emr_lambda
Lambda to start EMR and run a map reduce job
Language: Python - Size: 2.93 KB - Last synced at: over 2 years ago - Pushed at: almost 6 years ago - Stars: 3 - Forks: 1

naman884/Big-Data
Language: Jupyter Notebook - Size: 1.15 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

vigneshSs-07/Bigdata_Technologies
This repo contains all technical knowledge and implementation of big data technologies.
Language: Jupyter Notebook - Size: 1.49 MB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

gorkinovich/SGDI
Data and Information Management Systems (Sistemas de Gestión de Datos y de la Información, UCM, 2015)
Language: Java - Size: 2.74 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

nikhitmago/frequent-itemset-association
Market basket analysis of finding frequent itemsets using SON algorithm in Spark
Language: Python - Size: 7.81 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 0

anshsarkar/Big-Data-Assignments-UE18CS322
A repository containing the source codes for the assignments done as a part of the Big Data course (UE18CS322) at PES University.
Language: Python - Size: 48.9 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

sanjitk7/MapReducePython
A MapReduce implementation in Python running on a Docker-simulated distributed system
Language: Python - Size: 19 MB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

kiababashahi/Montreals_Neighborhood_RDD
In this simple project, I explore the City of Montreal's datasets using RDDs: counting the number of neighborhoods, finding the largest ones, listing their different types, and so on.
Language: Python - Size: 3.91 KB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

ashwinpn/WikiSea
Search Engine for Wikipedia.
Language: Python - Size: 96.7 KB - Last synced at: 3 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

r-i-c-h-a/MapReduce-based-Mini-HIVE Fork of sharanyavenkat25/MapReduce-based-Mini-HIVE
A Hadoop-based MapReduce SQL engine
Language: Python - Size: 162 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

aditeyabaral/mapreduce-word2vec
Implementation of Word2Vec for large datasets as a Map-Reduce Job using Hadoop Streaming.
Language: Python - Size: 1.45 MB - Last synced at: 3 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

p-disha/NYC-Parking-Violations
This is an analysis on NYC Parking Violations dataset using PySpark SparkSQL and Map Reduce to find some useful insights.
Language: Python - Size: 9.24 MB - Last synced at: about 2 months ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

NilufaYeasmin/MapReduce
This repo contains implementations of MapReduce programs run over a large text corpus in an Apache Hadoop environment | Nilufa Yeasmin | https://www.linkedin.com/in/nilufayeasmin/
Language: CSS - Size: 3.53 MB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

MarcoXM/Bigdata_Programming_Analytics
Language: Jupyter Notebook - Size: 8.1 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

martandsingh/SparkBigData
Apache Spark Big data basics and Machine learning with Big Data
Language: Jupyter Notebook - Size: 105 KB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

khanhha/map_reduce
map reduce learning
Language: Python - Size: 8.8 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

antoinewg/ocr-page-rank
PageRank algorithm using Hadoop Streaming
Language: Python - Size: 438 KB - Last synced at: about 2 months ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

londist/Community-dectection
Language: Python - Size: 9.36 MB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

hovig/mapreduce
Alternative Mapreduce Simple Example
Language: Python - Size: 36.3 MB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

Stefan-Mitic/HadoopLearning
Language: Python - Size: 37 MB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0
