GitHub topics: mapreduce-python

Repositories

mahmoudparsian/big-data-mapreduce-course

Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University

Language: HTML - Size: 614 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 161 - Forks: 143

mixaisealx/DevOps-n-DataOps

Hands-on project demos covering infrastructure automation (Ansible, Docker), big-data processing & streaming (Hive, Spark, Kafka), and network experiments (MitM, TCP-over-UDP).

Language: Python - Size: 61.5 KB - Last synced at: 15 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

oleksii-shcherbak/GoIt-CS

Language: Python - Size: 143 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

ahmadsalimi/dist_mr

A distributed MapReduce implemented in Python 3 with gRPC

Language: Python - Size: 1.2 MB - Last synced at: 15 days ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

yevheniidatsenko/goit-algo2-hw-06

🗒️ Home Task - Design and Analysis of Algorithms (Fundamentals of Parallel Computing and the MapReduce Model)

Language: Python - Size: 0 Bytes - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

TheVinh-Ha-1710/Big-Data-Pipeline-Design

This project builds a data pipeline implementing the ETL process.

Language: Python - Size: 738 KB - Last synced at: 4 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

LesiaUKR/goit-algo2-hw-06

Master's | Design & Analysis of Algorithms | Fundamentals of Parallel Computing and the MapReduce Model

Language: Python - Size: 98.6 KB - Last synced at: 6 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Samuele95/mapyreduce

Lightweight and extensible library to execute MapReduce-like jobs in Python

Language: Python - Size: 35.2 KB - Last synced at: 2 months ago - Pushed at: 10 months ago - Stars: 2 - Forks: 0

krishnadey30/NewsHeadlines

This repository contains code that extracts meaningful information from a news headline dataset.

Language: Python - Size: 85.9 KB - Last synced at: 3 months ago - Pushed at: over 6 years ago - Stars: 3 - Forks: 2

aryanGupta-09/Kmeans-using-MapReduce

K-means clustering algorithm using MapReduce.

Language: Python - Size: 23.4 KB - Last synced at: 8 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

aaqib-ahmed-nazir/Naive_Search_Engine

This repository aims to develop a basic search engine that uses Hadoop's MapReduce framework to index and process large text corpora efficiently. The dataset is a 5.2 GB subset of the English Wikipedia dump. The project focuses on implementing a naive search algorithm to address challenges in information retrieval.

Language: Jupyter Notebook - Size: 120 KB - Last synced at: 9 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

Karansheth/Analyzing-Social-Health-Factors

• Preprocessed and analyzed 7 GB of social data collected from various sources in a distributed manner using Spark.
• Classified each USA zip code into 8 groups based on social health using a Euclidean distance-based clustering approach, considering socioeconomic factors such as education, unemployment, health, mortality, and old-age dependency.

Language: Jupyter Notebook - Size: 4.9 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

Gthejesraj/Data_Science

Mastering Data Science

Language: Jupyter Notebook - Size: 9.61 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

sanfx/mapreduce_paradigm_distributed_computing

Language: Python - Size: 2.31 MB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

PrudhviVajja/DistributedMapReduce

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real-world tasks are expressible in this model.

Language: Python - Size: 1.08 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0
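The map/reduce contract described above can be sketched in plain, single-process Python. This is a minimal illustration assuming word count as the job, not code from the PrudhviVajja/DistributedMapReduce repository; `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names, and the in-memory dictionary stands in for the distributed shuffle.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def map_fn(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    # Map: turn one input key/value pair into intermediate (word, 1) pairs.
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: List[int]) -> int:
    # Reduce: merge all intermediate values that share the same key.
    return sum(counts)


def run_mapreduce(inputs, mapper, reducer) -> Dict[str, int]:
    # Shuffle: group every intermediate value under its intermediate key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in inputs:
        for k, v in mapper(key, value):
            groups[k].append(v)
    # One reducer call per distinct key, emitted in sorted key order.
    return {k: reducer(k, vs) for k, vs in sorted(groups.items())}


docs = [("d1", "the quick brown fox"), ("d2", "the lazy dog the end")]
result = run_mapreduce(docs, map_fn, reduce_fn)
print(result)  # → {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

A real framework would partition the intermediate keys across many reducer workers; the sorted dictionary here only mimics the per-key grouping.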

fbaldi6/PageRank-Spark Fork of edofazza/PageRank-Spark

Implementation of the MapReduce PageRank algorithm using the Spark framework, in both Python and Java (developed for a Cloud Computing course)

Size: 4.99 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

AdamJeddy/BigData-Bits-Workshop

BigData Workshop - Python MapReduce for word frequency analysis on varied datasets.

Language: Jupyter Notebook - Size: 9.68 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

HarshitDawar55/MapReduce

MapReduce programs written in Java with minimal complexity.

Language: Java - Size: 76.2 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

yuliya-akchurina/Big-Data-Programming

Big Data Programming Projects

Language: Python - Size: 57.5 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

AmitabhCh822/BigData-MapReduce-MovieRatings-Analysis

Big Data analysis project using MapReduce in Python to process movie ratings. Includes scripts for aggregating ratings and identifying the most rated movies, demonstrating data analysis on a large scale.

Language: Python - Size: 9.77 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

anshul1004/MutualFriends

Implementation using Hadoop and Spark

Language: Java - Size: 23 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

Roon311/WDC-PageRank-Hadoop-MapReduce

Computing PageRank on the WDC data with MapReduce.

Language: Python - Size: 1000 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

manishghop/CS651-UW-Project

CS651 Final Project

Language: Jupyter Notebook - Size: 1.33 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

shivamgupta7/Hadoop

Hadoop applications. This repo covers Big Data tools such as Spark (PySpark), Hive (PyHive), Elasticsearch, and Oozie, all of which can be used through Python libraries once the configuration is set up.

Language: Jupyter Notebook - Size: 4.82 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 1

a22057916w/BDM

Big Data Mining and Applications

Language: Python - Size: 32.2 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

OskarMierkiewicz/Hadoop-and-MapReduce-with-Python

Hadoop MapReduce with Python

Language: Python - Size: 2.93 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0
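Hadoop Streaming lets any executable that reads stdin and writes stdout act as a mapper or reducer, and the framework sorts mapper output by key before it reaches the reducer. Below is a minimal word-count sketch of that pattern, assuming a single hypothetical script (`wc.py`) that picks its role from a `map` or `reduce` command-line argument; it is not taken from the OskarMierkiewicz repository.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    # Emit one tab-separated "word\t1" record per word; Hadoop sorts these by key.
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    # Input arrives sorted by key, so all records for a word are consecutive.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: python wc.py map   or   python wc.py reduce
    step = mapper if sys.argv[1] == "map" else reducer
    for record in step(sys.stdin):
        print(record)
```

The same pipeline can be approximated locally with `cat input.txt | python wc.py map | sort | python wc.py reduce`, where `sort` plays the role of Hadoop's shuffle.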

Luyayeh/MatrixMultiplicationMR_LY

MapReduce to perform matrix multiplication.

Language: Python - Size: 257 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0
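A common way to express this is one-pass MapReduce matrix multiplication: every entry of A is sent to each output cell (i, k) in its row, every entry of B to each output cell in its column, and the reducer for a cell joins the two on the shared index j. The sketch below assumes small dense matrices given as lists of lists and simulates the shuffle in memory; it illustrates the technique and is not the Luyayeh/MatrixMultiplicationMR_LY code.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Matrix = List[List[float]]
Key = Tuple[int, int]          # output cell (i, k)
Value = Tuple[str, int, float]  # (source matrix tag, shared index j, entry)


def mapper(a: Matrix, b: Matrix) -> Iterator[Tuple[Key, Value]]:
    m, n, p = len(a), len(b), len(b[0])
    # A[i][j] contributes to every output cell (i, k) in row i.
    for i in range(m):
        for j in range(n):
            for k in range(p):
                yield (i, k), ("A", j, a[i][j])
    # B[j][k] contributes to every output cell (i, k) in column k.
    for j in range(n):
        for k in range(p):
            for i in range(m):
                yield (i, k), ("B", j, b[j][k])


def reducer(values: List[Value]) -> float:
    # Join A and B entries on the shared index j, then sum the products.
    a_row = {j: v for tag, j, v in values if tag == "A"}
    b_col = {j: v for tag, j, v in values if tag == "B"}
    return sum(a_row[j] * b_col[j] for j in a_row if j in b_col)


def matmul_mapreduce(a: Matrix, b: Matrix) -> Matrix:
    # Shuffle: group intermediate values by output cell.
    groups: Dict[Key, List[Value]] = defaultdict(list)
    for key, value in mapper(a, b):
        groups[key].append(value)
    m, p = len(a), len(b[0])
    return [[reducer(groups[(i, k)]) for k in range(p)] for i in range(m)]


product = matmul_mapreduce([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product)  # → [[19, 22], [43, 50]]
```

Replicating each entry across a whole row or column keeps the job to a single MapReduce round at the cost of extra shuffle traffic; a two-pass variant trades that traffic for a second job.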

BinetaDiop007/FullStackBigData-with-SPARK

Pulled 10 GB of Yelp business data through the terminal via the Kaggle API. The data was then pushed to an AWS S3 bucket for storage and analyzed on an Elastic MapReduce (EMR) cluster from a Jupyter Notebook using PySpark.

Language: Jupyter Notebook - Size: 848 KB - Last synced at: 4 months ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

ZhiyuZhang803/DSCI553_Data_Mining_With_Spark

This repo contains the implementation of popular data mining algorithms with Python and Spark. It contains the homework assignments of DSCI553 2022 Fall. Final Grade: 103.5% (including bonus)

Language: Python - Size: 9.02 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

HarigovindV10/NYC-Subway-Data-Analysis

An analysis of NYC Subway data using Hadoop MapReduce

Language: Jupyter Notebook - Size: 529 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 1

MagdaleneHo/MapReduce

A simple project on the use of map and reduce in Hadoop.

Language: Python - Size: 6.84 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 1

manursanchez/desarrollosMRJob

Python implementations of MapReduce patterns that were not included in the final TFG (bachelor's thesis).

Language: Jupyter Notebook - Size: 28.2 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

mdarm/map-reduce-project

Project on MapReduce for the Μ111 - Big Data Management course, NKUA, Spring 2023.

Language: TeX - Size: 3.7 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Raphael-Jin/EDFS

Emulation-based System for Distributed File storage and Parallel Computation

Language: Python - Size: 5.79 MB - Last synced at: 7 months ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

Raveesh1505/BigData-Training

Big data training material

Language: Python - Size: 45.9 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

SinghHarshita/Clustering-Algorithms-Spark

K-Means, CURE, and Canopy clustering algorithms demonstrated using PySpark.

Language: Jupyter Notebook - Size: 150 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 0

arminZolfaghari/docker-hadoop Fork of big-data-europe/docker-hadoop

Apache Hadoop docker image | Running Python MapReduce

Language: Shell - Size: 94.7 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

phaniteja5789/MapReduce

Language: Python - Size: 2.93 KB - Last synced at: 7 days ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

Longannn/MapReduce

YouTube data analysis with comparison between big data tools (Apache Hadoop) and conventional python.

Language: Jupyter Notebook - Size: 2.41 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

PradeepSingh1988/mapreduce

A framework for running MapReduce programs, implemented following the MapReduce paper.

Language: Python - Size: 5.09 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Sahith-8055/20186008_CC

Distributed Computing using Hadoop, Docker and Python (Map Reduce)

Language: Python - Size: 51.8 KB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

kkoless/MapReduce

Hadoop MapReduce Python

Language: Python - Size: 1.05 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

Edyarich/parallel-computations

"Parallel computation" course homework

Language: Cuda - Size: 358 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 1

RiccardoSagramoni/map-reduce-bloom-filter 📦

University Project for "Cloud Computing" course (MSc Computer Engineering @ University of Pisa). MapReduce applications implemented in Hadoop and Spark.

Language: Java - Size: 8.86 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

scullen99/Map_Shuffle_Reduce

Language: Python - Size: 13.7 KB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

Bayunova28/Spotify_Lyrics

This repository contains my personal project running MapReduce jobs with Apache Hadoop.

Language: Shell - Size: 19.7 MB - Last synced at: 7 months ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 1

ahmedopolis/Flight_Distance_Calculation_with_MapReduce Fork of Nicole-Hong/Flight_Distance_Calculation_with_MapReduce

This project was completed as a small-scale team project for the YCBS 257 Data at Scale class in the Professional Development Certificate Program in Data Science and Machine Learning at McGill University. The project introduces MapReduce functions for solving Big Data problems.

Language: Jupyter Notebook - Size: 9.17 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

CLDXiang/Mining-Frequent-Pattern-from-Search-History

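Course project for "Big Data Mining Techniques" at Fudan University. It attempts to find frequent 2-itemsets of high-support keywords in search records from the Sogou Labs user query log data (2008). For the implementation, I set up a small Hadoop cluster of five servers and implemented in Python the three MapReduce steps of the Parallel FP-Growth algorithm.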

Language: Python - Size: 1.52 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 26 - Forks: 2

sreetamparida/Hiraishin

A REST-based service that translates SQL queries into MapReduce and Spark jobs, runs those jobs, and returns the results as a JSON object. An SQL-to-MapReduce and Spark translator.

Language: Python - Size: 194 KB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 3 - Forks: 0

NbnbZero/Recommendation-System

A product recommendation system for users, built with item-based CF and XGBRegressor.

Language: Python - Size: 24.2 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

huynhtloi/Mining-Of-Massive-Datasets

Introduction to Mining Of Massive Datasets

Language: Jupyter Notebook - Size: 22 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

24jmwangi/python_mapred

Using MapReduce with Hadoop and Python to score sentiment

Language: Python - Size: 43 KB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

MaimoonaKhilji/MapReduce-Programs

MapReduce Program Codes in Python Spyder

Language: Jupyter Notebook - Size: 15.5 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

MaimoonaKhilji/MapReduce-Presentation

Mapreduce Presentation

Size: 945 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

yoongoing/bigdata_pyspark

⚡️Let's do data mining with Spark, an openly available MapReduce platform⚡️

Language: Jupyter Notebook - Size: 438 KB - Last synced at: 7 months ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 0

python-supply/map-reduce-and-multiprocessing

Multiprocessing can be an effective way to speed up a time-consuming workflow via parallelization. This article illustrates how multiprocessing can be utilized in a concise way when implementing MapReduce-like workflows.

Language: Jupyter Notebook - Size: 166 KB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 1

reeryid/BigData_Tubes

Big Data major project (Hadoop)

Language: Python - Size: 48.8 KB - Last synced at: 12 months ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

yvgupta03/Big_Data_Assignments_MapReduce_Graphframe

Short projects on UTDallas Big Data course C6350 using PySpark MapReduce and Graphframe library

Language: Jupyter Notebook - Size: 364 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 1

ochoajuanm/stack-overflow-mapreduce

Analysis of metadata extracted from Stack Overflow using the MapReduce paradigm

Language: Python - Size: 13.9 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

abhibalani/emr_lambda

AWS Lambda function that starts an EMR cluster and runs a MapReduce job

Language: Python - Size: 2.93 KB - Last synced at: over 2 years ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 1

naman884/Big-Data

Language: Jupyter Notebook - Size: 1.15 MB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

vigneshSs-07/Bigdata_Technologies

This repo contains all technical knowledge and implementation of big data technologies.

Language: Jupyter Notebook - Size: 1.49 MB - Last synced at: 8 months ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

gorkinovich/SGDI

Data and Information Management Systems (UCM, 2015)

Language: Java - Size: 2.74 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

nikhitmago/frequent-itemset-association

Market basket analysis of finding frequent itemsets using SON algorithm in Spark

Language: Python - Size: 7.81 KB - Last synced at: over 2 years ago - Pushed at: about 7 years ago - Stars: 2 - Forks: 0

anshsarkar/Big-Data-Assignments-UE18CS322

A repository containing the source codes for the assignments done as a part of the Big Data course (UE18CS322) at PES University.

Language: Python - Size: 48.9 MB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

sanjitk7/MapReducePython

A MapReduce implementation in Python on a Docker-simulated distributed system

Language: Python - Size: 19 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

kiababashahi/Montreals_Neighborhood_RDD

In this simple project, I explore the City of Montreal datasets using RDDs: counting the neighborhoods, finding the largest ones, identifying their different types, and so on.

Language: Python - Size: 3.91 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

ashwinpn/WikiSea

Search Engine for Wikipedia.

Language: Python - Size: 96.7 KB - Last synced at: 5 months ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

r-i-c-h-a/MapReduce-based-Mini-HIVE Fork of sharanyavenkat25/MapReduce-based-Mini-HIVE

A SQL engine built on Hadoop MapReduce

Language: Python - Size: 162 KB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 1

aghezzafmohamed/MapReduce-with-PySpark

MapReduce with PySpark

Language: Jupyter Notebook - Size: 3.91 KB - Last synced at: about 1 month ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

aditeyabaral/mapreduce-word2vec

Implementation of Word2Vec for large datasets as a Map-Reduce Job using Hadoop Streaming.

Language: Python - Size: 1.45 MB - Last synced at: 9 days ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

p-disha/NYC-Parking-Violations

An analysis of the NYC Parking Violations dataset using PySpark, Spark SQL, and MapReduce to find useful insights.

Language: Python - Size: 9.24 MB - Last synced at: 7 months ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

NilufaYeasmin/MapReduce

This repo contains implementations of MapReduce programs on a large text corpus in an Apache Hadoop environment | Nilufa Yeasmin | https://www.linkedin.com/in/nilufayeasmin/

Language: CSS - Size: 3.53 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

MarcoXM/Bigdata_Programming_Analytics

Language: Jupyter Notebook - Size: 8.1 MB - Last synced at: over 2 years ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

martandsingh/SparkBigData

Apache Spark Big data basics and Machine learning with Big Data

Language: Jupyter Notebook - Size: 105 KB - Last synced at: over 2 years ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

khanhha/map_reduce

map reduce learning

Language: Python - Size: 8.8 MB - Last synced at: over 2 years ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

antoinewg/ocr-page-rank

PageRank algorithm using Hadoop Streaming

Language: Python - Size: 438 KB - Last synced at: 7 months ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

londist/Community-dectection

Language: Python - Size: 9.36 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

hovig/mapreduce

Alternative Mapreduce Simple Example

Language: Python - Size: 36.3 MB - Last synced at: over 2 years ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 0

Stefan-Mitic/HadoopLearning

Language: Python - Size: 37 MB - Last synced at: over 2 years ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 0

Shresth-Gupta/NYC_Subway_Data_Analysis_Udacity

Project for Udacity's Big Data Foundations Nanodegree. This notebook analyzes NYC Subway data, posing questions and drawing insights from it. It also implements MapReduce.

Language: Jupyter Notebook - Size: 1.54 MB - Last synced at: 4 months ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

Related Keywords

mapreduce-python 81 mapreduce 31 hadoop 24 python 24 spark 18 hadoop-mapreduce 17 big-data 16 pyspark 14 mapreduce-java 6 python3 6 bigdata 5 mapreduce-algorithm 5 jupyter-notebook 5 hadoop-streaming 5 spark-sql 5 hive 4 hadoop-hdfs 4 machine-learning 4 distributed-systems 4 docker 4 wordcount 4 pandas 4 multiprocessing 3 java 3 hdfs 3 sql 3 distributed-computing 3 data-mining 3 data-analysis 3 apache-spark 3 apache-hadoop 3 goit-algo2-hw-06 2 grpc 2 matplotlib 2 sparksql 2 parallel-computing 2 scala 2 kmeans 2 map-reduce 2 search-engine 2 gcp 2 numpy 2 mrjob 2 cloud-computing 2 big-data-analytics 2 rdd 2 udacity-nanodegree 2 udacity 2 analysis 2 mapper 2 data-engineering 2 spark-streaming 2 spark-dataframes 2 hiveql 2 reducer 2 hbase 2 mpi 1 bigdataproject 1 classification-algorithm 1 cuda 1 rpyc 1 mapreduce-programs 1 docker-hadoop 1 spark-cluster 1 artificial-intelligence 1 apache2 1 linux 1 hadoop-jar 1 unipi 1 hadoop-filesystem 1 taxi-data 1 unipisa 1 university-of-pisa 1 nyc-taxi-dataset 1 pandas-dataframe 1 pagerank-algorithm 1 data-visualization 1 data-science 1 udacity-projects 1 piglatin 1 mapreduce-demo 1 mapreduce-designpatterns 1 parquet-files 1 query-execution-plan 1 query-optimization 1 rdds 1 servrless 1 apache-pig 1 pig 1 codemaker 1 pig-latin 1 canopy 1 clustering 1 clustering-algorithm 1 cure 1 dataminig 1 parallel-python 1 python-articles 1 python-introduction 1 python-multiprocessing 1