Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: mapreduce-python
aryanGupta-09/Kmeans-using-MapReduce
K-means clustering algorithm using MapReduce.
Language: Python - Size: 12.7 KB - Last synced: about 12 hours ago - Pushed: about 13 hours ago - Stars: 0 - Forks: 0
mahmoudparsian/big-data-mapreduce-course
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Language: HTML - Size: 549 MB - Last synced: 11 days ago - Pushed: 11 days ago - Stars: 146 - Forks: 142
PrudhviVajja/DistributedMapReduce
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real-world tasks are expressible in this model.
Language: Python - Size: 1.08 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 2 - Forks: 0
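The map/reduce model described in the entry above can be sketched in a few lines of single-process Python. This is only an illustration of the programming model, not code from the listed repository; the names map_fn, reduce_fn, and run_mapreduce are hypothetical, and word counting is an assumed example task.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map: emit an intermediate (word, 1) pair for every word in the line."""
    for word in line.split():
        yield word.lower(), 1


def reduce_fn(word, counts):
    """Reduce: merge all intermediate values that share the same key."""
    return word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Hypothetical single-process driver: map, group by key, then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for ikey, ivalue in mapper(key, value):
            groups[ikey].append(ivalue)  # the "shuffle" step, done in memory
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))


if __name__ == "__main__":
    docs = [(0, "the quick fox"), (1, "the lazy dog")]
    print(run_mapreduce(docs, map_fn, reduce_fn))
```

A real distributed implementation would partition the intermediate keys across workers instead of grouping them in one dictionary.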
aaqib-ahmed-nazir/BDA_Assignment02
This repository aims to develop a basic search engine utilizing Hadoop's MapReduce framework to index and process extensive text corpora efficiently. The dataset used for this project is a subset of the English Wikipedia dump, totaling 5.2 GB in size. The project focuses on implementing a naive search algorithm to address challenges in information.
Language: Jupyter Notebook - Size: 120 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 1 - Forks: 0
Abdurrehman7452/search-engine-utilising-hadoop-MapReduce-technology-with-python-on-wikipedia-articles
Developing a Naive Search Engine Utilising Apache Hadoop MapReduce Technology on a dataset in comma-separated values (CSV) format containing around 5 million Wikipedia articles provided by Wikimedia, as part of an assignment for the Fundamental of Big Data Analytics (DS2004) course.
Size: 1.95 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0
fbaldi6/PageRank-Spark Fork of edofazza/PageRank-Spark
Implementation of the MapReduce PageRank algorithm using the Spark framework both in Python and in Java (developed for Cloud Computing course)
Size: 4.99 MB - Last synced: 2 months ago - Pushed: almost 3 years ago - Stars: 1 - Forks: 0
AdamJeddy/BigData-Bits-Workshop
BigData Workshop - Python MapReduce for word frequency analysis on varied datasets.
Language: Jupyter Notebook - Size: 9.68 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0
HarshitDawar55/MapReduce
MapReduce programs written in Java with minimal complexity!
Language: Java - Size: 76.2 KB - Last synced: about 1 month ago - Pushed: 4 months ago - Stars: 1 - Forks: 0
yuliya-akchurina/Big-Data-Programming
Big Data Programming Projects
Language: Python - Size: 57.5 MB - Last synced: 4 months ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0
AmitabhCh822/BigData-MapReduce-MovieRatings-Analysis
Big Data analysis project using MapReduce in Python to process movie ratings. Includes scripts for aggregating ratings and identifying the most rated movies, demonstrating data analysis on a large scale.
Language: Python - Size: 9.77 KB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 0 - Forks: 0
anshul1004/MutualFriends
Implementation of Hadoop and Spark
Language: Java - Size: 23 MB - Last synced: 6 months ago - Pushed: about 4 years ago - Stars: 1 - Forks: 0
Roon311/WDC-PageRank-Hadoop-MapReduce
Running MapReduce to compute PageRank on the WDC data.
Language: Python - Size: 1000 KB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 0 - Forks: 0
manishghop/CS651-UW-Project
CS651 Final Project
Language: Jupyter Notebook - Size: 1.33 MB - Last synced: 6 months ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0
ashwinpn/WikiSea
Search Engine for Wikipedia.
Language: Python - Size: 96.7 KB - Last synced: 8 months ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0
shivamgupta7/Hadoop
Hadoop applications. The repo covers Big Data tools such as Spark (pyspark), Hive (pyhive), Elasticsearch, and Oozie, all of which can be used through Python libraries once the configuration is set up.
Language: Jupyter Notebook - Size: 4.82 MB - Last synced: 8 months ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 1
a22057916w/BDM
Big Data Mining and Applications
Language: Python - Size: 32.2 KB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 0 - Forks: 0
OskarMierkiewicz/Hadoop-and-MapReduce-with-Python
Hadoop MapReduce with Python
Language: Python - Size: 2.93 KB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 0 - Forks: 0
Luyayeh/MatrixMultiplicationMR_LY
MapReduce to perform matrix multiplication.
Language: Python - Size: 257 KB - Last synced: 9 months ago - Pushed: about 2 years ago - Stars: 0 - Forks: 0
BenitaDiop/FullStackBigData-with-SPARK
Pulled 10 GB of Yelp Business data through the terminal via the Kaggle API. The data was then pushed to an AWS S3 bucket for storage and analyzed on an Elastic MapReduce cluster in a Jupyter Notebook using PySpark.
Language: Jupyter Notebook - Size: 848 KB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 2 - Forks: 0
ZhiyuZhang803/DSCI553_Data_Mining_With_Spark
This repo contains the implementation of popular data mining algorithms with Python and Spark. It contains the homework assignments of DSCI553 2022 Fall. Final Grade: 103.5% (including bonus)
Language: Python - Size: 9.02 MB - Last synced: 10 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
HarigovindV10/NYC-Subway-Data-Analysis
An analysis of NYC Subway Data using Hadoop Map Reduce
Language: Jupyter Notebook - Size: 529 KB - Last synced: 10 months ago - Pushed: over 5 years ago - Stars: 0 - Forks: 1
MagdaleneHo/MapReduce
A simple project on the use of map and reduce in Hadoop.
Language: Python - Size: 6.84 KB - Last synced: 10 months ago - Pushed: over 2 years ago - Stars: 0 - Forks: 1
manursanchez/desarrollosMRJob
Python implementations of MapReduce patterns that were not included in the final TFG (bachelor's thesis).
Language: Jupyter Notebook - Size: 28.2 MB - Last synced: 10 months ago - Pushed: about 3 years ago - Stars: 1 - Forks: 1
mdarm/map-reduce-project
Project on MapReduce for the Μ111 - Big Data Management course, NKUA, Spring 2023.
Language: TeX - Size: 3.7 MB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0
Raveesh1505/BigData-Training
Big data training material
Language: Python - Size: 45.9 KB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0
SinghHarshita/Clustering-Algorithms-Spark
KMeans, CURE, and Canopy algorithms demonstrated using PySpark.
Language: Jupyter Notebook - Size: 150 KB - Last synced: 4 months ago - Pushed: about 3 years ago - Stars: 5 - Forks: 0
ahmadsalimi/dist_mr
A distributed MapReduce implemented with Python 3 and gRPC
Language: Python - Size: 1.2 MB - Last synced: about 1 year ago - Pushed: about 3 years ago - Stars: 0 - Forks: 1
arminZolfaghari/docker-hadoop Fork of big-data-europe/docker-hadoop
Apache Hadoop docker image | Running Python MapReduce
Language: Shell - Size: 94.7 KB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0
Longannn/MapReduce
YouTube data analysis comparing big data tools (Apache Hadoop) with conventional Python.
Language: Jupyter Notebook - Size: 2.41 MB - Last synced: 10 months ago - Pushed: over 2 years ago - Stars: 1 - Forks: 0
PradeepSingh1988/mapreduce
A framework for running MapReduce programs, implemented based on the MapReduce paper.
Language: Python - Size: 5.09 MB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0
Sahith-8055/20186008_CC
Distributed Computing using Hadoop, Docker and Python (Map Reduce)
Language: Python - Size: 51.8 KB - Last synced: about 1 year ago - Pushed: about 5 years ago - Stars: 1 - Forks: 0
krishnadey30/NewsHeadlines
This repository contains code that extracts meaningful information from a news headline dataset.
Language: Python - Size: 85.9 KB - Last synced: about 1 year ago - Pushed: about 5 years ago - Stars: 2 - Forks: 1
kkoless/MapReduce
Hadoop MapReduce Python
Language: Python - Size: 1.05 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 2 - Forks: 0
Edyarich/parallel-computations
"Parallel computation" course homework
Language: Cuda - Size: 358 KB - Last synced: about 1 year ago - Pushed: about 2 years ago - Stars: 0 - Forks: 1
RiccardoSagramoni/map-reduce-bloom-filter 📦
University Project for "Cloud Computing" course (MSc Computer Engineering @ University of Pisa). MapReduce applications implemented in Hadoop and Spark.
Language: Java - Size: 8.86 MB - Last synced: about 1 month ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0
scullen99/Map_Shuffle_Reduce
Language: Python - Size: 13.7 KB - Last synced: almost 1 year ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0
Bayunova28/Spotify_Lyrics
This repository contains my personal MapReduce project using Apache Hadoop
Language: Shell - Size: 19.7 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
ahmedopolis/Flight_Distance_Calculation_with_MapReduce Fork of Nicole-Hong/Flight_Distance_Calculation_with_MapReduce
This project was completed as a small-scale team project for the YCBS 257 Data at Scale class in the Professional Development Certificate Program in Data Science and Machine Learning at McGill University. It introduces MapReduce functions for solving Big Data problems.
Language: Jupyter Notebook - Size: 9.17 MB - Last synced: 11 months ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0
Raphael-Jin/EDFS
Emulation-based System for Distributed File storage and Parallel Computation
Language: Python - Size: 5.79 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
CLDXiang/Mining-Frequent-Pattern-from-Search-History
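Course project for "Big Data Mining Technology" at Fudan University. It attempts to find frequent 2-itemsets of keywords with high support in the search records of the Sogou Labs user query log data (2008). On the implementation side, I built a small Hadoop cluster of five servers and implemented the three MapReduce stages of the Parallel FP-Growth algorithm in Python.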
Language: Python - Size: 1.52 MB - Last synced: about 1 year ago - Pushed: about 3 years ago - Stars: 26 - Forks: 2
sreetamparida/Hiraishin
A REST-based SQL-to-MapReduce and SQL-to-Spark translator. The service translates a SQL query into MapReduce and Spark jobs, runs them, and returns the result as a JSON object.
Language: Python - Size: 194 KB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 3 - Forks: 0
NbnbZero/Recommendation-System
A user-to-item recommendation system built on item-based CF and XGBRegressor
Language: Python - Size: 24.2 MB - Last synced: about 1 year ago - Pushed: almost 3 years ago - Stars: 2 - Forks: 0
huynhtloi/Mining-Of-Massive-Datasets
Introduction to Mining Of Massive Datasets
Language: Jupyter Notebook - Size: 22 MB - Last synced: about 1 year ago - Pushed: about 3 years ago - Stars: 0 - Forks: 0
James-Wachuka/python_mapred
Using MapReduce with Hadoop and Python to score sentiments
Language: Python - Size: 43 KB - Last synced: about 1 year ago - Pushed: about 4 years ago - Stars: 0 - Forks: 0
MaimoonaKhilji/MapReduce-Programs
MapReduce Program Codes in Python Spyder
Language: Jupyter Notebook - Size: 15.5 MB - Last synced: 11 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
MaimoonaKhilji/MapReduce-Presentation
Mapreduce Presentation
Size: 945 KB - Last synced: 11 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
yoongoing/bigdata_pyspark
⚡️Let's do data mining with Spark, an open MapReduce platform⚡️
Language: Jupyter Notebook - Size: 438 KB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 2 - Forks: 0
python-supply/map-reduce-and-multiprocessing
Multiprocessing can be an effective way to speed up a time-consuming workflow via parallelization. This article illustrates how multiprocessing can be utilized in a concise way when implementing MapReduce-like workflows.
Language: Jupyter Notebook - Size: 166 KB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 1 - Forks: 1
yvgupta03/Big_Data_Assignments_MapReduce_Graphframe
Short projects on UTDallas Big Data course C6350 using PySpark MapReduce and Graphframe library
Language: Jupyter Notebook - Size: 364 KB - Last synced: 11 months ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 1
ochoajuanm/stack-overflow-mapreduce
Analysis of metadata extracted from Stack Overflow using the MapReduce paradigm
Language: Python - Size: 13.9 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
abhibalani/emr_lambda
Lambda to start EMR and run a map reduce job
Language: Python - Size: 2.93 KB - Last synced: over 1 year ago - Pushed: almost 5 years ago - Stars: 3 - Forks: 1
naman884/Big-Data
Language: Jupyter Notebook - Size: 1.15 MB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0
vigneshSs-07/Bigdata_Technologies
This repo contains all technical knowledge and implementation of big data technologies.
Language: Jupyter Notebook - Size: 1.49 MB - Last synced: 12 months ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0
gorkinovich/SGDI
Data and Information Management Systems (Sistemas de Gestión de Datos y de la Información; UCM, 2015)
Language: Java - Size: 2.74 MB - Last synced: over 1 year ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0
nikhitmago/frequent-itemset-association
Market basket analysis: finding frequent itemsets using the SON algorithm in Spark
Language: Python - Size: 7.81 KB - Last synced: about 1 year ago - Pushed: over 5 years ago - Stars: 2 - Forks: 0
anshsarkar/Big-Data-Assignments-UE18CS322
A repository containing the source code for the assignments completed as part of the Big Data course (UE18CS322) at PES University.
Language: Python - Size: 48.9 MB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0
sanjitk7/MapReducePython
A MapReduce implementation in Python on a Docker-simulated distributed system
Language: Python - Size: 19 MB - Last synced: about 1 year ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0
kiababashahi/Montreals_Neighborhood_RDD
In this simple project, I explore data sets from the city of Montreal using RDDs: counting the number of neighborhoods, finding the largest ones, identifying their different types, and so on.
Language: Python - Size: 3.91 KB - Last synced: 12 months ago - Pushed: about 3 years ago - Stars: 0 - Forks: 0
r-i-c-h-a/MapReduce-based-Mini-HIVE Fork of sharanyavenkat25/MapReduce-based-Mini-HIVE
A Hadoop-based MapReduce SQL engine
Language: Python - Size: 162 KB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 1 - Forks: 1
aditeyabaral/mapreduce-word2vec
Implementation of Word2Vec for large datasets as a Map-Reduce Job using Hadoop Streaming.
Language: Python - Size: 1.45 MB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0
p-disha/NYC-Parking-Violations
An analysis of the NYC Parking Violations dataset using PySpark, SparkSQL, and MapReduce to find useful insights.
Language: Python - Size: 9.24 MB - Last synced: about 1 year ago - Pushed: about 4 years ago - Stars: 0 - Forks: 0
NilufaYeasmin/MapReduce
This repo contains implementations of MapReduce programs on a large text corpus in an Apache Hadoop environment | Nilufa Yeasmin | https://www.linkedin.com/in/nilufayeasmin/
Language: CSS - Size: 3.53 MB - Last synced: about 1 year ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 0
MarcoXM/Bigdata_Programming_Analytics
Language: Jupyter Notebook - Size: 8.1 MB - Last synced: about 1 year ago - Pushed: over 4 years ago - Stars: 0 - Forks: 0
martandsingh/SparkBigData
Apache Spark Big data basics and Machine learning with Big Data
Language: Jupyter Notebook - Size: 105 KB - Last synced: about 1 year ago - Pushed: over 4 years ago - Stars: 0 - Forks: 0
khanhha/map_reduce
map reduce learning
Language: Python - Size: 8.8 MB - Last synced: about 1 year ago - Pushed: over 4 years ago - Stars: 0 - Forks: 0
antoinewg/ocr-page-rank
PageRank algorithm using Hadoop Streaming
Language: Python - Size: 438 KB - Last synced: about 2 months ago - Pushed: over 4 years ago - Stars: 0 - Forks: 0
londist/Community-dectection
Language: Python - Size: 9.36 MB - Last synced: about 1 month ago - Pushed: about 5 years ago - Stars: 0 - Forks: 0
hovig/mapreduce
Alternative Mapreduce Simple Example
Language: Python - Size: 36.3 MB - Last synced: over 1 year ago - Pushed: over 5 years ago - Stars: 1 - Forks: 0
Stefan-Mitic/HadoopLearning
Language: Python - Size: 37 MB - Last synced: about 1 year ago - Pushed: over 5 years ago - Stars: 0 - Forks: 0