Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: mapreduce-python

aryanGupta-09/Kmeans-using-MapReduce

K-means clustering algorithm using MapReduce.

Language: Python - Size: 12.7 KB - Last synced: about 12 hours ago - Pushed: about 13 hours ago - Stars: 0 - Forks: 0

mahmoudparsian/big-data-mapreduce-course

Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University

Language: HTML - Size: 549 MB - Last synced: 11 days ago - Pushed: 11 days ago - Stars: 146 - Forks: 142

PrudhviVajja/DistributedMapReduce

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real-world tasks are expressible in this model.

Language: Python - Size: 1.08 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 2 - Forks: 0
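
The map/reduce model described in this entry can be illustrated with a minimal, single-process word-count sketch. This is not code from the repository: the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and the in-memory grouping step only stands in for the distributed shuffle a real framework would perform.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: take one (line_number, text) pair and emit (word, 1) pairs."""
    for word in value.lower().split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce: merge all intermediate values that share the same key."""
    return key, sum(values)


def run_mapreduce(records, map_fn, reduce_fn):
    """Hypothetical driver: map every record, group by key, then reduce."""
    intermediate = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            # Grouping by intermediate key stands in for the shuffle phase.
            intermediate[out_key].append(out_value)
    return dict(reduce_fn(k, vs) for k, vs in intermediate.items())


# Input records are (key, value) pairs: here, (line number, line of text).
lines = [(0, "the quick brown fox"), (1, "the lazy dog"), (2, "the fox")]
counts = run_mapreduce(lines, map_fn, reduce_fn)
```

Here `counts` maps each word to the number of times it appears across all lines. In a distributed setting the map calls and the reduce calls would run on separate workers, with the grouping step handled by the framework.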

aaqib-ahmed-nazir/BDA_Assignment02

This repository aims to develop a basic search engine utilizing Hadoop's MapReduce framework to index and process extensive text corpora efficiently. The dataset used for this project is a subset of the English Wikipedia dump, totaling 5.2 GB in size. The project focuses on implementing a naive search algorithm to address challenges in information.

Language: Jupyter Notebook - Size: 120 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 1 - Forks: 0

Abdurrehman7452/search-engine-utilising-hadoop-MapReduce-technology-with-python-on-wikipedia-articles

Developing a Naive Search Engine Utilising Apache Hadoop MapReduce Technology on a dataset in comma-separated values (CSV) format containing around 5 million Wikipedia articles provided by Wikimedia, as part of an assignment for the Fundamental of Big Data Analytics (DS2004) course.

Size: 1.95 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0

fbaldi6/PageRank-Spark Fork of edofazza/PageRank-Spark

Implementation of the MapReduce PageRank algorithm using the Spark framework both in Python and in Java (developed for Cloud Computing course)

Size: 4.99 MB - Last synced: 2 months ago - Pushed: almost 3 years ago - Stars: 1 - Forks: 0

AdamJeddy/BigData-Bits-Workshop

BigData Workshop - Python MapReduce for word frequency analysis on varied datasets.

Language: Jupyter Notebook - Size: 9.68 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0

HarshitDawar55/MapReduce

MapReduce programs written in Java with minimal complexity!

Language: Java - Size: 76.2 KB - Last synced: about 1 month ago - Pushed: 4 months ago - Stars: 1 - Forks: 0

yuliya-akchurina/Big-Data-Programming

Big Data Programming Projects

Language: Python - Size: 57.5 MB - Last synced: 4 months ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0

AmitabhCh822/BigData-MapReduce-MovieRatings-Analysis

Big Data analysis project using MapReduce in Python to process movie ratings. Includes scripts for aggregating ratings and identifying the most rated movies, demonstrating data analysis on a large scale.

Language: Python - Size: 9.77 KB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 0 - Forks: 0

anshul1004/MutualFriends

Implementation of Hadoop and Spark

Language: Java - Size: 23 MB - Last synced: 6 months ago - Pushed: about 4 years ago - Stars: 1 - Forks: 0

Roon311/WDC-PageRank-Hadoop-MapReduce

Running MapReduce to compute PageRank on the WDC data.

Language: Python - Size: 1000 KB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 0 - Forks: 0

manishghop/CS651-UW-Project

CS651 Final Project

Language: Jupyter Notebook - Size: 1.33 MB - Last synced: 6 months ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

ashwinpn/WikiSea

Search Engine for Wikipedia.

Language: Python - Size: 96.7 KB - Last synced: 8 months ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0

shivamgupta7/Hadoop

Hadoop applications. The repo covers Big Data tools such as Spark (pyspark), Hive (pyhive), Elasticsearch, and Oozie, all usable through Python libraries once the configuration is set up.

Language: Jupyter Notebook - Size: 4.82 MB - Last synced: 8 months ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 1

a22057916w/BDM

Big Data Mining and Applications

Language: Python - Size: 32.2 KB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 0 - Forks: 0

OskarMierkiewicz/Hadoop-and-MapReduce-with-Python

Hadoop MapReduce with Python

Language: Python - Size: 2.93 KB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 0 - Forks: 0

Luyayeh/MatrixMultiplicationMR_LY

MapReduce to perform matrix multiplication.

Language: Python - Size: 257 KB - Last synced: 9 months ago - Pushed: about 2 years ago - Stars: 0 - Forks: 0

BenitaDiop/FullStackBigData-with-SPARK

Pulled 10 GB of Yelp Business data through the terminal via the Kaggle API. The data was then pushed to an AWS S3 bucket for storage and analyzed on an Elastic MapReduce cluster from a Jupyter Notebook using PySpark.

Language: Jupyter Notebook - Size: 848 KB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 2 - Forks: 0

ZhiyuZhang803/DSCI553_Data_Mining_With_Spark

This repo contains the implementation of popular data mining algorithms with Python and Spark. It contains the homework assignments of DSCI553 2022 Fall. Final Grade: 103.5% (including bonus)

Language: Python - Size: 9.02 MB - Last synced: 10 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

HarigovindV10/NYC-Subway-Data-Analysis

An analysis of NYC Subway data using Hadoop MapReduce

Language: Jupyter Notebook - Size: 529 KB - Last synced: 10 months ago - Pushed: over 5 years ago - Stars: 0 - Forks: 1

MagdaleneHo/MapReduce

A simple project on the use of map and reduce in Hadoop.

Language: Python - Size: 6.84 KB - Last synced: 10 months ago - Pushed: over 2 years ago - Stars: 0 - Forks: 1

manursanchez/desarrollosMRJob

Python implementations of MapReduce patterns that were not included in the final degree project (TFG).

Language: Jupyter Notebook - Size: 28.2 MB - Last synced: 10 months ago - Pushed: about 3 years ago - Stars: 1 - Forks: 1

mdarm/map-reduce-project

Project on MapReduce for the Μ111 - Big Data Management course, NKUA, Spring 2023.

Language: TeX - Size: 3.7 MB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0

Raveesh1505/BigData-Training

Big data training material

Language: Python - Size: 45.9 KB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0

SinghHarshita/Clustering-Algorithms-Spark

KMeans, CURE, and Canopy algorithms demonstrated using PySpark.

Language: Jupyter Notebook - Size: 150 KB - Last synced: 4 months ago - Pushed: about 3 years ago - Stars: 5 - Forks: 0

ahmadsalimi/dist_mr

A distributed MapReduce implemented with Python 3 and gRPC

Language: Python - Size: 1.2 MB - Last synced: about 1 year ago - Pushed: about 3 years ago - Stars: 0 - Forks: 1

arminZolfaghari/docker-hadoop Fork of big-data-europe/docker-hadoop

Apache Hadoop docker image | Running Python MapReduce

Language: Shell - Size: 94.7 KB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

Longannn/MapReduce

YouTube data analysis comparing big data tools (Apache Hadoop) with conventional Python.

Language: Jupyter Notebook - Size: 2.41 MB - Last synced: 10 months ago - Pushed: over 2 years ago - Stars: 1 - Forks: 0

PradeepSingh1988/mapreduce

A framework for running MapReduce programs, implemented based on the MapReduce paper.

Language: Python - Size: 5.09 MB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

Sahith-8055/20186008_CC

Distributed Computing using Hadoop, Docker and Python (Map Reduce)

Language: Python - Size: 51.8 KB - Last synced: about 1 year ago - Pushed: about 5 years ago - Stars: 1 - Forks: 0

krishnadey30/NewsHeadlines

This repository contains code that extracts meaningful information from a news headline dataset.

Language: Python - Size: 85.9 KB - Last synced: about 1 year ago - Pushed: about 5 years ago - Stars: 2 - Forks: 1

kkoless/MapReduce

Hadoop MapReduce Python

Language: Python - Size: 1.05 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 2 - Forks: 0

Edyarich/parallel-computations

"Parallel computation" course homework

Language: Cuda - Size: 358 KB - Last synced: about 1 year ago - Pushed: about 2 years ago - Stars: 0 - Forks: 1

RiccardoSagramoni/map-reduce-bloom-filter 📦

University Project for "Cloud Computing" course (MSc Computer Engineering @ University of Pisa). MapReduce applications implemented in Hadoop and Spark.

Language: Java - Size: 8.86 MB - Last synced: about 1 month ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0

scullen99/Map_Shuffle_Reduce

Language: Python - Size: 13.7 KB - Last synced: almost 1 year ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

Bayunova28/Spotify_Lyrics

This repository contains my personal project running MapReduce with Apache Hadoop

Language: Shell - Size: 19.7 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

ahmedopolis/Flight_Distance_Calculation_with_MapReduce Fork of Nicole-Hong/Flight_Distance_Calculation_with_MapReduce

This project was completed as a small-scale team project for the YCBS 257 Data at Scale class in the Professional Development Certificate Program in Data Science and Machine Learning at McGill University. It introduces MapReduce functions for solving Big Data problems.

Language: Jupyter Notebook - Size: 9.17 MB - Last synced: 11 months ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

Raphael-Jin/EDFS

Emulation-based System for Distributed File storage and Parallel Computation

Language: Python - Size: 5.79 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

CLDXiang/Mining-Frequent-Pattern-from-Search-History

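Course project for "Big Data Mining Technology" at Fudan University. It attempts to find frequent 2-itemsets of keywords with high support in search records from the Sogou Labs user query log data (2008). On the implementation side, I built a small Hadoop cluster of five servers and implemented the three MapReduce stages of the Parallel FP-Growth algorithm in Python.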

Language: Python - Size: 1.52 MB - Last synced: about 1 year ago - Pushed: about 3 years ago - Stars: 26 - Forks: 2

sreetamparida/Hiraishin

A REST-based service that translates SQL queries into MapReduce and Spark jobs, runs those jobs, and returns a JSON object.

Language: Python - Size: 194 KB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 3 - Forks: 0

NbnbZero/Recommendation-System

A user-to-item recommendation system built with item-based CF and XGBRegressor

Language: Python - Size: 24.2 MB - Last synced: about 1 year ago - Pushed: almost 3 years ago - Stars: 2 - Forks: 0

huynhtloi/Mining-Of-Massive-Datasets

Introduction to Mining Of Massive Datasets

Language: Jupyter Notebook - Size: 22 MB - Last synced: about 1 year ago - Pushed: about 3 years ago - Stars: 0 - Forks: 0

James-Wachuka/python_mapred

Using MapReduce with Hadoop and Python to score sentiment

Language: Python - Size: 43 KB - Last synced: about 1 year ago - Pushed: about 4 years ago - Stars: 0 - Forks: 0

MaimoonaKhilji/MapReduce-Programs

MapReduce program code written in Python (Spyder)

Language: Jupyter Notebook - Size: 15.5 MB - Last synced: 11 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

MaimoonaKhilji/MapReduce-Presentation

Mapreduce Presentation

Size: 945 KB - Last synced: 11 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

yoongoing/bigdata_pyspark

⚡️Let's do data mining with Spark, an open-source MapReduce platform⚡️

Language: Jupyter Notebook - Size: 438 KB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 2 - Forks: 0

python-supply/map-reduce-and-multiprocessing

Multiprocessing can be an effective way to speed up a time-consuming workflow via parallelization. This article illustrates how multiprocessing can be utilized in a concise way when implementing MapReduce-like workflows.

Language: Jupyter Notebook - Size: 166 KB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 1 - Forks: 1

yvgupta03/Big_Data_Assignments_MapReduce_Graphframe

Short projects for the UTDallas Big Data course C6350 using PySpark MapReduce and the GraphFrames library

Language: Jupyter Notebook - Size: 364 KB - Last synced: 11 months ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 1

ochoajuanm/stack-overflow-mapreduce

Analysis of metadata extracted from Stack Overflow using the MapReduce paradigm

Language: Python - Size: 13.9 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

abhibalani/emr_lambda

Lambda to start EMR and run a map reduce job

Language: Python - Size: 2.93 KB - Last synced: over 1 year ago - Pushed: almost 5 years ago - Stars: 3 - Forks: 1

naman884/Big-Data

Language: Jupyter Notebook - Size: 1.15 MB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0

vigneshSs-07/Bigdata_Technologies

This repo contains all technical knowledge and implementation of big data technologies.

Language: Jupyter Notebook - Size: 1.49 MB - Last synced: 12 months ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0

gorkinovich/SGDI

Data and Information Management Systems (Sistemas de Gestión de Datos y de la Información, UCM, 2015)

Language: Java - Size: 2.74 MB - Last synced: over 1 year ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0

nikhitmago/frequent-itemset-association

Market basket analysis: finding frequent itemsets using the SON algorithm in Spark

Language: Python - Size: 7.81 KB - Last synced: about 1 year ago - Pushed: over 5 years ago - Stars: 2 - Forks: 0

anshsarkar/Big-Data-Assignments-UE18CS322

A repository containing the source code for the assignments done as part of the Big Data course (UE18CS322) at PES University.

Language: Python - Size: 48.9 MB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0

sanjitk7/MapReducePython

A MapReduce implementation in Python running on a Docker-simulated distributed system

Language: Python - Size: 19 MB - Last synced: about 1 year ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0

kiababashahi/Montreals_Neighborhood_RDD

In this simple project, I explore the data sets of the city of Montreal using RDDs: counting the number of neighborhoods, finding the largest ones, identifying their different types, and so on.

Language: Python - Size: 3.91 KB - Last synced: 12 months ago - Pushed: about 3 years ago - Stars: 0 - Forks: 0

r-i-c-h-a/MapReduce-based-Mini-HIVE Fork of sharanyavenkat25/MapReduce-based-Mini-HIVE

A Hadoop-based MapReduce SQL engine

Language: Python - Size: 162 KB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 1 - Forks: 1

aditeyabaral/mapreduce-word2vec

Implementation of Word2Vec for large datasets as a Map-Reduce Job using Hadoop Streaming.

Language: Python - Size: 1.45 MB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0

p-disha/NYC-Parking-Violations

An analysis of the NYC Parking Violations dataset using PySpark, Spark SQL, and MapReduce to find useful insights.

Language: Python - Size: 9.24 MB - Last synced: about 1 year ago - Pushed: about 4 years ago - Stars: 0 - Forks: 0

NilufaYeasmin/MapReduce

This repo contains implementations of MapReduce programs on a large text corpus in an Apache Hadoop environment | Nilufa Yeasmin | https://www.linkedin.com/in/nilufayeasmin/

Language: CSS - Size: 3.53 MB - Last synced: about 1 year ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 0

MarcoXM/Bigdata_Programming_Analytics

Language: Jupyter Notebook - Size: 8.1 MB - Last synced: about 1 year ago - Pushed: over 4 years ago - Stars: 0 - Forks: 0

martandsingh/SparkBigData

Apache Spark Big data basics and Machine learning with Big Data

Language: Jupyter Notebook - Size: 105 KB - Last synced: about 1 year ago - Pushed: over 4 years ago - Stars: 0 - Forks: 0

khanhha/map_reduce

map reduce learning

Language: Python - Size: 8.8 MB - Last synced: about 1 year ago - Pushed: over 4 years ago - Stars: 0 - Forks: 0

antoinewg/ocr-page-rank

PageRank algorithm using Hadoop Streaming

Language: Python - Size: 438 KB - Last synced: about 2 months ago - Pushed: over 4 years ago - Stars: 0 - Forks: 0

londist/Community-dectection

Language: Python - Size: 9.36 MB - Last synced: about 1 month ago - Pushed: about 5 years ago - Stars: 0 - Forks: 0

hovig/mapreduce

A simple alternative MapReduce example

Language: Python - Size: 36.3 MB - Last synced: over 1 year ago - Pushed: over 5 years ago - Stars: 1 - Forks: 0

Stefan-Mitic/HadoopLearning

Language: Python - Size: 37 MB - Last synced: about 1 year ago - Pushed: over 5 years ago - Stars: 0 - Forks: 0