An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: mapreduce

benedekh/bigdata-projects

Student projects in Big Data field.

Language: Java - Size: 225 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 19 - Forks: 12

groda/big_data

Big Data essentials: Hadoop, MapReduce, Spark. Explore tutorials and demos in Jupyter notebooks—most are self-contained and live, ready to run with a click.

Language: Jupyter Notebook - Size: 54.2 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 81 - Forks: 27

cdapio/cdap

An open source framework for building data analytic applications.

Language: Java - Size: 613 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 777 - Forks: 351

PowerJob/PowerJob

Enterprise job scheduling middleware with distributed computing ability.

Language: Java - Size: 18.6 MB - Last synced at: 7 days ago - Pushed at: 6 months ago - Stars: 7,520 - Forks: 1,314

apache/uniffle

Uniffle is a high performance, general purpose Remote Shuffle Service.

Language: Java - Size: 13.1 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 418 - Forks: 160

Hazim-HF/Data-Management

This repository covers data management and big data technologies, including databases, querying, and big data processing. Topics include Hadoop (MapReduce, HDFS), Apache Spark, data security, and optimization techniques. Students will learn Spark’s architecture, data distribution, parallel computing, and memory caching to enhance big data solutions

Language: Jupyter Notebook - Size: 73.6 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

H1ghBre4k3r/rust-map-reduce

A small hobby implementation of MapReduce that I hacked together at 2am.

Language: Rust - Size: 50.8 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

kevwan/mapreduce

An in-process MapReduce library to help you optimize service response time or concurrent task processing.

Language: Go - Size: 44.9 KB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 174 - Forks: 24

hiejulia/Data-pipeline-project

Data pipeline project

Language: Jupyter Notebook - Size: 55.1 MB - Last synced at: about 7 hours ago - Pushed at: 5 months ago - Stars: 35 - Forks: 23

lynnlangit/learning-hadoop-and-spark

Companion to Learning Hadoop and Learning Spark courses on LinkedIn Learning

Language: HTML - Size: 13.6 MB - Last synced at: 12 days ago - Pushed at: 7 months ago - Stars: 196 - Forks: 165

tonyamf/Demonstration_of_Big-Data_Analysis_Pipeline

"Demonstration of a Big Data Program" aims to demonstrate a complete big data analysis pipeline. The central goal is to analyze a dataset of house prices in India to understand the factors influencing the price (descriptive analytics) and to build a model that can predict house prices based on these factors (predictive analytics).

Language: Jupyter Notebook - Size: 43.2 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

MariaSchoinaki/roomie

An implementation of a distributed room booking mobile app that we created during our third year in AUEB's Distributed Systems course. This implementation leverages the MapReduce framework.

Language: Java - Size: 24.1 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 2

miozilla/dataprochs

dataprochs 🐘🐝 : Dataproc Cluster # Apache # Hadoop # MapReduce # Spark # YARN # HDFS

Language: Shell - Size: 2.21 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

water8394/BigData-Interview

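🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered from the web, together with my own summarized answers. Currently covers interview knowledge summaries for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.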

Size: 6.59 MB - Last synced at: 22 days ago - Pushed at: almost 4 years ago - Stars: 1,622 - Forks: 447

EleniKechrioti/roomie Fork of MariaSchoinaki/roomie

An implementation of a distributed Airbnb booking mobile app that we created during our third year in AUEB's Distributed Systems course. This implementation leverages the MapReduce framework.

Language: Java - Size: 24.1 MB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 0 - Forks: 0

casangi/graphviper

Dask Based MapReduce for Multi Xarray Datasets.

Language: Python - Size: 2.61 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 1 - Forks: 2

miguno/avro-hadoop-starter 📦

Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.

Language: Java - Size: 650 KB - Last synced at: 3 days ago - Pushed at: over 9 years ago - Stars: 115 - Forks: 83

maengsanha/bigdata

KMU CS Hot Topics in Big Data

Language: Go - Size: 54.5 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 2 - Forks: 0

microsoft/Mobius

C# and F# language binding and extensions to Apache Spark

Language: C# - Size: 6.44 MB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 940 - Forks: 211

zeekling/hadoop_book

Hadoop study notes.

Size: 216 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

CamDavidsonPilon/tdigest

t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed environments like PySpark.

Language: Python - Size: 91.8 KB - Last synced at: 11 days ago - Pushed at: about 2 years ago - Stars: 396 - Forks: 54

MadhukarSaiBabu/Aviation-Trend-Analysis-using-MapReduce-and-R

Developed a data-driven solution leveraging Hadoop MapReduce, Hive, and R to analyze air travel data. Identified trends in passenger volume, route utilization, and peak travel periods, providing actionable insights for optimizing airline operations and improving the passenger experience.

Size: 1.34 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

amirkiarafiei/Spark-Statistics-Analysis

Descriptive and Exploratory Statistical functions implemented within a distributed Spark Cluster with Performance Analysis and Visualizations

Language: Python - Size: 1.28 MB - Last synced at: 14 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

serihiro/simple_map_reduce

Distributed MapReduce implementation written in ruby.

Language: Ruby - Size: 257 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

douban/dpark 📦

Python clone of Spark, a MapReduce-like framework in Python

Language: Python - Size: 2.65 MB - Last synced at: 24 days ago - Pushed at: over 4 years ago - Stars: 2,680 - Forks: 530

TmohamedashrafT/High-Availability-Bigdata-Cluster

A highly available, fully distributed big data cluster built with Docker, integrating Hadoop HDFS, YARN, ZooKeeper, HBase, Hive, Spark, and Tez. Designed for scalability, fault tolerance, and seamless data processing in a containerized environment.

Language: Shell - Size: 16.6 KB - Last synced at: 28 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

abshek7/Big-data

A repository for documenting the learning related to theory and practical notes of big data computing.

Language: Python - Size: 330 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

PasanAbeysekara/Taxi-Pickup-Hotspot-Analysis-using-Hadoop-MapReduce

This project analyzes one month of NYC Yellow Taxi trip data (January 2016) to identify the busiest taxi pickup locations. It utilizes the Hadoop MapReduce framework to process the data and a lookup table to map location IDs to human-readable zone names.

Language: Java - Size: 5.65 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

PramithaMJ/job-analysis-MapReduce

Technical Skills Analysis using MapReduce - Hadoop

Language: Shell - Size: 883 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

mcxiaoxiao/bookdata-visual

MapReduce data analysis and visualization; MapReduce final course project; front end and back end for Dangdang data visualization

Size: 133 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

jathavaan/bds-seoul-hadoop

Language: Python - Size: 81.1 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

PramithaMJ/hadoop-cluster-manager

Complete Apache Hadoop 3.4.1 cluster installation and management toolkit with automated scripts, comprehensive documentation, and production-ready configuration templates for single-node and multi-node deployments.

Language: Shell - Size: 33.2 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

WilliamZhang20/dask-algorithms

Implemented distributed computing algorithms

Language: Python - Size: 3.91 KB - Last synced at: 16 days ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

DigitalPebble/behemoth 📦

Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.

Language: Java - Size: 7.45 MB - Last synced at: 13 days ago - Pushed at: about 7 years ago - Stars: 282 - Forks: 59

CocaineCong/tangseng

Tangseng search engine, including full-text search and vector search, based on Go. A Go-based search engine and information retrieval system.

Language: Go - Size: 6.81 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 124 - Forks: 36

MuhamedHekal/Hadoop-HA-Cluster-on-Docker

Hadoop3-HA-Docker is a production-ready, fault-tolerant Hadoop cluster deployed with Docker Compose. It automates the setup of a fully distributed Hadoop ecosystem with high availability (HA) features, designed for reliability, scalability, and real-world big data workloads

Language: Dockerfile - Size: 273 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

collabH/bigdata-growth

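A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.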

Language: Shell - Size: 221 MB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 1,612 - Forks: 375

heibaiying/BigData-Notes

A beginner's guide to big data ⭐

Language: Java - Size: 22.9 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 16,422 - Forks: 4,279

cubefs/compass

Compass is a task diagnosis platform for big data

Language: Java - Size: 5.92 MB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 385 - Forks: 139

donnemartin/data-science-ipython-notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Language: Python - Size: 46.8 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 28,169 - Forks: 7,978

IwasakiYuuki/data-analysis-platform-infra

Construct an on-premises Hadoop cluster using Ansible

Language: Jinja - Size: 248 KB - Last synced at: 3 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

mahmoudparsian/data-algorithms-book

MapReduce, Spark, Java, and Scala for Data Algorithms Book

Language: Java - Size: 397 MB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 1,075 - Forks: 661

mahmoudparsian/big-data-mapreduce-course

Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University

Language: HTML - Size: 601 MB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 158 - Forks: 143

arindas/mit-6.824-distributed-systems

Template repository to work on the labs from MIT 6.824 Distributed Systems course.

Language: Go - Size: 1.42 MB - Last synced at: 2 days ago - Pushed at: about 3 years ago - Stars: 60 - Forks: 8

kwartile/connected-component

Map Reduce Implementation of Connected Component on Apache Spark

Language: Scala - Size: 26.4 KB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 85 - Forks: 18

grailbio/bigslice

A serverless cluster computing system for the Go programming language

Language: Go - Size: 2.66 MB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 554 - Forks: 35

srafay/Hadoop-hands-on

Learning how to tame Big Data with Hadoop and related technologies

Language: PigLatin - Size: 96.7 KB - Last synced at: 3 days ago - Pushed at: over 5 years ago - Stars: 23 - Forks: 21

adwaiy2912/BDA-Lab

Repository contains weekly lab work and assignments for the Big Data Analytics (BDA) course

Language: Python - Size: 7.8 MB - Last synced at: 11 days ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

mahmoudparsian/data-algorithms-with-spark

O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian

Language: Python - Size: 44.9 MB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 215 - Forks: 93

ggcr/go-mapreduce

MapReduce implementation written in Go. MIT 6.824

Language: Go - Size: 2.56 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

datawhalechina/juicy-bigdata

🎉🎉🐳 Datawhale Introduction to Big Data Processing tutorial | The opening course of the big data technology track 🎉🎉

Language: Python - Size: 27.4 MB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 313 - Forks: 43

eecs485staff/madoop

A lightweight MapReduce framework for education

Language: Python - Size: 515 KB - Last synced at: 11 days ago - Pushed at: 3 months ago - Stars: 9 - Forks: 4

limbo-io/fluxion

Orchestration and scheduling platform, with more custom extensions for distributed computation.

Language: Java - Size: 1.11 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 3 - Forks: 1

lovnishverma/bigdataecosystem

Complete Big Data Ecosystem on Docker Desktop

Language: Shell - Size: 405 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 7 - Forks: 1

ReusJimenez/python-data-engineering

Hands-on data engineering labs with Python. ⚙️

Language: Jupyter Notebook - Size: 27.7 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Tencent/Firestorm

Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shuffle data on remote servers

Language: Java - Size: 1.63 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 256 - Forks: 72

whitfin/efflux

Easy Hadoop Streaming and MapReduce interfaces in Rust

Language: Rust - Size: 51.8 KB - Last synced at: 12 days ago - Pushed at: over 1 year ago - Stars: 40 - Forks: 7

grexrr/code-learning-ai

Some concept related practices

Language: Jupyter Notebook - Size: 63.1 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

tejaswirupa/Big-Data-Systems-Project-Hadoop-Hive-MapReduce-Sqoop-Workflows

Designed and implemented scalable data workflows using Hadoop, Hive, and Sqoop. This project involved log aggregation, airline delay analysis, word frequency processing, and TF-IDF computation across multiple datasets using MapReduce, Hive queries, and Hadoop Streaming.

Size: 3.75 MB - Last synced at: 24 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

asuiu/pyxtension

Pure Python extensions library that includes Scala-like streams, Json with attribute access syntax, and other common use stuff

Language: Python - Size: 334 KB - Last synced at: 21 days ago - Pushed at: 4 months ago - Stars: 46 - Forks: 1

yahiazakaria445/MapReduce-in-bash-scripting

A Bash-Based MapReduce for Distributed File Processing

Language: Shell - Size: 7.81 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

bcongdon/corral

🐎 A serverless MapReduce framework written for AWS Lambda

Language: Go - Size: 1.43 MB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 694 - Forks: 40

cwensel/cascading

Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster.

Language: Java - Size: 32.1 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 350 - Forks: 221

longshilin/Hadoop-MapReduce

Application examples based on MapReduce 🌾

Language: Java - Size: 30.3 KB - Last synced at: 3 months ago - Pushed at: over 7 years ago - Stars: 25 - Forks: 7

LucasUTNFRD/mit6.5840

Distributed systems projects in Go

Language: Go - Size: 14.4 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

hobbyquaker/mqttDB

JSON Store with MQTT Interface 📚📂📡

Language: JavaScript - Size: 99.6 KB - Last synced at: 17 days ago - Pushed at: about 7 years ago - Stars: 26 - Forks: 0

Erfanafshar/hadoop-cluster-crime-stats

Distributed crime data analysis using a multi-node Hadoop cluster with MapReduce and HDFS.

Language: Java - Size: 197 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

mimecast/dtail

DTail is a distributed DevOps tool for tailing, grepping, catting logs and other text files on many remote machines at once.

Language: Go - Size: 12.3 MB - Last synced at: 2 months ago - Pushed at: 11 months ago - Stars: 128 - Forks: 10

cold-bin/mit-6.824-labs

Implements the four labs and three challenges of the 2023 MIT 6.824 course

Language: Go - Size: 9.53 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 0

TurboWay/pybigdata

Use Python to work with various big data components

Language: Python - Size: 85 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 63 - Forks: 18

course-files/DistributedDatabases-HDFS-MapReduce-WideColumn

Concepts: Distributed Database Management Systems and Non-Relational Data Models - Setting up Hadoop in a fully distributed mode, using Hadoop Distributed File System (HDFS) and MapReduce (in Java and Python), and using a non-relational database based on a wide-column data model (HBase).

Language: TeX - Size: 220 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

asuiu/streamerate

Iterable Java8 style Streams for Python

Language: Python - Size: 486 KB - Last synced at: 11 days ago - Pushed at: 4 months ago - Stars: 8 - Forks: 3

dayyass/pydfs

Distributed File System written in Python

Language: Python - Size: 61.5 KB - Last synced at: 10 days ago - Pushed at: almost 3 years ago - Stars: 14 - Forks: 0

am-kantox/elixir-iteraptor

Handy enumerable operations implementation.

Language: Elixir - Size: 206 KB - Last synced at: 21 days ago - Pushed at: 5 months ago - Stars: 72 - Forks: 9

niqdev/devops

DevOps

Language: Shell - Size: 9.18 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 48 - Forks: 19

lokk798/BigData-Quiz-Bank

A comprehensive collection of multiple-choice questions (MCQs) and assessments covering Hadoop, MapReduce, and the broader Big Data ecosystem.

Size: 5.86 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

feng-li/Distributed-Statistical-Computing

Teaching Materials for Distributed Statistical Computing (teaching materials for distributed big data computing)

Language: HTML - Size: 49.1 MB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 106 - Forks: 66

mahmoudparsian/pyspark-algorithms

PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2

Language: Python - Size: 40.5 MB - Last synced at: 3 months ago - Pushed at: over 5 years ago - Stars: 84 - Forks: 44

vitalibo/grapes

Six degrees of separation theory research

Language: Java - Size: 262 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

KhalilKrugerOS/PaymentMethodCounter

INSAT exercise solution that counts how many transactions use Mastercard, using the MapReduce framework on Hadoop

Language: Java - Size: 0 Bytes - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

InnoFang/subgraph-isomorphism

❄Implement the common subgraph isomorphism algorithms (i.e. Ullmann, VF2) based on MapReduce on Hadoop

Language: Java - Size: 19.6 MB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 19 - Forks: 0

ruitianzhong/xdu-distributed-system

Assignments for Distributed Computing and Network Application Design at Xidian University (Spring 2024)

Language: Java - Size: 2.4 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

5ss90/Market_Basket_Analysis

A data mining project analyzing Instacart's 3 million grocery orders to uncover customer shopping patterns and product associations. Using market basket analysis and the Apriori algorithm, the project reveals key insights about shopping behavior, product combinations, and temporal patterns, providing valuable recommendations for retail strategy

Language: Jupyter Notebook - Size: 203 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

chaokunyang/athena

A task scheduler for spark, flink, mapreduce, java, python, bash

Language: Java - Size: 176 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 3

touero/ctenopharyngodon-idella

Use MapReduce's Java interface to crawl data on Chinese universities in a distributed fashion and learn the basics of HDFS.

Language: Java - Size: 3.75 MB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 140 - Forks: 0

taovietducofficial/BDA-PROJECT

This project analyzes U.S. traffic accidents using Jupyter Lab and Power BI to identify trends, causes, and risks through data preprocessing, analysis, and visualization.

Language: Jupyter Notebook - Size: 19.5 MB - Last synced at: 11 days ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

gowri-malla216/Predicting-UEFA-Champions-league-match-outcome-in-Fifa

The project aims to predict the outcome of a UEFA Champions League football match between two teams in a particular year using machine learning, focusing on a player ratings dataset obtained from the FIFA application and each team's past performance against the opponent in that year [2016-2013].

Language: Python - Size: 1.17 MB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Sabaudian/AMD_Market_Basket_Analysis

Algorithms for Massive Datasets (AMD) -- Market-baskets analysis project

Language: Jupyter Notebook - Size: 2.16 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

edydfang/UW-Madison-CS537

Operating System Projects

Language: C - Size: 1.19 MB - Last synced at: 3 months ago - Pushed at: over 5 years ago - Stars: 9 - Forks: 6

Young-ook/terraform-aws-emr

Terraform Module: Amazon EMR

Language: HCL - Size: 7.33 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

AashikSharif/Topic-Sensitive-Page-Ranking-algorithm-using-MapReduce

Topic Sensitive Page Ranking algorithm using MapReduce

Language: Python - Size: 10.8 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

MarioInf-Master-CompuerScience-UCM/Gestion_datosInformacion

Working repository for the course "Sistemas de gestión de datos y de la información" (Data and Information Management Systems), academic year 22-23, part of the Master's in Computer Engineering at the Universidad Complutense de Madrid (UCM)

Language: Jupyter Notebook - Size: 313 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

mohammad-malik/wikipedia-naive-search

This repository houses a naïve search engine built on MapReduce that uses a 5 GB CSV file as its dataset. It makes use of the Vector Space Model for Information Retrieval. This was developed as part of an assignment for the course Fundamentals of Big Data Analytics (DS2004).

Language: Python - Size: 992 KB - Last synced at: 4 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

dhchenx/Catla-HS

Catla for Hadoop and Spark (Catla-HS): An open-source system to support tuning MapReduce performance on Hadoop and Spark clusters.

Language: Java - Size: 105 MB - Last synced at: 4 months ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 1

berksudan/Analysis-on-Big-Data-with-Hadoop

Implementation of Statistical Methods via Hadoop Map-Reduce Library.

Language: Java - Size: 75.2 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

flipkart-incubator/hbase-orm

A production-grade HBase ORM library that makes accessing HBase clean, fast and fun (Can also be used as Bigtable ORM)

Language: Java - Size: 363 KB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 81 - Forks: 41

samuele-lolli/Steam-Recommendation-System

A basic recommendation system built with Scala and Spark.

Language: Scala - Size: 368 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 1 - Forks: 2

nzrsky/FunkObjC 📦

Functional and typed extensions for ObjC 🚀

Language: Objective-C - Size: 169 KB - Last synced at: 26 days ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

pngo1997/K-Means-K-Median-Clustering-with-Hadoop-MapReduce

Implements K-Means and K-Median Clustering using Hadoop MapReduce on a three-node cluster.

Language: Jupyter Notebook - Size: 3.4 MB - Last synced at: 5 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

EthanWng97/ray-mapreduce-kmeans

📚 Build a complete MapReduce framework on top of Ray and implement a clustering algorithm on it.

Language: Python - Size: 16 MB - Last synced at: about 2 months ago - Pushed at: about 3 years ago - Stars: 7 - Forks: 2